* [PATCH v1 00/13] NFS/RDMA patches for 3.17
@ 2014-06-23 22:39 Chuck Lever
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
The main purpose of this series is to address further connection drop
recovery issues by fixing FRMR re-use, making it less likely that the
client will drop the connection due to a memory operation error.
Some other clean-ups and fixes are included as well.
See topic branch nfs-rdma-for-3.17 in
git://git.linux-nfs.org/projects/cel/cel-2.6.git
I tested with NFSv3 and NFSv4 in all three supported memory
registration modes, using cthon04 and iozone against both Solaris
and Linux NFS/RDMA servers, and xfstests against a Linux server.
---
Chuck Lever (13):
xprtrdma: Fix panic in rpcrdma_register_frmr_external()
xprtrdma: Protect ->qp during FRMR deregistration
xprtrdma: Limit data payload size for ALLPHYSICAL
xprtrdma: Update rkeys after transport reconnect
xprtrdma: Don't drain CQs on transport disconnect
xprtrdma: Unclutter struct rpcrdma_mr_seg
xprtrdma: Encode Work Request opcode in wc->wr_id
xprtrdma: Back off rkey when FAST_REG_MR fails
xprtrdma: Refactor rpcrdma_buffer_put()
xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
xprtrdma: Clean up rpcrdma_ep_disconnect()
xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
xprtrdma: Handle additional connection events
include/linux/sunrpc/xprtrdma.h | 2
net/sunrpc/xprtrdma/rpc_rdma.c | 77 +++++----
net/sunrpc/xprtrdma/transport.c | 17 +-
net/sunrpc/xprtrdma/verbs.c | 330 +++++++++++++++++++++++++++------------
net/sunrpc/xprtrdma/xprt_rdma.h | 63 ++++++-
5 files changed, 332 insertions(+), 157 deletions(-)
--
Chuck Lever
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH v1 01/13] xprtrdma: Fix panic in rpcrdma_register_frmr_external()
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
seg1->mr_nsegs is not yet initialized when it is used to unmap
segments during an error exit. Use the same unmapping logic for
all error exits.
The "if (frmr_wr.wr.fast_reg.length < len)" test used to be a BUG_ON
check. The code it guards should never execute during normal operation.
Fixes: c977dea22708688eae31774f70126c97aa4dfe83
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 13dbd1c..78bd7c6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1545,9 +1545,8 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
frmr_wr.wr.fast_reg.length = page_no << PAGE_SHIFT;
if (frmr_wr.wr.fast_reg.length < len) {
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- return -EIO;
+ rc = -EIO;
+ goto out_err;
}
/* Bump the key */
@@ -1565,8 +1564,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
if (rc) {
dprintk("RPC: %s: failed ib_post_send for register,"
" status %i\n", __func__, rc);
- while (i--)
- rpcrdma_unmap_one(ia, --seg);
+ goto out_err;
} else {
seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
seg1->mr_base = seg1->mr_dma + pageoff;
@@ -1574,6 +1572,10 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
seg1->mr_len = len;
}
*nsegs = i;
+ return 0;
+out_err:
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
return rc;
}
--
* [PATCH v1 02/13] xprtrdma: Protect ->qp during FRMR deregistration
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Ensure the QP remains valid while posting LOCAL_INV during a
transport reconnect. Otherwise, ia->ri_id->qp is NULL, which
triggers a panic.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259
Fixes: ec62f40d3505a643497d105c297093bb90afd44e
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 14 +++++++++++---
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 78bd7c6..f70b8ad 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -613,6 +613,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
/* Else will do memory reg/dereg for each chunk */
ia->ri_memreg_strategy = memreg;
+ rwlock_init(&ia->ri_qplock);
return 0;
out2:
rdma_destroy_id(ia->ri_id);
@@ -859,7 +860,7 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
int
rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
{
- struct rdma_cm_id *id;
+ struct rdma_cm_id *id, *old;
int rc = 0;
int retry_count = 0;
@@ -905,9 +906,14 @@ retry:
rc = -ENETUNREACH;
goto out;
}
- rdma_destroy_qp(ia->ri_id);
- rdma_destroy_id(ia->ri_id);
+
+ write_lock(&ia->ri_qplock);
+ old = ia->ri_id;
ia->ri_id = id;
+ write_unlock(&ia->ri_qplock);
+
+ rdma_destroy_qp(old);
+ rdma_destroy_id(old);
} else {
dprintk("RPC: %s: connecting...\n", __func__);
rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
@@ -1597,7 +1603,9 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
invalidate_wr.ex.invalidate_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);
+ read_lock(&ia->ri_qplock);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
+ read_unlock(&ia->ri_qplock);
if (rc)
dprintk("RPC: %s: failed ib_post_send for invalidate,"
" status %i\n", __func__, rc);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 89e7cd4..97ca516 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -59,6 +59,7 @@
* Interface Adapter -- one per transport instance
*/
struct rpcrdma_ia {
+ rwlock_t ri_qplock;
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
struct ib_mr *ri_bind_mem;
--
* [PATCH v1 03/13] xprtrdma: Limit data payload size for ALLPHYSICAL
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
When the client uses physical memory registration, each page in the
payload gets its own array entry in the RPC/RDMA header's chunk list.
Therefore, don't advertise a maximum payload size that would require
more array entries than can fit in the RPC buffer where RPC/RDMA
headers are built.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/transport.c | 4 +++-
net/sunrpc/xprtrdma/verbs.c | 41 +++++++++++++++++++++++++++++++++++++++
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
3 files changed, 45 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 66f91f0..4185102 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -296,7 +296,6 @@ xprt_setup_rdma(struct xprt_create *args)
xprt->resvport = 0; /* privileged port not needed */
xprt->tsh_size = 0; /* RPC-RDMA handles framing */
- xprt->max_payload = RPCRDMA_MAX_DATA_SEGS * PAGE_SIZE;
xprt->ops = &xprt_rdma_procs;
/*
@@ -382,6 +381,9 @@ xprt_setup_rdma(struct xprt_create *args)
new_ep->rep_xprt = xprt;
xprt_rdma_format_addresses(xprt);
+ xprt->max_payload = rpcrdma_max_payload(new_xprt);
+ dprintk("RPC: %s: transport data payload maximum: %zu bytes\n",
+ __func__, xprt->max_payload);
if (!try_module_get(THIS_MODULE))
goto out4;
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f70b8ad..3c7f904 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1819,3 +1819,44 @@ rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
rc);
return rc;
}
+
+/* Physical mapping means one Read/Write list entry per-page.
+ * All list entries must fit within an inline buffer
+ *
+ * NB: The server must return a Write list for NFS READ,
+ * which has the same constraint. Factor in the inline
+ * rsize as well.
+ */
+static size_t
+rpcrdma_physical_max_payload(struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
+ unsigned int inline_size, pages;
+
+ inline_size = min_t(unsigned int,
+ cdata->inline_wsize, cdata->inline_rsize) -
+ RPCRDMA_HDRLEN_MIN;
+ pages = inline_size / sizeof(struct rpcrdma_segment);
+ return pages << PAGE_SHIFT;
+}
+
+static size_t
+rpcrdma_mr_max_payload(struct rpcrdma_xprt *r_xprt)
+{
+ return RPCRDMA_MAX_DATA_SEGS << PAGE_SHIFT;
+}
+
+size_t
+rpcrdma_max_payload(struct rpcrdma_xprt *r_xprt)
+{
+ size_t result;
+
+ switch (r_xprt->rx_ia.ri_memreg_strategy) {
+ case RPCRDMA_ALLPHYSICAL:
+ result = rpcrdma_physical_max_payload(r_xprt);
+ break;
+ default:
+ result = rpcrdma_mr_max_payload(r_xprt);
+ }
+ return result;
+}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 97ca516..f3d86b2 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -348,6 +348,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
int rpcrdma_marshal_req(struct rpc_rqst *);
+size_t rpcrdma_max_payload(struct rpcrdma_xprt *);
/* Temporary NFS request map cache. Created in svc_rdma.c */
extern struct kmem_cache *svc_rdma_map_cachep;
--
* [PATCH v1 04/13] xprtrdma: Update rkeys after transport reconnect
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Various reports of:
rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0
ep ffff8800bfd3e848
Ensure that rkeys in already-marshalled RPC/RDMA headers are
refreshed after the QP has been replaced by a reconnect.
BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249
Suggested-by: Selvin Xavier <Selvin.Xavier-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 77 ++++++++++++++++++++-------------------
net/sunrpc/xprtrdma/transport.c | 11 +++---
net/sunrpc/xprtrdma/xprt_rdma.h | 10 +++++
3 files changed, 56 insertions(+), 42 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 693966d..6eeb6d2 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -53,14 +53,6 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif
-enum rpcrdma_chunktype {
- rpcrdma_noch = 0,
- rpcrdma_readch,
- rpcrdma_areadch,
- rpcrdma_writech,
- rpcrdma_replych
-};
-
#ifdef RPC_DEBUG
static const char transfertypes[][12] = {
"pure inline", /* no chunks */
@@ -286,6 +278,30 @@ out:
}
/*
+ * Marshal chunks. This routine returns the header length
+ * consumed by marshaling.
+ *
+ * Returns positive RPC/RDMA header size, or negative errno.
+ */
+
+ssize_t
+rpcrdma_marshal_chunks(struct rpc_rqst *rqst, ssize_t result)
+{
+ struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
+ struct rpcrdma_msg *headerp = (struct rpcrdma_msg *)req->rl_base;
+
+ if (req->rl_rtype != rpcrdma_noch) {
+ result = rpcrdma_create_chunks(rqst,
+ &rqst->rq_snd_buf, headerp, req->rl_rtype);
+
+ } else if (req->rl_wtype != rpcrdma_noch) {
+ result = rpcrdma_create_chunks(rqst,
+ &rqst->rq_rcv_buf, headerp, req->rl_wtype);
+ }
+ return result;
+}
+
+/*
* Copy write data inline.
* This function is used for "small" requests. Data which is passed
* to RPC via iovecs (or page list) is copied directly into the
@@ -377,7 +393,6 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
char *base;
size_t rpclen, padlen;
ssize_t hdrlen;
- enum rpcrdma_chunktype rtype, wtype;
struct rpcrdma_msg *headerp;
/*
@@ -415,13 +430,13 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* into pages; otherwise use reply chunks.
*/
if (rqst->rq_rcv_buf.buflen <= RPCRDMA_INLINE_READ_THRESHOLD(rqst))
- wtype = rpcrdma_noch;
+ req->rl_wtype = rpcrdma_noch;
else if (rqst->rq_rcv_buf.page_len == 0)
- wtype = rpcrdma_replych;
+ req->rl_wtype = rpcrdma_replych;
else if (rqst->rq_rcv_buf.flags & XDRBUF_READ)
- wtype = rpcrdma_writech;
+ req->rl_wtype = rpcrdma_writech;
else
- wtype = rpcrdma_replych;
+ req->rl_wtype = rpcrdma_replych;
/*
* Chunks needed for arguments?
@@ -438,16 +453,16 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* TBD check NFSv4 setacl
*/
if (rqst->rq_snd_buf.len <= RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
- rtype = rpcrdma_noch;
+ req->rl_rtype = rpcrdma_noch;
else if (rqst->rq_snd_buf.page_len == 0)
- rtype = rpcrdma_areadch;
+ req->rl_rtype = rpcrdma_areadch;
else
- rtype = rpcrdma_readch;
+ req->rl_rtype = rpcrdma_readch;
/* The following simplification is not true forever */
- if (rtype != rpcrdma_noch && wtype == rpcrdma_replych)
- wtype = rpcrdma_noch;
- if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) {
+ if (req->rl_rtype != rpcrdma_noch && req->rl_wtype == rpcrdma_replych)
+ req->rl_wtype = rpcrdma_noch;
+ if (req->rl_rtype != rpcrdma_noch && req->rl_wtype != rpcrdma_noch) {
dprintk("RPC: %s: cannot marshal multiple chunk lists\n",
__func__);
return -EIO;
@@ -461,7 +476,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* When padding is in use and applies to the transfer, insert
* it and change the message type.
*/
- if (rtype == rpcrdma_noch) {
+ if (req->rl_rtype == rpcrdma_noch) {
padlen = rpcrdma_inline_pullup(rqst,
RPCRDMA_INLINE_PAD_VALUE(rqst));
@@ -476,7 +491,7 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
headerp->rm_body.rm_padded.rm_pempty[1] = xdr_zero;
headerp->rm_body.rm_padded.rm_pempty[2] = xdr_zero;
hdrlen += 2 * sizeof(u32); /* extra words in padhdr */
- if (wtype != rpcrdma_noch) {
+ if (req->rl_wtype != rpcrdma_noch) {
dprintk("RPC: %s: invalid chunk list\n",
__func__);
return -EIO;
@@ -497,30 +512,18 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
* on receive. Therefore, we request a reply chunk
* for non-writes wherever feasible and efficient.
*/
- if (wtype == rpcrdma_noch)
- wtype = rpcrdma_replych;
+ if (req->rl_wtype == rpcrdma_noch)
+ req->rl_wtype = rpcrdma_replych;
}
}
- /*
- * Marshal chunks. This routine will return the header length
- * consumed by marshaling.
- */
- if (rtype != rpcrdma_noch) {
- hdrlen = rpcrdma_create_chunks(rqst,
- &rqst->rq_snd_buf, headerp, rtype);
- wtype = rtype; /* simplify dprintk */
-
- } else if (wtype != rpcrdma_noch) {
- hdrlen = rpcrdma_create_chunks(rqst,
- &rqst->rq_rcv_buf, headerp, wtype);
- }
+ hdrlen = rpcrdma_marshal_chunks(rqst, hdrlen);
if (hdrlen < 0)
return hdrlen;
dprintk("RPC: %s: %s: hdrlen %zd rpclen %zd padlen %zd"
" headerp 0x%p base 0x%p lkey 0x%x\n",
- __func__, transfertypes[wtype], hdrlen, rpclen, padlen,
+ __func__, transfertypes[req->rl_wtype], hdrlen, rpclen, padlen,
headerp, base, req->rl_iov.lkey);
/*
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 4185102..f6d280b 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -597,13 +597,14 @@ xprt_rdma_send_request(struct rpc_task *task)
struct rpc_xprt *xprt = rqst->rq_xprt;
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
- int rc;
+ int rc = 0;
- if (req->rl_niovs == 0) {
+ if (req->rl_niovs == 0)
rc = rpcrdma_marshal_req(rqst);
- if (rc < 0)
- goto failed_marshal;
- }
+ else if (r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
+ rc = rpcrdma_marshal_chunks(rqst, 0);
+ if (rc < 0)
+ goto failed_marshal;
if (req->rl_reply == NULL) /* e.g. reconnection */
rpcrdma_recv_buffer_get(req);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index f3d86b2..c270e59 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -99,6 +99,14 @@ struct rpcrdma_ep {
#define INIT_CQCOUNT(ep) atomic_set(&(ep)->rep_cqcount, (ep)->rep_cqinit)
#define DECR_CQCOUNT(ep) atomic_sub_return(1, &(ep)->rep_cqcount)
+enum rpcrdma_chunktype {
+ rpcrdma_noch = 0,
+ rpcrdma_readch,
+ rpcrdma_areadch,
+ rpcrdma_writech,
+ rpcrdma_replych
+};
+
/*
* struct rpcrdma_rep -- this structure encapsulates state required to recv
* and complete a reply, asychronously. It needs several pieces of
@@ -192,6 +200,7 @@ struct rpcrdma_req {
unsigned int rl_niovs; /* 0, 2 or 4 */
unsigned int rl_nchunks; /* non-zero if chunks */
unsigned int rl_connect_cookie; /* retry detection */
+ enum rpcrdma_chunktype rl_rtype, rl_wtype;
struct rpcrdma_buffer *rl_buffer; /* home base for this structure */
struct rpcrdma_rep *rl_reply;/* holder for reply buffer */
struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];/* chunk segments */
@@ -347,6 +356,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *);
/*
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
+ssize_t rpcrdma_marshal_chunks(struct rpc_rqst *, ssize_t);
int rpcrdma_marshal_req(struct rpc_rqst *);
size_t rpcrdma_max_payload(struct rpcrdma_xprt *);
--
* [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
CQs are not destroyed until unmount. Draining them on transport
disconnect can discard successful completions that would otherwise
update the r.frmr.state field.
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3c7f904..451e100 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -873,9 +873,6 @@ retry:
dprintk("RPC: %s: rpcrdma_ep_disconnect"
" status %i\n", __func__, rc);
- rpcrdma_clean_cq(ep->rep_attr.recv_cq);
- rpcrdma_clean_cq(ep->rep_attr.send_cq);
-
xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
id = rpcrdma_create_id(xprt, ia,
(struct sockaddr *)&xprt->rx_data.addr);
@@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
{
int rc;
- rpcrdma_clean_cq(ep->rep_attr.recv_cq);
- rpcrdma_clean_cq(ep->rep_attr.send_cq);
rc = rdma_disconnect(ia->ri_id);
if (!rc) {
/* returns without wait if not connected */
--
* [PATCH v1 06/13] xprtrdma: Unclutter struct rpcrdma_mr_seg
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Clean-ups:
- make it obvious that the rl_mw field is a pointer -- allocated
separately, not as part of struct rpcrdma_mr_seg
- promote "struct {} frmr;" to a named type
- promote the state enum to a named type
- name the MW state field the same way other fields in
rpcrdma_mw are named
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 6 +++--
net/sunrpc/xprtrdma/xprt_rdma.h | 44 +++++++++++++++++++++++++++++----------
2 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 451e100..e8ed81c 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -156,9 +156,9 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
return;
if (wc->opcode == IB_WC_FAST_REG_MR)
- frmr->r.frmr.state = FRMR_IS_VALID;
+ frmr->r.frmr.fr_state = FRMR_IS_VALID;
else if (wc->opcode == IB_WC_LOCAL_INV)
- frmr->r.frmr.state = FRMR_IS_INVALID;
+ frmr->r.frmr.fr_state = FRMR_IS_INVALID;
}
static int
@@ -1518,7 +1518,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
dprintk("RPC: %s: Using frmr %p to map %d segments\n",
__func__, seg1->mr_chunk.rl_mw, i);
- if (unlikely(seg1->mr_chunk.rl_mw->r.frmr.state == FRMR_IS_VALID)) {
+ if (unlikely(seg1->mr_chunk.rl_mw->r.frmr.fr_state == FRMR_IS_VALID)) {
dprintk("RPC: %s: frmr %x left valid, posting invalidate.\n",
__func__,
seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey);
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index c270e59..28c8570 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -146,6 +146,38 @@ struct rpcrdma_rep {
};
/*
+ * struct rpcrdma_mw - external memory region metadata
+ *
+ * An external memory region is any buffer or page that is registered
+ * on the fly (ie, not pre-registered).
+ *
+ * Each rpcrdma_buffer has a list of these anchored in rb_mws. During
+ * call_allocate, rpcrdma_buffer_get() assigns one to each segment in
+ * an rpcrdma_req. Then rpcrdma_register_external() grabs these to keep
+ * track of registration metadata while each RPC is pending.
+ * rpcrdma_deregister_external() uses this metadata to unmap and
+ * release these resources when an RPC is complete.
+ */
+enum rpcrdma_frmr_state {
+ FRMR_IS_INVALID,
+ FRMR_IS_VALID,
+};
+
+struct rpcrdma_frmr {
+ struct ib_fast_reg_page_list *fr_pgl;
+ struct ib_mr *fr_mr;
+ enum rpcrdma_frmr_state fr_state;
+};
+
+struct rpcrdma_mw {
+ union {
+ struct ib_fmr *fmr;
+ struct rpcrdma_frmr frmr;
+ } r;
+ struct list_head mw_list;
+};
+
+/*
* struct rpcrdma_req -- structure central to the request/reply sequence.
*
* N of these are associated with a transport instance, and stored in
@@ -172,17 +204,7 @@ struct rpcrdma_rep {
struct rpcrdma_mr_seg { /* chunk descriptors */
union { /* chunk memory handles */
struct ib_mr *rl_mr; /* if registered directly */
- struct rpcrdma_mw { /* if registered from region */
- union {
- struct ib_fmr *fmr;
- struct {
- struct ib_fast_reg_page_list *fr_pgl;
- struct ib_mr *fr_mr;
- enum { FRMR_IS_INVALID, FRMR_IS_VALID } state;
- } frmr;
- } r;
- struct list_head mw_list;
- } *rl_mw;
+ struct rpcrdma_mw *rl_mw; /* if registered from region */
} mr_chunk;
u64 mr_base; /* registration result */
u32 mr_rkey; /* registration result */
--
* [PATCH v1 07/13] xprtrdma: Encode Work Request opcode in wc->wr_id
From: Chuck Lever @ 2014-06-23 22:39 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
The wc->opcode field is unreliable when a completion fails.
Up until now, the completion handler has ignored unsuccessful
completions, so that didn't matter to xprtrdma.
In a subsequent patch, however, the send CQ handler will need
to know which Work Request opcode is completing, even for
error completions.
xprtrdma posts three Work Request opcodes via the send queue:
SEND, FAST_REG_MR, and LOCAL_INV:
For SEND, wc->wr_id is zero. Those completions are ignored.
The other two plant a pointer to an rpcrdma_mw in wc->wr_id. Make
the low-order bit indicate which of FAST_REG_MR or LOCAL_INV is
being done.
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 19 +++++++++++--------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 ++
2 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e8ed81c..cef67fd 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -145,20 +145,22 @@ rpcrdma_cq_async_error_upcall(struct ib_event *event, void *context)
static void
rpcrdma_sendcq_process_wc(struct ib_wc *wc)
{
- struct rpcrdma_mw *frmr = (struct rpcrdma_mw *)(unsigned long)wc->wr_id;
+ unsigned long wrid = wc->wr_id;
+ struct rpcrdma_mw *mw;
+ int fastreg;
- dprintk("RPC: %s: frmr %p status %X opcode %d\n",
- __func__, frmr, wc->status, wc->opcode);
+ dprintk("RPC: %s: wr_id %lx status %X opcode %d\n",
+ __func__, wrid, wc->status, wc->opcode);
- if (wc->wr_id == 0ULL)
+ if (wrid == 0)
return;
if (wc->status != IB_WC_SUCCESS)
return;
- if (wc->opcode == IB_WC_FAST_REG_MR)
- frmr->r.frmr.fr_state = FRMR_IS_VALID;
- else if (wc->opcode == IB_WC_LOCAL_INV)
- frmr->r.frmr.fr_state = FRMR_IS_INVALID;
+ fastreg = test_and_clear_bit(RPCRDMA_BIT_FASTREG, &wrid);
+ mw = (struct rpcrdma_mw *)wrid;
+
+ mw->r.frmr.fr_state = fastreg ? FRMR_IS_VALID : FRMR_IS_INVALID;
}
static int
@@ -1538,6 +1540,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
/* Prepare FRMR WR */
memset(&frmr_wr, 0, sizeof frmr_wr);
frmr_wr.wr_id = (unsigned long)(void *)seg1->mr_chunk.rl_mw;
+ frmr_wr.wr_id |= (u64)1 << RPCRDMA_BIT_FASTREG;
frmr_wr.opcode = IB_WR_FAST_REG_MR;
frmr_wr.send_flags = IB_SEND_SIGNALED;
frmr_wr.wr.fast_reg.iova_start = seg1->mr_dma;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 28c8570..6b5d243 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -177,6 +177,8 @@ struct rpcrdma_mw {
struct list_head mw_list;
};
+#define RPCRDMA_BIT_FASTREG (0)
+
/*
* struct rpcrdma_req -- structure central to the request/reply sequence.
*
--
* [PATCH v1 08/13] xprtrdma: Back off rkey when FAST_REG_MR fails
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
If posting a FAST_REG_MR Work Request fails, or the FAST_REG WR
flushes, revert the rkey update to avoid subsequent
IB_WC_MW_BIND_ERR completions.
Suggested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index cef67fd..3efc007 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -61,6 +61,8 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif
+static void rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *);
+
/*
* internal functions
*/
@@ -154,13 +156,17 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
if (wrid == 0)
return;
- if (wc->status != IB_WC_SUCCESS)
- return;
fastreg = test_and_clear_bit(RPCRDMA_BIT_FASTREG, &wrid);
mw = (struct rpcrdma_mw *)wrid;
- mw->r.frmr.fr_state = fastreg ? FRMR_IS_VALID : FRMR_IS_INVALID;
+ if (wc->status == IB_WC_SUCCESS) {
+ mw->r.frmr.fr_state = fastreg ?
+ FRMR_IS_VALID : FRMR_IS_INVALID;
+ } else {
+ if (fastreg)
+ rpcrdma_decrement_frmr_rkey(mw);
+ }
}
static int
@@ -1480,6 +1486,24 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
}
+static void
+rpcrdma_increment_frmr_rkey(struct rpcrdma_mw *mw)
+{
+ struct ib_mr *frmr = mw->r.frmr.fr_mr;
+ u8 key = frmr->rkey & 0x000000FF;
+
+ ib_update_fast_reg_key(frmr, ++key);
+}
+
+static void
+rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *mw)
+{
+ struct ib_mr *frmr = mw->r.frmr.fr_mr;
+ u8 key = frmr->rkey & 0x000000FF;
+
+ ib_update_fast_reg_key(frmr, --key);
+}
+
static int
rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
int *nsegs, int writing, struct rpcrdma_ia *ia,
@@ -1487,8 +1511,6 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
{
struct rpcrdma_mr_seg *seg1 = seg;
struct ib_send_wr invalidate_wr, frmr_wr, *bad_wr, *post_wr;
-
- u8 key;
int len, pageoff;
int i, rc;
int seg_len;
@@ -1552,14 +1574,10 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
rc = -EIO;
goto out_err;
}
-
- /* Bump the key */
- key = (u8)(seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey & 0x000000FF);
- ib_update_fast_reg_key(seg1->mr_chunk.rl_mw->r.frmr.fr_mr, ++key);
-
frmr_wr.wr.fast_reg.access_flags = (writing ?
IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
IB_ACCESS_REMOTE_READ);
+ rpcrdma_increment_frmr_rkey(seg1->mr_chunk.rl_mw);
frmr_wr.wr.fast_reg.rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
DECR_CQCOUNT(&r_xprt->rx_ep);
@@ -1568,6 +1586,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
if (rc) {
dprintk("RPC: %s: failed ib_post_send for register,"
" status %i\n", __func__, rc);
+ rpcrdma_decrement_frmr_rkey(seg1->mr_chunk.rl_mw);
goto out_err;
} else {
seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
--
* [PATCH v1 09/13] xprtrdma: Refactor rpcrdma_buffer_put()
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (7 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 08/13] xprtrdma: Back off rkey when FAST_REG_MR fails Chuck Lever
@ 2014-06-23 22:40 ` Chuck Lever
2014-06-23 22:40 ` [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion Chuck Lever
` (5 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Split out the code that manages the rb_mws list.
A little extra error checking is introduced in the code path that
grabs MWs for the next RPC request. If rb_mws were ever to become
empty, the list_entry() would cause a NULL pointer dereference.
Instead, now rpcrdma_buffer_get() returns NULL, which causes
call_allocate() to delay and try again.
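The refactored rpcrdma_buffer_get_mws() is an all-or-nothing grab: pop
one MW per segment, and if the list runs dry part-way, push everything
back and fail so the caller can retry later. A hedged sketch of that
pattern using a toy singly linked free list — the names and types are
illustrative, not the xprtrdma structures:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGS 4	/* stands in for RPCRDMA_MAX_SEGS */

struct mw {
	struct mw *next;
};

struct pool {
	struct mw *free;	/* stands in for rb_mws */
};

static struct mw *pool_pop(struct pool *p)
{
	struct mw *m = p->free;

	if (m)
		p->free = m->next;
	return m;
}

static void pool_push(struct pool *p, struct mw *m)
{
	m->next = p->free;
	p->free = m;
}

/* All-or-nothing: fill out[], or restore the pool and return -1 so
 * the caller can delay and try again (as call_allocate() does). */
static int get_mws(struct pool *p, struct mw *out[MAX_SEGS])
{
	int i;

	for (i = 0; i < MAX_SEGS; i++) {
		out[i] = pool_pop(p);
		if (!out[i])
			goto out_empty;
	}
	return 0;
out_empty:
	while (i--)
		pool_push(p, out[i]);
	return -1;
}
```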
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 105 +++++++++++++++++++++++++++------------
net/sunrpc/xprtrdma/xprt_rdma.h | 1
2 files changed, 74 insertions(+), 32 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 3efc007..f24f0bf 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -1251,6 +1251,69 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
kfree(buf->rb_pool);
}
+static void
+rpcrdma_put_mw_locked(struct rpcrdma_mw *mw)
+{
+ list_add_tail(&mw->mw_list, &mw->mw_pool->rb_mws);
+}
+
+static void
+rpcrdma_buffer_put_mw(struct rpcrdma_mw **mw)
+{
+ rpcrdma_put_mw_locked(*mw);
+ *mw = NULL;
+}
+
+/* Cycle mw's back in reverse order, and "spin" them.
+ * This delays and scrambles reuse as much as possible.
+ */
+static void
+rpcrdma_buffer_put_mws(struct rpcrdma_req *req)
+{
+ struct rpcrdma_mr_seg *seg1 = req->rl_segments;
+ struct rpcrdma_mr_seg *seg = seg1;
+ int i;
+
+ for (i = 1, seg++; i < RPCRDMA_MAX_SEGS; seg++, i++)
+ rpcrdma_buffer_put_mw(&seg->mr_chunk.rl_mw);
+ rpcrdma_buffer_put_mw(&seg1->mr_chunk.rl_mw);
+}
+
+static void
+rpcrdma_send_buffer_put(struct rpcrdma_req *req, struct rpcrdma_buffer *buffers)
+{
+ buffers->rb_send_bufs[--buffers->rb_send_index] = req;
+ req->rl_niovs = 0;
+ if (req->rl_reply) {
+ buffers->rb_recv_bufs[--buffers->rb_recv_index] = req->rl_reply;
+ req->rl_reply->rr_func = NULL;
+ req->rl_reply = NULL;
+ }
+}
+
+static struct rpcrdma_req *
+rpcrdma_buffer_get_mws(struct rpcrdma_req *req, struct rpcrdma_buffer *buffers)
+{
+ struct rpcrdma_mw *r;
+ int i;
+
+ for (i = RPCRDMA_MAX_SEGS - 1; i >= 0; i--) {
+ if (list_empty(&buffers->rb_mws))
+ goto out_empty;
+
+ r = list_entry(buffers->rb_mws.next,
+ struct rpcrdma_mw, mw_list);
+ list_del(&r->mw_list);
+ r->mw_pool = buffers;
+ req->rl_segments[i].mr_chunk.rl_mw = r;
+ }
+ return req;
+out_empty:
+ rpcrdma_send_buffer_put(req, buffers);
+ rpcrdma_buffer_put_mws(req);
+ return NULL;
+}
+
/*
* Get a set of request/reply buffers.
*
@@ -1263,10 +1326,9 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
struct rpcrdma_req *
rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
{
+ struct rpcrdma_ia *ia = rdmab_to_ia(buffers);
struct rpcrdma_req *req;
unsigned long flags;
- int i;
- struct rpcrdma_mw *r;
spin_lock_irqsave(&buffers->rb_lock, flags);
if (buffers->rb_send_index == buffers->rb_max_requests) {
@@ -1286,14 +1348,13 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
buffers->rb_recv_bufs[buffers->rb_recv_index++] = NULL;
}
buffers->rb_send_bufs[buffers->rb_send_index++] = NULL;
- if (!list_empty(&buffers->rb_mws)) {
- i = RPCRDMA_MAX_SEGS - 1;
- do {
- r = list_entry(buffers->rb_mws.next,
- struct rpcrdma_mw, mw_list);
- list_del(&r->mw_list);
- req->rl_segments[i].mr_chunk.rl_mw = r;
- } while (--i >= 0);
+ switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
+ case RPCRDMA_MTHCAFMR:
+ req = rpcrdma_buffer_get_mws(req, buffers);
+ break;
+ default:
+ break;
}
spin_unlock_irqrestore(&buffers->rb_lock, flags);
return req;
@@ -1308,34 +1369,14 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
{
struct rpcrdma_buffer *buffers = req->rl_buffer;
struct rpcrdma_ia *ia = rdmab_to_ia(buffers);
- int i;
unsigned long flags;
spin_lock_irqsave(&buffers->rb_lock, flags);
- buffers->rb_send_bufs[--buffers->rb_send_index] = req;
- req->rl_niovs = 0;
- if (req->rl_reply) {
- buffers->rb_recv_bufs[--buffers->rb_recv_index] = req->rl_reply;
- req->rl_reply->rr_func = NULL;
- req->rl_reply = NULL;
- }
+ rpcrdma_send_buffer_put(req, buffers);
switch (ia->ri_memreg_strategy) {
case RPCRDMA_FRMR:
case RPCRDMA_MTHCAFMR:
- /*
- * Cycle mw's back in reverse order, and "spin" them.
- * This delays and scrambles reuse as much as possible.
- */
- i = 1;
- do {
- struct rpcrdma_mw **mw;
- mw = &req->rl_segments[i].mr_chunk.rl_mw;
- list_add_tail(&(*mw)->mw_list, &buffers->rb_mws);
- *mw = NULL;
- } while (++i < RPCRDMA_MAX_SEGS);
- list_add_tail(&req->rl_segments[0].mr_chunk.rl_mw->mw_list,
- &buffers->rb_mws);
- req->rl_segments[0].mr_chunk.rl_mw = NULL;
+ rpcrdma_buffer_put_mws(req);
break;
default:
break;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 6b5d243..b81e5b5 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -175,6 +175,7 @@ struct rpcrdma_mw {
struct rpcrdma_frmr frmr;
} r;
struct list_head mw_list;
+ struct rpcrdma_buffer *mw_pool;
};
#define RPCRDMA_BIT_FASTREG (0)
--
* [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (8 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 09/13] xprtrdma: Refactor rpcrdma_buffer_put() Chuck Lever
@ 2014-06-23 22:40 ` Chuck Lever
[not found] ` <20140623224023.1634.67233.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-23 22:40 ` [PATCH v1 11/13] xprtrdma: Clean up rpcrdma_ep_disconnect() Chuck Lever
` (4 subsequent siblings)
14 siblings, 1 reply; 38+ messages in thread
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
FRMR uses a LOCAL_INV Work Request, which is asynchronous, to
deregister segment buffers. Other registration strategies use
synchronous deregistration mechanisms (like ib_unmap_fmr()).
For a synchronous deregistration mechanism, it makes sense for
xprt_rdma_free() to put segment buffers back into the buffer pool
immediately once rpcrdma_deregister_external() returns.
This is currently also what FRMR is doing. It is releasing segment
buffers just after the LOCAL_INV WR is posted.
But segment buffers need to be put back after the LOCAL_INV WR
_completes_ (or flushes). Otherwise, rpcrdma_buffer_get() can then
assign these segment buffers to another RPC task while they are
still "in use" by the hardware.
The result of re-using an FRMR too quickly is that its rkey
no longer matches the rkey that was registered with the provider.
This results in FAST_REG_MR or LOCAL_INV Work Requests completing
with IB_WC_MW_BIND_ERR, and the FRMR, and thus the transport,
becomes unusable.
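In other words, the fix makes MW release reference-counted: the send
path and the LOCAL_INV completion each hold a reference, and only the
last put returns the MW to the pool. A toy sketch of that lifecycle,
with a plain counter standing in for the kref and a "released" flag
standing in for re-listing the MW on rb_mws:

```c
#include <assert.h>

struct mw {
	int ref;	/* stands in for struct kref mw_ref */
	int released;	/* stands in for list_add_tail() onto rb_mws */
};

static void mw_init(struct mw *m)
{
	m->ref = 1;	/* kref_init(): taken when the MW is handed out */
	m->released = 0;
}

static void mw_get(struct mw *m)
{
	m->ref++;	/* kref_get(): e.g. when a LOCAL_INV WR is posted */
}

static void mw_put(struct mw *m)
{
	if (--m->ref == 0)
		m->released = 1;	/* last put recycles the MW */
}
```

The point of the sketch: dropping the RPC's reference alone is not
enough; the MW stays out of the pool until the completion handler's
put as well, so the hardware is done with it before reuse.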
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 44 +++++++++++++++++++++++++++++++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 2 ++
2 files changed, 42 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index f24f0bf..52f57f7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -62,6 +62,8 @@
#endif
static void rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *);
+static void rpcrdma_get_mw(struct rpcrdma_mw *);
+static void rpcrdma_put_mw(struct rpcrdma_mw *);
/*
* internal functions
@@ -167,6 +169,7 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
if (fastreg)
rpcrdma_decrement_frmr_rkey(mw);
}
+ rpcrdma_put_mw(mw);
}
static int
@@ -1034,7 +1037,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
len += cdata->padding;
switch (ia->ri_memreg_strategy) {
case RPCRDMA_FRMR:
- len += buf->rb_max_requests * RPCRDMA_MAX_SEGS *
+ len += (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS *
sizeof(struct rpcrdma_mw);
break;
case RPCRDMA_MTHCAFMR:
@@ -1076,7 +1079,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
r = (struct rpcrdma_mw *)p;
switch (ia->ri_memreg_strategy) {
case RPCRDMA_FRMR:
- for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
+ for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
ia->ri_max_frmr_depth);
if (IS_ERR(r->r.frmr.fr_mr)) {
@@ -1252,12 +1255,36 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
}
static void
-rpcrdma_put_mw_locked(struct rpcrdma_mw *mw)
+rpcrdma_free_mw(struct kref *kref)
{
+ struct rpcrdma_mw *mw = container_of(kref, struct rpcrdma_mw, mw_ref);
list_add_tail(&mw->mw_list, &mw->mw_pool->rb_mws);
}
static void
+rpcrdma_put_mw_locked(struct rpcrdma_mw *mw)
+{
+ kref_put(&mw->mw_ref, rpcrdma_free_mw);
+}
+
+static void
+rpcrdma_get_mw(struct rpcrdma_mw *mw)
+{
+ kref_get(&mw->mw_ref);
+}
+
+static void
+rpcrdma_put_mw(struct rpcrdma_mw *mw)
+{
+ struct rpcrdma_buffer *buffers = mw->mw_pool;
+ unsigned long flags;
+
+ spin_lock_irqsave(&buffers->rb_lock, flags);
+ rpcrdma_put_mw_locked(mw);
+ spin_unlock_irqrestore(&buffers->rb_lock, flags);
+}
+
+static void
rpcrdma_buffer_put_mw(struct rpcrdma_mw **mw)
{
rpcrdma_put_mw_locked(*mw);
@@ -1304,6 +1331,7 @@ rpcrdma_buffer_get_mws(struct rpcrdma_req *req, struct rpcrdma_buffer *buffers)
r = list_entry(buffers->rb_mws.next,
struct rpcrdma_mw, mw_list);
list_del(&r->mw_list);
+ kref_init(&r->mw_ref);
r->mw_pool = buffers;
req->rl_segments[i].mr_chunk.rl_mw = r;
}
@@ -1583,6 +1611,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
dprintk("RPC: %s: Using frmr %p to map %d segments\n",
__func__, seg1->mr_chunk.rl_mw, i);
+ rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
if (unlikely(seg1->mr_chunk.rl_mw->r.frmr.fr_state == FRMR_IS_VALID)) {
dprintk("RPC: %s: frmr %x left valid, posting invalidate.\n",
__func__,
@@ -1595,6 +1624,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
invalidate_wr.send_flags = IB_SEND_SIGNALED;
invalidate_wr.ex.invalidate_rkey =
seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
+ rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
DECR_CQCOUNT(&r_xprt->rx_ep);
post_wr = &invalidate_wr;
} else
@@ -1638,6 +1668,9 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
*nsegs = i;
return 0;
out_err:
+ rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
+ if (post_wr == &invalidate_wr)
+ rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
while (i--)
rpcrdma_unmap_one(ia, --seg);
return rc;
@@ -1653,6 +1686,7 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
while (seg1->mr_nsegs--)
rpcrdma_unmap_one(ia, seg++);
+ rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
memset(&invalidate_wr, 0, sizeof invalidate_wr);
invalidate_wr.wr_id = (unsigned long)(void *)seg1->mr_chunk.rl_mw;
@@ -1664,9 +1698,11 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
read_lock(&ia->ri_qplock);
rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
read_unlock(&ia->ri_qplock);
- if (rc)
+ if (rc) {
+ rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
dprintk("RPC: %s: failed ib_post_send for invalidate,"
" status %i\n", __func__, rc);
+ }
return rc;
}
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index b81e5b5..7a140fe 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -44,6 +44,7 @@
#include <linux/spinlock.h> /* spinlock_t, etc */
#include <linux/atomic.h> /* atomic_t, etc */
#include <linux/workqueue.h> /* struct work_struct */
+#include <linux/kref.h>
#include <rdma/rdma_cm.h> /* RDMA connection api */
#include <rdma/ib_verbs.h> /* RDMA verbs api */
@@ -176,6 +177,7 @@ struct rpcrdma_mw {
} r;
struct list_head mw_list;
struct rpcrdma_buffer *mw_pool;
+ struct kref mw_ref;
};
#define RPCRDMA_BIT_FASTREG (0)
--
* [PATCH v1 11/13] xprtrdma: Clean up rpcrdma_ep_disconnect()
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (9 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion Chuck Lever
@ 2014-06-23 22:40 ` Chuck Lever
2014-06-23 22:40 ` [PATCH v1 12/13] xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro Chuck Lever
` (3 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
The return code is used only for dprintk's that are already
redundant.
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/transport.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 13 +++----------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
3 files changed, 5 insertions(+), 12 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index f6d280b..2faac49 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -414,7 +414,7 @@ xprt_rdma_close(struct rpc_xprt *xprt)
if (r_xprt->rx_ep.rep_connected > 0)
xprt->reestablish_timeout = 0;
xprt_disconnect_done(xprt);
- (void) rpcrdma_ep_disconnect(&r_xprt->rx_ep, &r_xprt->rx_ia);
+ rpcrdma_ep_disconnect(&r_xprt->rx_ep, &r_xprt->rx_ia);
}
static void
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 52f57f7..b6c52c7 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -838,10 +838,7 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
cancel_delayed_work_sync(&ep->rep_connect_worker);
if (ia->ri_id->qp) {
- rc = rpcrdma_ep_disconnect(ep, ia);
- if (rc)
- dprintk("RPC: %s: rpcrdma_ep_disconnect"
- " returned %i\n", __func__, rc);
+ rpcrdma_ep_disconnect(ep, ia);
rdma_destroy_qp(ia->ri_id);
ia->ri_id->qp = NULL;
}
@@ -879,10 +876,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
struct rpcrdma_xprt *xprt;
retry:
dprintk("RPC: %s: reconnecting...\n", __func__);
- rc = rpcrdma_ep_disconnect(ep, ia);
- if (rc && rc != -ENOTCONN)
- dprintk("RPC: %s: rpcrdma_ep_disconnect"
- " status %i\n", __func__, rc);
+ rpcrdma_ep_disconnect(ep, ia);
xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
id = rpcrdma_create_id(xprt, ia,
@@ -988,7 +982,7 @@ out:
* This call is not reentrant, and must not be made in parallel
* on the same endpoint.
*/
-int
+void
rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
{
int rc;
@@ -1004,7 +998,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
dprintk("RPC: %s: rdma_disconnect %i\n", __func__, rc);
ep->rep_connected = rc;
}
- return rc;
}
/*
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 7a140fe..4f7de2a 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -343,7 +343,7 @@ int rpcrdma_ep_create(struct rpcrdma_ep *, struct rpcrdma_ia *,
struct rpcrdma_create_data_internal *);
void rpcrdma_ep_destroy(struct rpcrdma_ep *, struct rpcrdma_ia *);
int rpcrdma_ep_connect(struct rpcrdma_ep *, struct rpcrdma_ia *);
-int rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
+void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
struct rpcrdma_req *);
--
* [PATCH v1 12/13] xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (10 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 11/13] xprtrdma: Clean up rpcrdma_ep_disconnect() Chuck Lever
@ 2014-06-23 22:40 ` Chuck Lever
2014-06-23 22:40 ` [PATCH v1 13/13] xprtrdma: Handle additional connection events Chuck Lever
` (2 subsequent siblings)
14 siblings, 0 replies; 38+ messages in thread
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Clean up.
RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between
RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode. Since
RPCRDMA_REGISTER has been removed, there's no need for the extra
conditional compilation.
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
include/linux/sunrpc/xprtrdma.h | 2 --
net/sunrpc/xprtrdma/verbs.c | 13 -------------
2 files changed, 15 deletions(-)
diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
index c2f04e1..64a0a0a 100644
--- a/include/linux/sunrpc/xprtrdma.h
+++ b/include/linux/sunrpc/xprtrdma.h
@@ -62,8 +62,6 @@
#define RPCRDMA_INLINE_PAD_THRESH (512)/* payload threshold to pad (bytes) */
/* memory registration strategies */
-#define RPCRDMA_PERSISTENT_REGISTRATION (1)
-
enum rpcrdma_memreg {
RPCRDMA_BOUNCEBUFFERS = 0,
RPCRDMA_REGISTER,
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index b6c52c7..ec98e48 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -569,12 +569,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
if (!ia->ri_id->device->alloc_fmr) {
dprintk("RPC: %s: MTHCAFMR registration "
"not supported by HCA\n", __func__);
-#if RPCRDMA_PERSISTENT_REGISTRATION
memreg = RPCRDMA_ALLPHYSICAL;
-#else
- rc = -ENOMEM;
- goto out2;
-#endif
}
}
@@ -589,20 +584,16 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
switch (memreg) {
case RPCRDMA_FRMR:
break;
-#if RPCRDMA_PERSISTENT_REGISTRATION
case RPCRDMA_ALLPHYSICAL:
mem_priv = IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE |
IB_ACCESS_REMOTE_READ;
goto register_setup;
-#endif
case RPCRDMA_MTHCAFMR:
if (ia->ri_have_dma_lkey)
break;
mem_priv = IB_ACCESS_LOCAL_WRITE;
-#if RPCRDMA_PERSISTENT_REGISTRATION
register_setup:
-#endif
ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
if (IS_ERR(ia->ri_bind_mem)) {
printk(KERN_ALERT "%s: ib_get_dma_mr for "
@@ -1770,7 +1761,6 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
switch (ia->ri_memreg_strategy) {
-#if RPCRDMA_PERSISTENT_REGISTRATION
case RPCRDMA_ALLPHYSICAL:
rpcrdma_map_one(ia, seg, writing);
seg->mr_rkey = ia->ri_bind_mem->rkey;
@@ -1778,7 +1768,6 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
seg->mr_nsegs = 1;
nsegs = 1;
break;
-#endif
/* Registration using frmr registration */
case RPCRDMA_FRMR:
@@ -1808,11 +1797,9 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
switch (ia->ri_memreg_strategy) {
-#if RPCRDMA_PERSISTENT_REGISTRATION
case RPCRDMA_ALLPHYSICAL:
rpcrdma_unmap_one(ia, seg);
break;
-#endif
case RPCRDMA_FRMR:
rc = rpcrdma_deregister_frmr_external(seg, ia, r_xprt);
--
* [PATCH v1 13/13] xprtrdma: Handle additional connection events
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (11 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 12/13] xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro Chuck Lever
@ 2014-06-23 22:40 ` Chuck Lever
[not found] ` <20140623224048.1634.23972.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-24 14:35 ` [PATCH v1 00/13] NFS/RDMA patches for 3.17 Or Gerlitz
2014-06-25 22:47 ` Steve Wise
14 siblings, 1 reply; 38+ messages in thread
From: Chuck Lever @ 2014-06-23 22:40 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Commit 38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status
is relevant only for consumers that re-use their QPs on new
connections. xprtrdma creates a fresh QP on reconnection, so that
event should be explicitly ignored.
Squelch the alarming "unexpected CM event" message.
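The patch's CONNECTION_MSG() macro is just a bounds-checked table
lookup, so event codes beyond the table (added by future kernels) can
never index off the end of the array. A stand-alone sketch; note that
the first eight table entries are reconstructed here from the rdma_cm
event enum and are not visible in the patch's diff context:

```c
#include <assert.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char * const conn[] = {
	"address resolved",	/* entries 0-7 reconstructed from the */
	"address error",	/* rdma_cm_event enum; not shown in the */
	"route resolved",	/* diff context above */
	"route error",
	"connect request",
	"connect response",
	"connect error",
	"unreachable",
	"rejected",
	"established",
	"disconnected",
	"device removal",
	"multicast join",
	"multicast error",
	"address change",
	"timewait exit",
};

/* Mirrors CONNECTION_MSG(): fall back to a fixed string rather than
 * reading past the end of conn[] for unknown event codes. */
static const char *connection_msg(unsigned int status)
{
	return status < ARRAY_SIZE(conn) ? conn[status]
					 : "unrecognized connection error";
}
```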
Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
net/sunrpc/xprtrdma/verbs.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index ec98e48..dbd5f22 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -334,8 +334,16 @@ static const char * const conn[] = {
"rejected",
"established",
"disconnected",
- "device removal"
+ "device removal",
+ "multicast join",
+ "multicast error",
+ "address change",
+ "timewait exit",
};
+
+#define CONNECTION_MSG(status) \
+ ((status) < ARRAY_SIZE(conn) ? \
+ conn[(status)] : "unrecognized connection error")
#endif
static int
@@ -393,13 +401,10 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
case RDMA_CM_EVENT_DEVICE_REMOVAL:
connstate = -ENODEV;
connected:
- dprintk("RPC: %s: %s: %pI4:%u (ep 0x%p event 0x%x)\n",
- __func__,
- (event->event <= 11) ? conn[event->event] :
- "unknown connection error",
- &addr->sin_addr.s_addr,
- ntohs(addr->sin_port),
- ep, event->event);
+ dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
+ __func__, &addr->sin_addr.s_addr,
+ ntohs(addr->sin_port), ep,
+ CONNECTION_MSG(event->event));
atomic_set(&rpcx_to_rdmax(ep->rep_xprt)->rx_buf.rb_credits, 1);
dprintk("RPC: %s: %sconnected\n",
__func__, connstate > 0 ? "" : "dis");
@@ -408,8 +413,10 @@ connected:
wake_up_all(&ep->rep_connect_wait);
break;
default:
- dprintk("RPC: %s: unexpected CM event %d\n",
- __func__, event->event);
+ dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
+ __func__, &addr->sin_addr.s_addr,
+ ntohs(addr->sin_port), ep,
+ CONNECTION_MSG(event->event));
break;
}
--
* Re: [PATCH v1 00/13] NFS/RDMA patches for 3.17
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (12 preceding siblings ...)
2014-06-23 22:40 ` [PATCH v1 13/13] xprtrdma: Handle additional connection events Chuck Lever
@ 2014-06-24 14:35 ` Or Gerlitz
[not found] ` <CAJZOPZ+ix6tPDHXbVrSnVzofHSbzqOoyTBvzkEo-GJpOYOaPFA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-06-25 22:47 ` Steve Wise
14 siblings, 1 reply; 38+ messages in thread
From: Or Gerlitz @ 2014-06-24 14:35 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List
On Tue, Jun 24, 2014 at 1:39 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>
> The main purpose of this series is to address more connection drop
> recovery issues by fixing FRMR re-use to make it less likely the
> client will drop the connection due to a memory operation error.
>
> Some other clean-ups and fixes are present as well.
From a quick look, patches 1, 2, and 5 of the series (and maybe
more) are a very good match for 3.16-rc (they fix kernel crashes,
etc.); I don't think they need to wait for 3.17.
--
* Re: [PATCH v1 01/13] xprtrdma: Fix panic in rpcrdma_register_frmr_external()
[not found] ` <20140623223909.1634.33362.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
@ 2014-06-24 14:37 ` Or Gerlitz
0 siblings, 0 replies; 38+ messages in thread
From: Or Gerlitz @ 2014-06-24 14:37 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List
On Tue, Jun 24, 2014 at 1:39 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> seg1->mr_nsegs is not yet initialized when it is used to unmap
> segments during an error exit. Use the same unmapping logic for
> all error exits.
>
> "if (frmr_wr.wr.fast_reg.length < len) {" used to be a BUG_ON check.
> The broken code should never be executed under normal operation.
>
> Fixes: c977dea22708688eae31774f70126c97aa4dfe83
Here you should put the information provided by git log --oneline e.g
Fixes: c977dea22 ('THE COMMIT TITLE')
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
> net/sunrpc/xprtrdma/verbs.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 13dbd1c..78bd7c6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -1545,9 +1545,8 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
> frmr_wr.wr.fast_reg.length = page_no << PAGE_SHIFT;
> if (frmr_wr.wr.fast_reg.length < len) {
> - while (seg1->mr_nsegs--)
> - rpcrdma_unmap_one(ia, seg++);
> - return -EIO;
> + rc = -EIO;
> + goto out_err;
> }
>
> /* Bump the key */
> @@ -1565,8 +1564,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> if (rc) {
> dprintk("RPC: %s: failed ib_post_send for register,"
> " status %i\n", __func__, rc);
> - while (i--)
> - rpcrdma_unmap_one(ia, --seg);
> + goto out_err;
> } else {
> seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
> seg1->mr_base = seg1->mr_dma + pageoff;
> @@ -1574,6 +1572,10 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> seg1->mr_len = len;
> }
> *nsegs = i;
> + return 0;
> +out_err:
> + while (i--)
> + rpcrdma_unmap_one(ia, --seg);
> return rc;
> }
>
>
* Re: [PATCH v1 08/13] xprtrdma: Back off rkey when FAST_REG_MR fails
[not found] ` <20140623224007.1634.55636.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
@ 2014-06-24 15:47 ` Anna Schumaker
[not found] ` <53A99DA6.90808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Anna Schumaker @ 2014-06-24 15:47 UTC (permalink / raw)
To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
On 06/23/2014 06:40 PM, Chuck Lever wrote:
> If posting a FAST_REG_MR Work Request fails, or the FAST_REG WR
> flushes, revert the rkey update to avoid subsequent
> IB_WC_MW_BIND_ERR completions.
>
> Suggested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
> net/sunrpc/xprtrdma/verbs.c | 39 +++++++++++++++++++++++++++++----------
> 1 file changed, 29 insertions(+), 10 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index cef67fd..3efc007 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -61,6 +61,8 @@
> # define RPCDBG_FACILITY RPCDBG_TRANS
> #endif
>
> +static void rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *);
> +
> /*
> * internal functions
> */
> @@ -154,13 +156,17 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
>
> if (wrid == 0)
> return;
> - if (wc->status != IB_WC_SUCCESS)
> - return;
>
> fastreg = test_and_clear_bit(RPCRDMA_BIT_FASTREG, &wrid);
> mw = (struct rpcrdma_mw *)wrid;
>
> - mw->r.frmr.fr_state = fastreg ? FRMR_IS_VALID : FRMR_IS_INVALID;
> + if (wc->status == IB_WC_SUCCESS) {
> + mw->r.frmr.fr_state = fastreg ?
> + FRMR_IS_VALID : FRMR_IS_INVALID;
> + } else {
> + if (fastreg)
> + rpcrdma_decrement_frmr_rkey(mw);
Isn't this the same as "else if (fastreg)"?
Anna
> + }
> }
>
> static int
> @@ -1480,6 +1486,24 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
> seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
> }
>
> +static void
> +rpcrdma_increment_frmr_rkey(struct rpcrdma_mw *mw)
> +{
> + struct ib_mr *frmr = mw->r.frmr.fr_mr;
> + u8 key = frmr->rkey & 0x000000FF;
> +
> + ib_update_fast_reg_key(frmr, ++key);
> +}
> +
> +static void
> +rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *mw)
> +{
> + struct ib_mr *frmr = mw->r.frmr.fr_mr;
> + u8 key = frmr->rkey & 0x000000FF;
> +
> + ib_update_fast_reg_key(frmr, --key);
> +}
> +
> static int
> rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> int *nsegs, int writing, struct rpcrdma_ia *ia,
> @@ -1487,8 +1511,6 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> {
> struct rpcrdma_mr_seg *seg1 = seg;
> struct ib_send_wr invalidate_wr, frmr_wr, *bad_wr, *post_wr;
> -
> - u8 key;
> int len, pageoff;
> int i, rc;
> int seg_len;
> @@ -1552,14 +1574,10 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> rc = -EIO;
> goto out_err;
> }
> -
> - /* Bump the key */
> - key = (u8)(seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey & 0x000000FF);
> - ib_update_fast_reg_key(seg1->mr_chunk.rl_mw->r.frmr.fr_mr, ++key);
> -
> frmr_wr.wr.fast_reg.access_flags = (writing ?
> IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
> IB_ACCESS_REMOTE_READ);
> + rpcrdma_increment_frmr_rkey(seg1->mr_chunk.rl_mw);
> frmr_wr.wr.fast_reg.rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
> DECR_CQCOUNT(&r_xprt->rx_ep);
>
> @@ -1568,6 +1586,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> if (rc) {
> dprintk("RPC: %s: failed ib_post_send for register,"
> " status %i\n", __func__, rc);
> + rpcrdma_decrement_frmr_rkey(seg1->mr_chunk.rl_mw);
> goto out_err;
> } else {
> seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH v1 13/13] xprtrdma: Handle additional connection events
[not found] ` <20140623224048.1634.23972.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
@ 2014-06-24 15:58 ` Anna Schumaker
0 siblings, 0 replies; 38+ messages in thread
From: Anna Schumaker @ 2014-06-24 15:58 UTC (permalink / raw)
To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
On 06/23/2014 06:40 PM, Chuck Lever wrote:
> Commit 38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status
> is relevant only for consumers that re-use their QPs on new
> connections. xprtrdma creates a fresh QP on reconnection, so that
> event should be explicitly ignored.
>
> Squelch the alarming "unexpected CM event" message.
>
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
> net/sunrpc/xprtrdma/verbs.c | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index ec98e48..dbd5f22 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -334,8 +334,16 @@ static const char * const conn[] = {
> "rejected",
> "established",
> "disconnected",
> - "device removal"
> + "device removal",
> + "multicast join",
> + "multicast error",
> + "address change",
> + "timewait exit",
> };
> +
> +#define CONNECTION_MSG(status) \
> + ((status) < ARRAY_SIZE(conn) ? \
> + conn[(status)] : "unrecognized connection error")
> #endif
>
> static int
> @@ -393,13 +401,10 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
> case RDMA_CM_EVENT_DEVICE_REMOVAL:
> connstate = -ENODEV;
> connected:
> - dprintk("RPC: %s: %s: %pI4:%u (ep 0x%p event 0x%x)\n",
> - __func__,
> - (event->event <= 11) ? conn[event->event] :
> - "unknown connection error",
> - &addr->sin_addr.s_addr,
> - ntohs(addr->sin_port),
> - ep, event->event);
> + dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
> + __func__, &addr->sin_addr.s_addr,
> + ntohs(addr->sin_port), ep,
> + CONNECTION_MSG(event->event));
> atomic_set(&rpcx_to_rdmax(ep->rep_xprt)->rx_buf.rb_credits, 1);
> dprintk("RPC: %s: %sconnected\n",
> __func__, connstate > 0 ? "" : "dis");
> @@ -408,8 +413,10 @@ connected:
> wake_up_all(&ep->rep_connect_wait);
> break;
> default:
> - dprintk("RPC: %s: unexpected CM event %d\n",
> - __func__, event->event);
> + dprintk("RPC: %s: %pI4:%u (ep 0x%p): %s\n",
> + __func__, &addr->sin_addr.s_addr,
> + ntohs(addr->sin_port), ep,
> + CONNECTION_MSG(event->event));
These two dprintk()s are exactly the same, and only a few lines apart. Is there some way to combine them? (I'm just curious, not saying that you need to!)
Anna
> break;
> }
>
>
* Re: [PATCH v1 08/13] xprtrdma: Back off rkey when FAST_REG_MR fails
[not found] ` <53A99DA6.90808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2014-06-24 16:26 ` Chuck Lever
0 siblings, 0 replies; 38+ messages in thread
From: Chuck Lever @ 2014-06-24 16:26 UTC (permalink / raw)
To: Anna Schumaker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux NFS Mailing List
On Jun 24, 2014, at 11:47 AM, Anna Schumaker <schumaker.anna-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 06/23/2014 06:40 PM, Chuck Lever wrote:
>> If posting a FAST_REG_MR Work Request fails, or the FAST_REG WR
>> flushes, revert the rkey update to avoid subsequent
>> IB_WC_MW_BIND_ERR completions.
>>
>> Suggested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> ---
>> net/sunrpc/xprtrdma/verbs.c | 39 +++++++++++++++++++++++++++++----------
>> 1 file changed, 29 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index cef67fd..3efc007 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -61,6 +61,8 @@
>> # define RPCDBG_FACILITY RPCDBG_TRANS
>> #endif
>>
>> +static void rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *);
>> +
>> /*
>> * internal functions
>> */
>> @@ -154,13 +156,17 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
>>
>> if (wrid == 0)
>> return;
>> - if (wc->status != IB_WC_SUCCESS)
>> - return;
>>
>> fastreg = test_and_clear_bit(RPCRDMA_BIT_FASTREG, &wrid);
>> mw = (struct rpcrdma_mw *)wrid;
>>
>> - mw->r.frmr.fr_state = fastreg ? FRMR_IS_VALID : FRMR_IS_INVALID;
>> + if (wc->status == IB_WC_SUCCESS) {
>> + mw->r.frmr.fr_state = fastreg ?
>> + FRMR_IS_VALID : FRMR_IS_INVALID;
>> + } else {
>> + if (fastreg)
>> + rpcrdma_decrement_frmr_rkey(mw);
>
> Isn't this the same as "else if (fastreg)”?
Yep, those are logically equivalent. I left them separate, and
left the extra braces, so it would be cleaner to add more logic in
both arms of “if (wc->status == IB_WC_SUCCESS)” in subsequent
patches.
Using “switch (wc->status)” might be more future-proof.
> Anna
>
>> + }
>> }
>>
>> static int
>> @@ -1480,6 +1486,24 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
>> seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
>> }
>>
>> +static void
>> +rpcrdma_increment_frmr_rkey(struct rpcrdma_mw *mw)
>> +{
>> + struct ib_mr *frmr = mw->r.frmr.fr_mr;
>> + u8 key = frmr->rkey & 0x000000FF;
>> +
>> + ib_update_fast_reg_key(frmr, ++key);
>> +}
>> +
>> +static void
>> +rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *mw)
>> +{
>> + struct ib_mr *frmr = mw->r.frmr.fr_mr;
>> + u8 key = frmr->rkey & 0x000000FF;
>> +
>> + ib_update_fast_reg_key(frmr, --key);
>> +}
>> +
>> static int
>> rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>> int *nsegs, int writing, struct rpcrdma_ia *ia,
>> @@ -1487,8 +1511,6 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>> {
>> struct rpcrdma_mr_seg *seg1 = seg;
>> struct ib_send_wr invalidate_wr, frmr_wr, *bad_wr, *post_wr;
>> -
>> - u8 key;
>> int len, pageoff;
>> int i, rc;
>> int seg_len;
>> @@ -1552,14 +1574,10 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>> rc = -EIO;
>> goto out_err;
>> }
>> -
>> - /* Bump the key */
>> - key = (u8)(seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey & 0x000000FF);
>> - ib_update_fast_reg_key(seg1->mr_chunk.rl_mw->r.frmr.fr_mr, ++key);
>> -
>> frmr_wr.wr.fast_reg.access_flags = (writing ?
>> IB_ACCESS_REMOTE_WRITE | IB_ACCESS_LOCAL_WRITE :
>> IB_ACCESS_REMOTE_READ);
>> + rpcrdma_increment_frmr_rkey(seg1->mr_chunk.rl_mw);
>> frmr_wr.wr.fast_reg.rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
>> DECR_CQCOUNT(&r_xprt->rx_ep);
>>
>> @@ -1568,6 +1586,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>> if (rc) {
>> dprintk("RPC: %s: failed ib_post_send for register,"
>> " status %i\n", __func__, rc);
>> + rpcrdma_decrement_frmr_rkey(seg1->mr_chunk.rl_mw);
>> goto out_err;
>> } else {
>> seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
>>
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [PATCH v1 00/13] NFS/RDMA patches for 3.17
[not found] ` <CAJZOPZ+ix6tPDHXbVrSnVzofHSbzqOoyTBvzkEo-GJpOYOaPFA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2014-06-24 17:07 ` Chuck Lever
0 siblings, 0 replies; 38+ messages in thread
From: Chuck Lever @ 2014-06-24 17:07 UTC (permalink / raw)
To: Or Gerlitz; +Cc: linux-rdma, Linux NFS Mailing List
On Jun 24, 2014, at 10:35 AM, Or Gerlitz <or.gerlitz-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Jun 24, 2014 at 1:39 AM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>>
>> The main purpose of this series is to address more connection drop
>> recovery issues by fixing FRMR re-use to make it less likely the
>> client will drop the connection due to a memory operation error.
>>
>> Some other clean-ups and fixes are present as well.
>
>
> From a quick look, patches 1, 2, and 5 of the series (and maybe more)
> are very good candidates for 3.16-rc (they fix kernel crashes, etc.);
> I don't think they need to wait for 3.17.
My take:
1/13 fixes a panic that should never be hit (that used to be a BUG,
which never fired in operation).
2/13 might be a candidate for 3.16-rc. It’s up to Trond.
5/13 doesn’t fix a regression, it always worked that way.
I’ve updated the “Fixes:” tag in these, will appear in v2 of the series.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
[not found] ` <20140623224023.1634.67233.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
@ 2014-06-25 5:17 ` Shirley Ma
[not found] ` <53AA5B72.3010200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Shirley Ma @ 2014-06-25 5:17 UTC (permalink / raw)
To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Would it be possible to delay rpcrdma_buffer_put() until the LOCAL_INV request's send completion? That is, remove rpcrdma_buffer_put() from xprt_rdma_free() and add a callback after LOCAL_INV completes?
Shirley
On 06/23/2014 03:40 PM, Chuck Lever wrote:
> FRMR uses a LOCAL_INV Work Request, which is asynchronous, to
> deregister segment buffers. Other registration strategies use
> synchronous deregistration mechanisms (like ib_unmap_fmr()).
>
> For a synchronous deregistration mechanism, it makes sense for
> xprt_rdma_free() to put segment buffers back into the buffer pool
> immediately once rpcrdma_deregister_external() returns.
>
> This is currently also what FRMR is doing. It is releasing segment
> buffers just after the LOCAL_INV WR is posted.
>
> But segment buffers need to be put back after the LOCAL_INV WR
> _completes_ (or flushes). Otherwise, rpcrdma_buffer_get() can then
> assign these segment buffers to another RPC task while they are
> still "in use" by the hardware.
>
> The result of re-using an FRMR too quickly is that its rkey
> no longer matches the rkey that was registered with the provider.
> This results in FAST_REG_MR or LOCAL_INV Work Requests completing
> with IB_WC_MW_BIND_ERR, and the FRMR, and thus the transport,
> becomes unusable.
>
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> ---
> net/sunrpc/xprtrdma/verbs.c | 44 +++++++++++++++++++++++++++++++++++----
> net/sunrpc/xprtrdma/xprt_rdma.h | 2 ++
> 2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index f24f0bf..52f57f7 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -62,6 +62,8 @@
> #endif
>
> static void rpcrdma_decrement_frmr_rkey(struct rpcrdma_mw *);
> +static void rpcrdma_get_mw(struct rpcrdma_mw *);
> +static void rpcrdma_put_mw(struct rpcrdma_mw *);
>
> /*
> * internal functions
> @@ -167,6 +169,7 @@ rpcrdma_sendcq_process_wc(struct ib_wc *wc)
> if (fastreg)
> rpcrdma_decrement_frmr_rkey(mw);
> }
> + rpcrdma_put_mw(mw);
> }
>
> static int
> @@ -1034,7 +1037,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
> len += cdata->padding;
> switch (ia->ri_memreg_strategy) {
> case RPCRDMA_FRMR:
> - len += buf->rb_max_requests * RPCRDMA_MAX_SEGS *
> + len += (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS *
> sizeof(struct rpcrdma_mw);
> break;
> case RPCRDMA_MTHCAFMR:
> @@ -1076,7 +1079,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
> r = (struct rpcrdma_mw *)p;
> switch (ia->ri_memreg_strategy) {
> case RPCRDMA_FRMR:
> - for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
> + for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
> r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
> ia->ri_max_frmr_depth);
> if (IS_ERR(r->r.frmr.fr_mr)) {
> @@ -1252,12 +1255,36 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
> }
>
> static void
> -rpcrdma_put_mw_locked(struct rpcrdma_mw *mw)
> +rpcrdma_free_mw(struct kref *kref)
> {
> + struct rpcrdma_mw *mw = container_of(kref, struct rpcrdma_mw, mw_ref);
> list_add_tail(&mw->mw_list, &mw->mw_pool->rb_mws);
> }
>
> static void
> +rpcrdma_put_mw_locked(struct rpcrdma_mw *mw)
> +{
> + kref_put(&mw->mw_ref, rpcrdma_free_mw);
> +}
> +
> +static void
> +rpcrdma_get_mw(struct rpcrdma_mw *mw)
> +{
> + kref_get(&mw->mw_ref);
> +}
> +
> +static void
> +rpcrdma_put_mw(struct rpcrdma_mw *mw)
> +{
> + struct rpcrdma_buffer *buffers = mw->mw_pool;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&buffers->rb_lock, flags);
> + rpcrdma_put_mw_locked(mw);
> + spin_unlock_irqrestore(&buffers->rb_lock, flags);
> +}
> +
> +static void
> rpcrdma_buffer_put_mw(struct rpcrdma_mw **mw)
> {
> rpcrdma_put_mw_locked(*mw);
> @@ -1304,6 +1331,7 @@ rpcrdma_buffer_get_mws(struct rpcrdma_req *req, struct rpcrdma_buffer *buffers)
> r = list_entry(buffers->rb_mws.next,
> struct rpcrdma_mw, mw_list);
> list_del(&r->mw_list);
> + kref_init(&r->mw_ref);
> r->mw_pool = buffers;
> req->rl_segments[i].mr_chunk.rl_mw = r;
> }
> @@ -1583,6 +1611,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> dprintk("RPC: %s: Using frmr %p to map %d segments\n",
> __func__, seg1->mr_chunk.rl_mw, i);
>
> + rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
> if (unlikely(seg1->mr_chunk.rl_mw->r.frmr.fr_state == FRMR_IS_VALID)) {
> dprintk("RPC: %s: frmr %x left valid, posting invalidate.\n",
> __func__,
> @@ -1595,6 +1624,7 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> invalidate_wr.send_flags = IB_SEND_SIGNALED;
> invalidate_wr.ex.invalidate_rkey =
> seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
> + rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
> DECR_CQCOUNT(&r_xprt->rx_ep);
> post_wr = &invalidate_wr;
> } else
> @@ -1638,6 +1668,9 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
> *nsegs = i;
> return 0;
> out_err:
> + rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
> + if (post_wr == &invalidate_wr)
> + rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
> while (i--)
> rpcrdma_unmap_one(ia, --seg);
> return rc;
> @@ -1653,6 +1686,7 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
>
> while (seg1->mr_nsegs--)
> rpcrdma_unmap_one(ia, seg++);
> + rpcrdma_get_mw(seg1->mr_chunk.rl_mw);
>
> memset(&invalidate_wr, 0, sizeof invalidate_wr);
> invalidate_wr.wr_id = (unsigned long)(void *)seg1->mr_chunk.rl_mw;
> @@ -1664,9 +1698,11 @@ rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
> read_lock(&ia->ri_qplock);
> rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
> read_unlock(&ia->ri_qplock);
> - if (rc)
> + if (rc) {
> + rpcrdma_put_mw(seg1->mr_chunk.rl_mw);
> dprintk("RPC: %s: failed ib_post_send for invalidate,"
> " status %i\n", __func__, rc);
> + }
> return rc;
> }
>
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index b81e5b5..7a140fe 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -44,6 +44,7 @@
> #include <linux/spinlock.h> /* spinlock_t, etc */
> #include <linux/atomic.h> /* atomic_t, etc */
> #include <linux/workqueue.h> /* struct work_struct */
> +#include <linux/kref.h>
>
> #include <rdma/rdma_cm.h> /* RDMA connection api */
> #include <rdma/ib_verbs.h> /* RDMA verbs api */
> @@ -176,6 +177,7 @@ struct rpcrdma_mw {
> } r;
> struct list_head mw_list;
> struct rpcrdma_buffer *mw_pool;
> + struct kref mw_ref;
> };
>
> #define RPCRDMA_BIT_FASTREG (0)
>
* Re: [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
[not found] ` <53AA5B72.3010200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-25 14:32 ` Chuck Lever
[not found] ` <89930B1D-AE3B-48AD-922C-6FCA754D2B01-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Chuck Lever @ 2014-06-25 14:32 UTC (permalink / raw)
To: Shirley Ma; +Cc: linux-rdma, Linux NFS Mailing List
Hi Shirley-
On Jun 25, 2014, at 1:17 AM, Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> Would it be possible to delay rpcrdma_buffer_put() until the LOCAL_INV request's send completion? That is, remove rpcrdma_buffer_put() from xprt_rdma_free() and add a callback after LOCAL_INV completes?
That’s exactly what this patch does. The relevant part of
rpcrdma_buffer_put() is:
list_add(&mw->mw_list, &buf->rb_mws);
This is now wrapped with a reference count so that
rpcrdma_buffer_put() and the LOCAL_INV completion can run in any
order. The FRMR is added back to the list only after both of those
two have finished.
Nothing in xprt_rdma_free() is allowed to sleep, so we can’t wait for
LOCAL_INV completion in there.
The only alternative I can think of is having rpcrdma_buffer_get() check
fr_state as it removes FRMRs from the rb_mws list. Only if the FRMR is
marked FRMR_IS_INVALID, rpcrdma_buffer_get() will add it to the
rpcrdma_req.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
[not found] ` <89930B1D-AE3B-48AD-922C-6FCA754D2B01-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-06-25 16:14 ` Shirley Ma
0 siblings, 0 replies; 38+ messages in thread
From: Shirley Ma @ 2014-06-25 16:14 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List
On 06/25/2014 07:32 AM, Chuck Lever wrote:
> Hi Shirley-
>
> On Jun 25, 2014, at 1:17 AM, Shirley Ma <shirley.ma-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
>
>> Would it be possible to delay rpcrdma_buffer_put() until the LOCAL_INV request's send completion? That is, remove rpcrdma_buffer_put() from xprt_rdma_free() and add a callback after LOCAL_INV completes?
>
> That’s exactly what this patch does. The relevant part of
> rpcrdma_buffer_put() is:
>
> list_add(&mw->mw_list, &buf->rb_mws);
>
> This is now wrapped with a reference count so that
> rpcrdma_buffer_put() and the LOCAL_INV completion can run in any
> order. The FRMR is added back to the list only after both of those
> two have finished.
What I was thinking is to run rpcrdma_buffer_put() after LOCAL_INV completion, without a reference count.
> Nothing in xprt_rdma_free() is allowed to sleep, so we can’t wait for
> LOCAL_INV completion in there.
>
> The only alternative I can think of is having rpcrdma_buffer_get() check
> fr_state as it removes FRMRs from the rb_mws list. Only if the FRMR is
> marked FRMR_IS_INVALID, rpcrdma_buffer_get() will add it to the
> rpcrdma_req.
I thought about that too; an atomic operation would be better than a lock.
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
* RE: [PATCH v1 00/13] NFS/RDMA patches for 3.17
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
` (13 preceding siblings ...)
2014-06-24 14:35 ` [PATCH v1 00/13] NFS/RDMA patches for 3.17 Or Gerlitz
@ 2014-06-25 22:47 ` Steve Wise
2014-06-27 16:17 ` Shirley Ma
14 siblings, 1 reply; 38+ messages in thread
From: Steve Wise @ 2014-06-25 22:47 UTC (permalink / raw)
To: 'Chuck Lever', linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Hey Chuck,
I did some testing on this series. Just 2 nodes, both nfs3 and nfs4 over cxgb4 and mlx4:
cthon b/g/s 10 iterations
iozone -a with direct IO and data validation
fio write and rand-rw testing of large IO/files and 8 threads.
xfs test suite.
No regressions seen.
Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> -----Original Message-----
> From: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf
> Of Chuck Lever
> Sent: Monday, June 23, 2014 5:39 PM
> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Subject: [PATCH v1 00/13] NFS/RDMA patches for 3.17
>
> The main purpose of this series is to address more connection drop
> recovery issues by fixing FRMR re-use to make it less likely the
> client will drop the connection due to a memory operation error.
>
> Some other clean-ups and fixes are present as well.
>
> See topic branch nfs-rdma-for-3.17 in
>
> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>
> I tested with NFSv3 and NFSv4 on all three supported memory
> registration modes. Used cthon04 and iozone with both Solaris
> and Linux NFS/RDMA servers. Used xfstests with Linux.
>
> ---
>
> Chuck Lever (13):
> xprtrdma: Fix panic in rpcrdma_register_frmr_external()
> xprtrdma: Protect ->qp during FRMR deregistration
> xprtrdma: Limit data payload size for ALLPHYSICAL
> xprtrdma: Update rkeys after transport reconnect
> xprtrdma: Don't drain CQs on transport disconnect
> xprtrdma: Unclutter struct rpcrdma_mr_seg
> xprtrdma: Encode Work Request opcode in wc->wr_id
> xprtrdma: Back off rkey when FAST_REG_MR fails
> xprtrdma: Refactor rpcrdma_buffer_put()
> xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
> xprtrdma: Clean up rpcrdma_ep_disconnect()
> xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
> xprtrdma: Handle additional connection events
>
>
> include/linux/sunrpc/xprtrdma.h | 2
> net/sunrpc/xprtrdma/rpc_rdma.c | 77 +++++----
> net/sunrpc/xprtrdma/transport.c | 17 +-
> net/sunrpc/xprtrdma/verbs.c | 330 +++++++++++++++++++++++++++------------
> net/sunrpc/xprtrdma/xprt_rdma.h | 63 ++++++-
> 5 files changed, 332 insertions(+), 157 deletions(-)
>
> --
> Chuck Lever
* Re: [PATCH v1 00/13] NFS/RDMA patches for 3.17
2014-06-25 22:47 ` Steve Wise
@ 2014-06-27 16:17 ` Shirley Ma
0 siblings, 0 replies; 38+ messages in thread
From: Shirley Ma @ 2014-06-27 16:17 UTC (permalink / raw)
To: Steve Wise, 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
Passed the cthon04 and iozone interoperability tests between a Linux client and a Solaris server; no regressions.
Shirley
On 06/25/2014 03:47 PM, Steve Wise wrote:
> Hey Chuck,
>
> I did some testing on this series. Just 2 nodes, both nfs3 and nfs4 over cxgb4 and mlx4:
>
> cthon b/g/s 10 iterations
> iozone -a with direct IO and data validation
> fio write and rand-rw testing of large IO/files and 8 threads.
> xfs test suite.
>
> No regressions seen.
>
> Tested-by: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>
>
>> -----Original Message-----
>> From: linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-nfs-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf
>> Of Chuck Lever
>> Sent: Monday, June 23, 2014 5:39 PM
>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: [PATCH v1 00/13] NFS/RDMA patches for 3.17
>>
>> The main purpose of this series is to address more connection drop
>> recovery issues by fixing FRMR re-use to make it less likely the
>> client will drop the connection due to a memory operation error.
>>
>> Some other clean-ups and fixes are present as well.
>>
>> See topic branch nfs-rdma-for-3.17 in
>>
>> git://git.linux-nfs.org/projects/cel/cel-2.6.git
>>
>> I tested with NFSv3 and NFSv4 on all three supported memory
>> registration modes. Used cthon04 and iozone with both Solaris
>> and Linux NFS/RDMA servers. Used xfstests with Linux.
>>
>> ---
>>
>> Chuck Lever (13):
>> xprtrdma: Fix panic in rpcrdma_register_frmr_external()
>> xprtrdma: Protect ->qp during FRMR deregistration
>> xprtrdma: Limit data payload size for ALLPHYSICAL
>> xprtrdma: Update rkeys after transport reconnect
>> xprtrdma: Don't drain CQs on transport disconnect
>> xprtrdma: Unclutter struct rpcrdma_mr_seg
>> xprtrdma: Encode Work Request opcode in wc->wr_id
>> xprtrdma: Back off rkey when FAST_REG_MR fails
>> xprtrdma: Refactor rpcrdma_buffer_put()
>> xprtrdma: Release FRMR segment buffers during LOCAL_INV completion
>> xprtrdma: Clean up rpcrdma_ep_disconnect()
>> xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro
>> xprtrdma: Handle additional connection events
>>
>>
>> include/linux/sunrpc/xprtrdma.h | 2
>> net/sunrpc/xprtrdma/rpc_rdma.c | 77 +++++----
>> net/sunrpc/xprtrdma/transport.c | 17 +-
>> net/sunrpc/xprtrdma/verbs.c | 330 +++++++++++++++++++++++++++------------
>> net/sunrpc/xprtrdma/xprt_rdma.h | 63 ++++++-
>> 5 files changed, 332 insertions(+), 157 deletions(-)
>>
>> --
>> Chuck Lever
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <20140623223942.1634.89063.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
@ 2014-07-02 19:06 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C540-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:06 UTC (permalink / raw)
To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
This change is prone to generating poll_cq errors because of un-reaped completions that still
point to now-nonexistent QPs. When these completions are polled on the new connection, poll_cq
will fail because the old QP pointer is already NULL.
Did anyone hit this situation during testing?
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> owner@vger.kernel.org] On Behalf Of Chuck Lever
> Sent: Tuesday, June 24, 2014 4:10 AM
> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>
> CQs are not destroyed until unmount. By draining CQs on transport
> disconnect, successful completions that can change the r.frmr.state field can
> be missed.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> net/sunrpc/xprtrdma/verbs.c | 5 -----
> 1 file changed, 5 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 3c7f904..451e100 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -873,9 +873,6 @@ retry:
> dprintk("RPC: %s: rpcrdma_ep_disconnect"
> " status %i\n", __func__, rc);
>
> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> -
> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> id = rpcrdma_create_id(xprt, ia,
> (struct sockaddr *)&xprt->rx_data.addr);
> @@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep,
> struct rpcrdma_ia *ia) {
> int rc;
>
> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> rc = rdma_disconnect(ia->ri_id);
> if (!rc) {
> /* returns without wait if not connected */
>
* Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C540-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-07-02 19:28 ` Steve Wise
[not found] ` <53B45D7B.4020705-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Steve Wise @ 2014-07-02 19:28 UTC (permalink / raw)
To: Devesh Sharma, Chuck Lever,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> This change is prone to generating poll_cq errors because of un-reaped completions that still
> point to now-nonexistent QPs. When these completions are polled on the new connection, poll_cq
> will fail because the old QP pointer is already NULL.
> Did anyone hit this situation during testing?
Hey Devesh,
iw_cxgb4 will silently toss CQEs if the QP is not active.
>> -----Original Message-----
>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>> Sent: Tuesday, June 24, 2014 4:10 AM
>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>>
>> CQs are not destroyed until unmount. By draining CQs on transport
>> disconnect, successful completions that can change the r.frmr.state field can
>> be missed.
>>
>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>> ---
>> net/sunrpc/xprtrdma/verbs.c | 5 -----
>> 1 file changed, 5 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 3c7f904..451e100 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -873,9 +873,6 @@ retry:
>> dprintk("RPC: %s: rpcrdma_ep_disconnect"
>> " status %i\n", __func__, rc);
>>
>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>> -
>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
>> id = rpcrdma_create_id(xprt, ia,
>> (struct sockaddr *)&xprt->rx_data.addr);
>> @@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep,
>> struct rpcrdma_ia *ia) {
>> int rc;
>>
>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>> rc = rdma_disconnect(ia->ri_id);
>> if (!rc) {
>> /* returns without wait if not connected */
>>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <53B45D7B.4020705-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2014-07-02 19:40 ` Chuck Lever
[not found] ` <C9B761DF-7960-4346-949E-17A9BDD357DB-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-07-02 19:42 ` Devesh Sharma
1 sibling, 1 reply; 38+ messages in thread
From: Chuck Lever @ 2014-07-02 19:40 UTC (permalink / raw)
To: Steve Wise, Devesh Sharma
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux NFS Mailing List
On Jul 2, 2014, at 3:28 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
> On 7/2/2014 2:06 PM, Devesh Sharma wrote:
>> This change is very much prone to generate poll_cq errors because of un-cleaned completions which still
>> point to the non-existent QPs. On the new connection when these completions are polled, the poll_cq will
>> fail because old QP pointer is already NULL.
>> Did anyone hit this situation during their testing?
I tested this aggressively with a fault injector that triggers regular connection
disruption.
> Hey Devesh,
>
> iw_cxgb4 will silently toss CQEs if the QP is not active.
xprtrdma relies on getting a completion (either successful or in error) for every
WR it has posted. The goal of this patch is to avoid throwing away queued
completions after a transport disconnect so we don't lose track of FRMR rkey
updates (FAST_REG_MR and LOCAL_INV completions) and we can capture all RPC
replies posted before the connection was lost.
Sounds like we also need to keep the QP around, even in error state, until all
known WRs on that QP have completed?
>
>
>>> -----Original Message-----
>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>>> Sent: Tuesday, June 24, 2014 4:10 AM
>>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>>>
>>> CQs are not destroyed until unmount. By draining CQs on transport
>>> disconnect, successful completions that can change the r.frmr.state field can
>>> be missed.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> ---
>>> net/sunrpc/xprtrdma/verbs.c | 5 -----
>>> 1 file changed, 5 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 3c7f904..451e100 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -873,9 +873,6 @@ retry:
>>> dprintk("RPC: %s: rpcrdma_ep_disconnect"
>>> " status %i\n", __func__, rc);
>>>
>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>>> -
>>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
>>> id = rpcrdma_create_id(xprt, ia,
>>> (struct sockaddr *)&xprt->rx_data.addr);
>>> @@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep,
>>> struct rpcrdma_ia *ia) {
>>> int rc;
>>>
>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>>> rc = rdma_disconnect(ia->ri_id);
>>> if (!rc) {
>>> /* returns without wait if not connected */
>>>
>
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <53B45D7B.4020705-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-07-02 19:40 ` Chuck Lever
@ 2014-07-02 19:42 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C57A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
1 sibling, 1 reply; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:42 UTC (permalink / raw)
To: Steve Wise, Chuck Lever,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 03, 2014 12:59 AM
> To: Devesh Sharma; Chuck Lever; linux-rdma@vger.kernel.org; linux-
> nfs@vger.kernel.org
> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
> On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > This change is very much prone to generate poll_cq errors because of
> > un-cleaned completions which still point to the non-existent QPs. On
> > the new connection when these completions are polled, the poll_cq will fail
> because old QP pointer is already NULL.
> > Did anyone hit this situation during their testing?
>
> Hey Devesh,
>
> iw_cxgb4 will silently toss CQEs if the QP is not active.
Yes, I just checked the mlx and cxgb4 driver code. On the other hand, ocrdma asserts a BUG_ON for such CQEs, causing a system panic.
Out of curiosity, how is this change useful here? Is it reducing the reconnection time? Anyhow, rpcrdma_clean_cq was discarding the completions (both flushed and successful).
>
>
> >> -----Original Message-----
> >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >> owner@vger.kernel.org] On Behalf Of Chuck Lever
> >> Sent: Tuesday, June 24, 2014 4:10 AM
> >> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> >> disconnect
> >>
> >> CQs are not destroyed until unmount. By draining CQs on transport
> >> disconnect, successful completions that can change the r.frmr.state
> >> field can be missed.
Still, aren't those missed? Those successful completions will still be dropped after reconnection. Am I missing something in understanding the motivation?
> >>
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> >> 1 file changed, 5 deletions(-)
> >>
> >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> >> --- a/net/sunrpc/xprtrdma/verbs.c
> >> +++ b/net/sunrpc/xprtrdma/verbs.c
> >> @@ -873,9 +873,6 @@ retry:
> >> dprintk("RPC: %s: rpcrdma_ep_disconnect"
> >> " status %i\n", __func__, rc);
> >>
> >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >> -
> >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> >> id = rpcrdma_create_id(xprt, ia,
> >> (struct sockaddr *)&xprt->rx_data.addr);
> @@ -985,8 +982,6 @@
> >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
> >> {
> >> int rc;
> >>
> >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >> rc = rdma_disconnect(ia->ri_id);
> >> if (!rc) {
> >> /* returns without wait if not connected */
> >>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <C9B761DF-7960-4346-949E-17A9BDD357DB-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-07-02 19:46 ` Steve Wise
2014-07-02 19:48 ` Devesh Sharma
0 siblings, 1 reply; 38+ messages in thread
From: Steve Wise @ 2014-07-02 19:46 UTC (permalink / raw)
To: 'Chuck Lever', 'Devesh Sharma'
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
'Linux NFS Mailing List'
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
> Sent: Wednesday, July 02, 2014 2:40 PM
> To: Steve Wise; Devesh Sharma
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>
>
> On Jul 2, 2014, at 3:28 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org> wrote:
>
> > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> >> This change is very much prone to generate poll_cq errors because of un-cleaned
> completions which still
> >> point to the non-existent QPs. On the new connection when these completions are polled,
> the poll_cq will
> >> fail because old QP pointer is already NULL.
> >> Did anyone hit this situation during their testing?
>
> I tested this aggressively with a fault injector that triggers regular connection
> disruption.
>
> > Hey Devesh,
> >
> > iw_cxgb4 will silently toss CQEs if the QP is not active.
>
> xprtrdma relies on getting a completion (either successful or in error) for every
> WR it has posted. The goal of this patch is to avoid throwing away queued
> completions after a transport disconnect so we don't lose track of FRMR rkey
> updates (FAST_REG_MR and LOCAL_INV completions) and we can capture all RPC
> replies posted before the connection was lost.
>
> Sounds like we also need to keep the QP around, even in error state, until all
> known WRs on that QP have completed?
>
Perhaps.
>
> >
> >
> >>> -----Original Message-----
> >>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> >>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> >>> Sent: Tuesday, June 24, 2014 4:10 AM
> >>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
> >>>
> >>> CQs are not destroyed until unmount. By draining CQs on transport
> >>> disconnect, successful completions that can change the r.frmr.state field can
> >>> be missed.
> >>>
> >>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> >>> ---
> >>> net/sunrpc/xprtrdma/verbs.c | 5 -----
> >>> 1 file changed, 5 deletions(-)
> >>>
> >>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> >>> index 3c7f904..451e100 100644
> >>> --- a/net/sunrpc/xprtrdma/verbs.c
> >>> +++ b/net/sunrpc/xprtrdma/verbs.c
> >>> @@ -873,9 +873,6 @@ retry:
> >>> dprintk("RPC: %s: rpcrdma_ep_disconnect"
> >>> " status %i\n", __func__, rc);
> >>>
> >>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>> -
> >>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> >>> id = rpcrdma_create_id(xprt, ia,
> >>> (struct sockaddr *)&xprt->rx_data.addr);
> >>> @@ -985,8 +982,6 @@ rpcrdma_ep_disconnect(struct rpcrdma_ep *ep,
> >>> struct rpcrdma_ia *ia) {
> >>> int rc;
> >>>
> >>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>> rc = rdma_disconnect(ia->ri_id);
> >>> if (!rc) {
> >>> /* returns without wait if not connected */
> >>>
> >
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
2014-07-02 19:46 ` Steve Wise
@ 2014-07-02 19:48 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C594-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:48 UTC (permalink / raw)
To: Steve Wise, 'Chuck Lever'
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
'Linux NFS Mailing List'
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 03, 2014 1:16 AM
> To: 'Chuck Lever'; Devesh Sharma
> Cc: linux-rdma@vger.kernel.org; 'Linux NFS Mailing List'
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
>
>
> > -----Original Message-----
> > From: Chuck Lever [mailto:chuck.lever@oracle.com]
> > Sent: Wednesday, July 02, 2014 2:40 PM
> > To: Steve Wise; Devesh Sharma
> > Cc: linux-rdma@vger.kernel.org; Linux NFS Mailing List
> > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> >
> > On Jul 2, 2014, at 3:28 PM, Steve Wise <swise@opengridcomputing.com>
> wrote:
> >
> > > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > >> This change is very much prone to generate poll_cq errors because
> > >> of un-cleaned
> > completions which still
> > >> point to the non-existent QPs. On the new connection when these
> > >> completions are polled,
> > the poll_cq will
> > >> fail because old QP pointer is already NULL.
> > >> Did anyone hit this situation during their testing?
> >
> > I tested this aggressively with a fault injector that triggers regular
> > connection disruption.
> >
> > > Hey Devesh,
> > >
> > > iw_cxgb4 will silently toss CQEs if the QP is not active.
> >
> > xprtrdma relies on getting a completion (either successful or in
> > error) for every WR it has posted. The goal of this patch is to avoid
> > throwing away queued completions after a transport disconnect so we
> > don't lose track of FRMR rkey updates (FAST_REG_MR and LOCAL_INV
> > completions) and we can capture all RPC replies posted before the
> connection was lost.
> >
> > Sounds like we also need to keep the QP around, even in error state,
> > until all known WRs on that QP have completed?
> >
Why not poll and process every completion during rpcrdma_clean_cq()?
>
> Perhaps.
>
> >
> > >
> > >
> > >>> -----Original Message-----
> > >>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > >>> owner@vger.kernel.org] On Behalf Of Chuck Lever
> > >>> Sent: Tuesday, June 24, 2014 4:10 AM
> > >>> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> > >>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > >>> disconnect
> > >>>
> > >>> CQs are not destroyed until unmount. By draining CQs on transport
> > >>> disconnect, successful completions that can change the
> > >>> r.frmr.state field can be missed.
> > >>>
> > >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > >>> ---
> > >>> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > >>> 1 file changed, 5 deletions(-)
> > >>>
> > >>> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > >>> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > >>> --- a/net/sunrpc/xprtrdma/verbs.c
> > >>> +++ b/net/sunrpc/xprtrdma/verbs.c
> > >>> @@ -873,9 +873,6 @@ retry:
> > >>> dprintk("RPC: %s: rpcrdma_ep_disconnect"
> > >>> " status %i\n", __func__, rc);
> > >>>
> > >>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > >>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > >>> -
> > >>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > >>> id = rpcrdma_create_id(xprt, ia,
> > >>> (struct sockaddr *)&xprt->rx_data.addr);
> @@ -985,8 +982,6 @@
> > >>> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> > >>> *ia) {
> > >>> int rc;
> > >>>
> > >>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > >>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > >>> rc = rdma_disconnect(ia->ri_id);
> > >>> if (!rc) {
> > >>> /* returns without wait if not connected */
> > >>>
> > >
> >
> > --
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> >
> >
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C57A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-07-02 19:50 ` Steve Wise
2014-07-02 19:53 ` Devesh Sharma
2014-07-02 19:56 ` Devesh Sharma
0 siblings, 2 replies; 38+ messages in thread
From: Steve Wise @ 2014-07-02 19:50 UTC (permalink / raw)
To: 'Devesh Sharma', 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: Devesh Sharma [mailto:Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org]
> Sent: Wednesday, July 02, 2014 2:43 PM
> To: Steve Wise; Chuck Lever; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs@vger.kernel.org
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>
> > -----Original Message-----
> > From: Steve Wise [mailto:swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> > Sent: Thursday, July 03, 2014 12:59 AM
> > To: Devesh Sharma; Chuck Lever; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-
> > nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > > This change is very much prone to generate poll_cq errors because of
> > > un-cleaned completions which still point to the non-existent QPs. On
> > > the new connection when these completions are polled, the poll_cq will fail
> > because old QP pointer is already NULL.
> > > Did anyone hit this situation during their testing?
> >
> > Hey Devesh,
> >
> > iw_cxgb4 will silently toss CQEs if the QP is not active.
>
> Ya, just now checked that in mlx and cxgb4 driver code. On the other hand ocrdma is asserting
> a BUG-ON for such CQEs causing system panic.
> Out of curiosity I am asking, how this change is useful here, is it reducing the re-connection
> time...Anyhow rpcrdma_clean_cq was discarding the completions (flush/successful both)
>
Well, I don't think there is anything restricting an application from destroying the QP with pending CQEs on its CQs, so it definitely shouldn't cause a BUG_ON(), I think. I'll have to read up in the Verbs specs on whether destroying a QP kills all of its pending CQEs...
> >
> >
> > >> -----Original Message-----
> > >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> > >> Sent: Tuesday, June 24, 2014 4:10 AM
> > >> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > >> disconnect
> > >>
> > >> CQs are not destroyed until unmount. By draining CQs on transport
> > >> disconnect, successful completions that can change the r.frmr.state
> > >> field can be missed.
>
> Still those are missed isn’t it....Since those successful completions will still be dropped after re-
> connection. Am I missing something to
> understanding the motivation...
>
> > >>
> > >> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > >> ---
> > >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > >> 1 file changed, 5 deletions(-)
> > >>
> > >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > >> --- a/net/sunrpc/xprtrdma/verbs.c
> > >> +++ b/net/sunrpc/xprtrdma/verbs.c
> > >> @@ -873,9 +873,6 @@ retry:
> > >> dprintk("RPC: %s: rpcrdma_ep_disconnect"
> > >> " status %i\n", __func__, rc);
> > >>
> > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > >> -
> > >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > >> id = rpcrdma_create_id(xprt, ia,
> > >> (struct sockaddr *)&xprt->rx_data.addr);
> > @@ -985,8 +982,6 @@
> > >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
> > >> {
> > >> int rc;
> > >>
> > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > >> rc = rdma_disconnect(ia->ri_id);
> > >> if (!rc) {
> > >> /* returns without wait if not connected */
> > >>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
2014-07-02 19:50 ` Steve Wise
@ 2014-07-02 19:53 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C5AE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-07-02 19:56 ` Devesh Sharma
1 sibling, 1 reply; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:53 UTC (permalink / raw)
To: Steve Wise, 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 03, 2014 1:21 AM
> To: Devesh Sharma; 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-
> nfs@vger.kernel.org
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
>
>
> > -----Original Message-----
> > From: Devesh Sharma [mailto:Devesh.Sharma@Emulex.Com]
> > Sent: Wednesday, July 02, 2014 2:43 PM
> > To: Steve Wise; Chuck Lever; linux-rdma@vger.kernel.org;
> > linux-nfs@vger.kernel.org
> > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> > > -----Original Message-----
> > > From: Steve Wise [mailto:swise@opengridcomputing.com]
> > > Sent: Thursday, July 03, 2014 12:59 AM
> > > To: Devesh Sharma; Chuck Lever; linux-rdma@vger.kernel.org; linux-
> > > nfs@vger.kernel.org
> > > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > disconnect
> > >
> > > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > > > This change is very much prone to generate poll_cq errors because
> > > > of un-cleaned completions which still point to the non-existent
> > > > QPs. On the new connection when these completions are polled, the
> > > > poll_cq will fail
> > > because old QP pointer is already NULL.
> > > > Did anyone hit this situation during their testing?
> > >
> > > Hey Devesh,
> > >
> > > iw_cxgb4 will silently toss CQEs if the QP is not active.
> >
> > Ya, just now checked that in mlx and cxgb4 driver code. On the other
> > hand ocrdma is asserting a BUG-ON for such CQEs causing system panic.
> > Out of curiosity I am asking, how this change is useful here, is it
> > reducing the re-connection time...Anyhow rpcrdma_clean_cq was
> > discarding the completions (flush/successful both)
> >
>
> Well, I don't think there is anything restricting an application from destroying
> the QP with pending CQEs on its CQs. So it definitely shouldn't cause a
> BUG_ON() I think. I'll have to read up in the Verbs specs if destroying a QP
> kills all the pending CQEs...
Sorry for the confusion; let me clarify: in ocrdma, the BUG_ON is hit in poll_cq() after the reconnection happens and the CQ is polled again.
The first completion in the CQ still points to the old QP ID, for which ocrdma no longer has a valid QP pointer.
>
>
> > >
> > >
> > > >> -----Original Message-----
> > > >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > >> owner@vger.kernel.org] On Behalf Of Chuck Lever
> > > >> Sent: Tuesday, June 24, 2014 4:10 AM
> > > >> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> > > >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > >> disconnect
> > > >>
> > > >> CQs are not destroyed until unmount. By draining CQs on transport
> > > >> disconnect, successful completions that can change the
> > > >> r.frmr.state field can be missed.
> >
> > Still those are missed isn’t it....Since those successful completions
> > will still be dropped after re- connection. Am I missing something to
> > understanding the motivation...
> >
> > > >>
> > > >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > >> ---
> > > >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > > >> 1 file changed, 5 deletions(-)
> > > >>
> > > >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > > >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > > >> --- a/net/sunrpc/xprtrdma/verbs.c
> > > >> +++ b/net/sunrpc/xprtrdma/verbs.c
> > > >> @@ -873,9 +873,6 @@ retry:
> > > >> dprintk("RPC: %s:
> rpcrdma_ep_disconnect"
> > > >> " status %i\n", __func__, rc);
> > > >>
> > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > >> -
> > > >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > > >> id = rpcrdma_create_id(xprt, ia,
> > > >> (struct sockaddr *)&xprt-
> >rx_data.addr);
> > > @@ -985,8 +982,6 @@
> > > >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> > > >> *ia) {
> > > >> int rc;
> > > >>
> > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > >> rc = rdma_disconnect(ia->ri_id);
> > > >> if (!rc) {
> > > >> /* returns without wait if not connected */
> > > >>
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
2014-07-02 19:50 ` Steve Wise
2014-07-02 19:53 ` Devesh Sharma
@ 2014-07-02 19:56 ` Devesh Sharma
1 sibling, 0 replies; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:56 UTC (permalink / raw)
To: Steve Wise, 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 03, 2014 1:21 AM
> To: Devesh Sharma; 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-
> nfs@vger.kernel.org
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
>
>
> > -----Original Message-----
> > From: Devesh Sharma [mailto:Devesh.Sharma@Emulex.Com]
> > Sent: Wednesday, July 02, 2014 2:43 PM
> > To: Steve Wise; Chuck Lever; linux-rdma@vger.kernel.org;
> > linux-nfs@vger.kernel.org
> > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> > > -----Original Message-----
> > > From: Steve Wise [mailto:swise@opengridcomputing.com]
> > > Sent: Thursday, July 03, 2014 12:59 AM
> > > To: Devesh Sharma; Chuck Lever; linux-rdma@vger.kernel.org; linux-
> > > nfs@vger.kernel.org
> > > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > disconnect
> > >
> > > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > > > This change is very much prone to generate poll_cq errors because
> > > > of un-cleaned completions which still point to the non-existent
> > > > QPs. On the new connection when these completions are polled, the
> > > > poll_cq will fail
> > > because old QP pointer is already NULL.
> > > > Did anyone hit this situation during their testing?
> > >
> > > Hey Devesh,
> > >
> > > iw_cxgb4 will silently toss CQEs if the QP is not active.
> >
> > Ya, just now checked that in mlx and cxgb4 driver code. On the other
> > hand ocrdma is asserting a BUG-ON for such CQEs causing system panic.
> > Out of curiosity I am asking, how this change is useful here, is it
> > reducing the re-connection time...Anyhow rpcrdma_clean_cq was
> > discarding the completions (flush/successful both)
> >
>
> Well, I don't think there is anything restricting an application from destroying
> the QP with pending CQEs on its CQs. So it definitely shouldn't cause a
> BUG_ON() I think. I'll have to read up in the Verbs specs if destroying a QP
> kills all the pending CQEs...
I will double-check the ocrdma_destroy_qp code.
>
>
> > >
> > >
> > > >> -----Original Message-----
> > > >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > >> owner@vger.kernel.org] On Behalf Of Chuck Lever
> > > >> Sent: Tuesday, June 24, 2014 4:10 AM
> > > >> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> > > >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > >> disconnect
> > > >>
> > > >> CQs are not destroyed until unmount. By draining CQs on transport
> > > >> disconnect, successful completions that can change the
> > > >> r.frmr.state field can be missed.
> >
> > Still, those are missed, aren't they? Those successful completions
> > will still be dropped after re-connection. Am I missing something in
> > understanding the motivation?
> >
> > > >>
> > > >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > >> ---
> > > >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > > >> 1 file changed, 5 deletions(-)
> > > >>
> > > >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > > >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > > >> --- a/net/sunrpc/xprtrdma/verbs.c
> > > >> +++ b/net/sunrpc/xprtrdma/verbs.c
> > > >> @@ -873,9 +873,6 @@ retry:
> > > >> dprintk("RPC: %s:
> rpcrdma_ep_disconnect"
> > > >> " status %i\n", __func__, rc);
> > > >>
> > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > >> -
> > > >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > > >> id = rpcrdma_create_id(xprt, ia,
> > > >> (struct sockaddr *)&xprt-
> >rx_data.addr);
> > > @@ -985,8 +982,6 @@
> > > >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> > > >> *ia) {
> > > >> int rc;
> > > >>
> > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > >> rc = rdma_disconnect(ia->ri_id);
> > > >> if (!rc) {
> > > >> /* returns without wait if not connected */
> > > >>
> > > >> --
> > > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > >> in the body of a message to majordomo@vger.kernel.org More
> > > majordomo
> > > >> info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C5AE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-07-02 19:56 ` Steve Wise
2014-07-02 19:57 ` Devesh Sharma
0 siblings, 1 reply; 38+ messages in thread
From: Steve Wise @ 2014-07-02 19:56 UTC (permalink / raw)
To: 'Devesh Sharma', 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
linux-nfs-u79uwXL29TY76Z2rM5mHXA
> -----Original Message-----
> From: Devesh Sharma [mailto:Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org]
> Sent: Wednesday, July 02, 2014 2:54 PM
> To: Steve Wise; 'Chuck Lever'; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs@vger.kernel.org
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
>
>
>
> > -----Original Message-----
> > From: Steve Wise [mailto:swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> > Sent: Thursday, July 03, 2014 1:21 AM
> > To: Devesh Sharma; 'Chuck Lever'; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-
> > nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> >
> >
> > > -----Original Message-----
> > > From: Devesh Sharma [mailto:Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org]
> > > Sent: Wednesday, July 02, 2014 2:43 PM
> > > To: Steve Wise; Chuck Lever; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org;
> > > linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > disconnect
> > >
> > > > -----Original Message-----
> > > > From: Steve Wise [mailto:swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
> > > > Sent: Thursday, July 03, 2014 12:59 AM
> > > > To: Devesh Sharma; Chuck Lever; linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-
> > > > nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > > disconnect
> > > >
> > > > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > > > > This change is very much prone to generate poll_cq errors because
> > > > > of un-cleaned completions which still point to the non-existent
> > > > > QPs. On the new connection when these completions are polled, the
> > > > > poll_cq will fail
> > > > because old QP pointer is already NULL.
> > > > > Did anyone hit this situation during their testing?
> > > >
> > > > Hey Devesh,
> > > >
> > > > iw_cxgb4 will silently toss CQEs if the QP is not active.
> > >
> > > Ya, I just now checked that in the mlx and cxgb4 driver code. On the
> > > other hand, ocrdma asserts a BUG_ON for such CQEs, causing a system
> > > panic. Out of curiosity: how is this change useful here? Is it
> > > reducing the re-connection time? Anyhow, rpcrdma_clean_cq was
> > > discarding the completions (both flushed and successful).
> > >
> >
> > Well, I don't think there is anything restricting an application from destroying
> > the QP with pending CQEs on its CQs. So it definitely shouldn't cause a
> > BUG_ON() I think. I'll have to read up in the Verbs specs if destroying a QP
> > kills all the pending CQEs...
>
> Oh, some confusion; let me clarify: in ocrdma, the BUG_ON is hit in
> poll_cq() after re-connection happens and the CQ is polled again.
> The first completion in the CQ still points to the old QP ID, for
> which ocrdma no longer has a valid QP pointer.
>
Right. Which means it’s a stale CQE. I don't think that should cause a BUG_ON.
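[Editor's note: the behavior under discussion can be sketched in plain C. This is a hypothetical, self-contained model, not ocrdma or iw_cxgb4 source; all names here (`fake_cqe`, `qp_table`, `poll_cq`) are invented for illustration. The point is that a poll routine can look up the QP for each CQE and silently discard stale entries, rather than asserting.]

```c
#include <assert.h>
#include <stddef.h>

#define MAX_QPS 4

struct fake_cqe { int qp_id; };		/* completion left behind on the CQ */
struct fake_qp  { int id; };

/* qp_table[i] == NULL models a QP destroyed before its completions
 * were reaped. */
static struct fake_qp *qp_table[MAX_QPS];

/* Deliver only CQEs whose QP still exists; skip stale ones instead of
 * hitting a BUG_ON.  Returns the number of CQEs delivered. */
static int poll_cq(const struct fake_cqe *cq, int ncqe)
{
	int delivered = 0;

	for (int i = 0; i < ncqe; i++) {
		if (qp_table[cq[i].qp_id] == NULL)
			continue;	/* stale CQE: QP already gone */
		delivered++;
	}
	return delivered;
}
```

In this model, a CQE referencing a destroyed QP is simply skipped, which matches the "silently toss" behavior described for iw_cxgb4 above.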
> >
> >
> > > >
> > > >
> > > > >> -----Original Message-----
> > > > >> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
> > > > >> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
> > > > >> Sent: Tuesday, June 24, 2014 4:10 AM
> > > > >> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > > > >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > > >> disconnect
> > > > >>
> > > > >> CQs are not destroyed until unmount. By draining CQs on transport
> > > > >> disconnect, successful completions that can change the
> > > > >> r.frmr.state field can be missed.
> > >
> > > Still, those are missed, aren't they? Those successful completions
> > > will still be dropped after re-connection. Am I missing something
> > > in understanding the motivation?
> > >
> > > > >>
> > > > >> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > > > >> ---
> > > > >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > > > >> 1 file changed, 5 deletions(-)
> > > > >>
> > > > >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > > > >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > > > >> --- a/net/sunrpc/xprtrdma/verbs.c
> > > > >> +++ b/net/sunrpc/xprtrdma/verbs.c
> > > > >> @@ -873,9 +873,6 @@ retry:
> > > > >> dprintk("RPC: %s:
> > rpcrdma_ep_disconnect"
> > > > >> " status %i\n", __func__, rc);
> > > > >>
> > > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > > >> -
> > > > >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > > > >> id = rpcrdma_create_id(xprt, ia,
> > > > >> (struct sockaddr *)&xprt-
> > >rx_data.addr);
> > > > @@ -985,8 +982,6 @@
> > > > >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> > > > >> *ia) {
> > > > >> int rc;
> > > > >>
> > > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > > >> rc = rdma_disconnect(ia->ri_id);
> > > > >> if (!rc) {
> > > > >> /* returns without wait if not connected */
> > > > >>
> > > > >> --
> > > > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > > > >> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More
> > > > majordomo
> > > > >> info at http://vger.kernel.org/majordomo-info.html
> >
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
2014-07-02 19:56 ` Steve Wise
@ 2014-07-02 19:57 ` Devesh Sharma
0 siblings, 0 replies; 38+ messages in thread
From: Devesh Sharma @ 2014-07-02 19:57 UTC (permalink / raw)
To: Steve Wise, 'Chuck Lever',
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> -----Original Message-----
> From: Steve Wise [mailto:swise@opengridcomputing.com]
> Sent: Thursday, July 03, 2014 1:27 AM
> To: Devesh Sharma; 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-
> nfs@vger.kernel.org
> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
>
>
> > -----Original Message-----
> > From: Devesh Sharma [mailto:Devesh.Sharma@Emulex.Com]
> > Sent: Wednesday, July 02, 2014 2:54 PM
> > To: Steve Wise; 'Chuck Lever'; linux-rdma@vger.kernel.org;
> > linux-nfs@vger.kernel.org
> > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > disconnect
> >
> >
> >
> > > -----Original Message-----
> > > From: Steve Wise [mailto:swise@opengridcomputing.com]
> > > Sent: Thursday, July 03, 2014 1:21 AM
> > > To: Devesh Sharma; 'Chuck Lever'; linux-rdma@vger.kernel.org; linux-
> > > nfs@vger.kernel.org
> > > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> > > disconnect
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Devesh Sharma [mailto:Devesh.Sharma@Emulex.Com]
> > > > Sent: Wednesday, July 02, 2014 2:43 PM
> > > > To: Steve Wise; Chuck Lever; linux-rdma@vger.kernel.org;
> > > > linux-nfs@vger.kernel.org
> > > > Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on
> > > > transport disconnect
> > > >
> > > > > -----Original Message-----
> > > > > From: Steve Wise [mailto:swise@opengridcomputing.com]
> > > > > Sent: Thursday, July 03, 2014 12:59 AM
> > > > > To: Devesh Sharma; Chuck Lever; linux-rdma@vger.kernel.org;
> > > > > linux- nfs@vger.kernel.org
> > > > > Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on
> > > > > transport disconnect
> > > > >
> > > > > On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> > > > > > This change is very much prone to generate poll_cq errors
> > > > > > because of un-cleaned completions which still point to the
> > > > > > non-existent QPs. On the new connection when these completions
> > > > > > are polled, the poll_cq will fail
> > > > > because old QP pointer is already NULL.
> > > > > > Did anyone hit this situation during their testing?
> > > > >
> > > > > Hey Devesh,
> > > > >
> > > > > iw_cxgb4 will silently toss CQEs if the QP is not active.
> > > >
> > > > Ya, I just now checked that in the mlx and cxgb4 driver code. On
> > > > the other hand, ocrdma asserts a BUG_ON for such CQEs, causing a
> > > > system panic. Out of curiosity: how is this change useful here?
> > > > Is it reducing the re-connection time? Anyhow, rpcrdma_clean_cq
> > > > was discarding the completions (both flushed and successful).
> > > >
> > >
> > > Well, I don't think there is anything restricting an application from
> destroying
> > > the QP with pending CQEs on its CQs. So it definitely shouldn't cause a
> > > BUG_ON() I think. I'll have to read up in the Verbs specs if destroying a
> QP
> > > kills all the pending CQEs...
> >
> > Oh, some confusion; let me clarify: in ocrdma, the BUG_ON is hit in
> > poll_cq() after re-connection happens and the CQ is polled again.
> > The first completion in the CQ still points to the old QP ID, for
> > which ocrdma no longer has a valid QP pointer.
> >
>
> Right. Which means it’s a stale CQE. I don't think that should cause a
> BUG_ON.
Yes this surely needs a fix in ocrdma.
>
>
> > >
> > >
> > > > >
> > > > >
> > > > > >> -----Original Message-----
> > > > > >> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> > > > > >> owner@vger.kernel.org] On Behalf Of Chuck Lever
> > > > > >> Sent: Tuesday, June 24, 2014 4:10 AM
> > > > > >> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> > > > > >> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on
> > > > > >> transport disconnect
> > > > > >>
> > > > > >> CQs are not destroyed until unmount. By draining CQs on
> > > > > >> transport disconnect, successful completions that can change
> > > > > >> the r.frmr.state field can be missed.
> > > >
> > > > Still, those are missed, aren't they? Those successful
> > > > completions will still be dropped after re-connection. Am I
> > > > missing something in understanding the motivation?
> > > >
> > > > > >>
> > > > > >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > > >> ---
> > > > > >> net/sunrpc/xprtrdma/verbs.c | 5 -----
> > > > > >> 1 file changed, 5 deletions(-)
> > > > > >>
> > > > > >> diff --git a/net/sunrpc/xprtrdma/verbs.c
> > > > > >> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> > > > > >> --- a/net/sunrpc/xprtrdma/verbs.c
> > > > > >> +++ b/net/sunrpc/xprtrdma/verbs.c
> > > > > >> @@ -873,9 +873,6 @@ retry:
> > > > > >> dprintk("RPC: %s:
> > > rpcrdma_ep_disconnect"
> > > > > >> " status %i\n", __func__, rc);
> > > > > >>
> > > > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > > > >> -
> > > > > >> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> > > > > >> id = rpcrdma_create_id(xprt, ia,
> > > > > >> (struct sockaddr *)&xprt-
> > > >rx_data.addr);
> > > > > @@ -985,8 +982,6 @@
> > > > > >> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct
> > > > > >> rpcrdma_ia
> > > > > >> *ia) {
> > > > > >> int rc;
> > > > > >>
> > > > > >> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> > > > > >> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> > > > > >> rc = rdma_disconnect(ia->ri_id);
> > > > > >> if (!rc) {
> > > > > >> /* returns without wait if not connected */
> > > > > >>
> > > > > >> --
> > > > > >> To unsubscribe from this list: send the line "unsubscribe linux-
> rdma"
> > > > > >> in the body of a message to majordomo@vger.kernel.org More
> > > > > majordomo
> > > > > >> info at http://vger.kernel.org/majordomo-info.html
> > >
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C594-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
@ 2014-07-02 19:59 ` Chuck Lever
[not found] ` <8D65ABF9-DC2B-4906-BBDE-60F03FCEF990-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
0 siblings, 1 reply; 38+ messages in thread
From: Chuck Lever @ 2014-07-02 19:59 UTC (permalink / raw)
To: Devesh Sharma
Cc: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux NFS Mailing List
On Jul 2, 2014, at 3:48 PM, Devesh Sharma <Devesh.Sharma-iH1Dq9VlAzfQT0dZR+AlfA@public.gmane.org> wrote:
>
>
>> -----Original Message-----
>> From: Steve Wise [mailto:swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org]
>> Sent: Thursday, July 03, 2014 1:16 AM
>> To: 'Chuck Lever'; Devesh Sharma
>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; 'Linux NFS Mailing List'
>> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
>> disconnect
>>
>>
>>
>>> -----Original Message-----
>>> From: Chuck Lever [mailto:chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org]
>>> Sent: Wednesday, July 02, 2014 2:40 PM
>>> To: Steve Wise; Devesh Sharma
>>> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; Linux NFS Mailing List
>>> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
>>> disconnect
>>>
>>>
>>> On Jul 2, 2014, at 3:28 PM, Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
>> wrote:
>>>
>>>> On 7/2/2014 2:06 PM, Devesh Sharma wrote:
>>>>> This change is very much prone to generate poll_cq errors because
>>>>> of un-cleaned
>>> completions which still
>>>>> point to the non-existent QPs. On the new connection when these
>>>>> completions are polled,
>>> the poll_cq will
>>>>> fail because old QP pointer is already NULL.
>>>>> Did anyone hit this situation during their testing?
>>>
>>> I tested this aggressively with a fault injector that triggers regular
>>> connection disruption.
>>>
>>>> Hey Devesh,
>>>>
>>>> iw_cxgb4 will silently toss CQEs if the QP is not active.
>>>
>>> xprtrdma relies on getting a completion (either successful or in
>>> error) for every WR it has posted. The goal of this patch is to avoid
>>> throwing away queued completions after a transport disconnect so we
>>> don't lose track of FRMR rkey updates (FAST_REG_MR and LOCAL_INV
>>> completions) and we can capture all RPC replies posted before the
>> connection was lost.
>>>
>>> Sounds like we also need to keep the QP around, even in error state,
>>> until all known WRs on that QP have completed?
>>>
>
> Why not poll and process every completion during rpcrdma_cq_cleanup()….
Yes, I have a patch in the next version of this series that does that.
It just calls rpcrdma_sendcq_upcall() from the connect worker. I will
squash that change into this patch.
Maybe it needs to invoke rpcrdma_recvcq_upcall() there as well.
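[Editor's note: a rough illustration of that plan, with hypothetical names rather than actual xprtrdma code. Instead of discarding queued completions at disconnect, the connect worker runs the normal completion handler over them, so that state updates carried by successful completions (e.g. an FRMR becoming valid) are not lost.]

```c
#include <assert.h>

enum frmr_state { FRMR_IS_INVALID, FRMR_IS_VALID };

struct completion_event {
	int success;
	enum frmr_state *state;	/* state this completion would update */
};

/* The normal completion handler: applies each completion's effect. */
static void process_completion(struct completion_event *c)
{
	if (c->success)
		*c->state = FRMR_IS_VALID;
}

/* Disconnect path: drain the queue *through* the handler rather than
 * discarding it, so rkey/FRMR state transitions are observed.
 * Returns the number of completions processed. */
static int drain_on_disconnect(struct completion_event *q, int n)
{
	for (int i = 0; i < n; i++)
		process_completion(&q[i]);
	return n;
}
```

This is the difference from the removed rpcrdma_clean_cq() calls: the completions are consumed and acted upon, not merely polled away.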
>
>>
>> Perhaps.
>>
>>>
>>>>
>>>>
>>>>>> -----Original Message-----
>>>>>> From: linux-rdma-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org [mailto:linux-rdma-
>>>>>> owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org] On Behalf Of Chuck Lever
>>>>>> Sent: Tuesday, June 24, 2014 4:10 AM
>>>>>> To: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>>>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
>>>>>> disconnect
>>>>>>
>>>>>> CQs are not destroyed until unmount. By draining CQs on transport
>>>>>> disconnect, successful completions that can change the
>>>>>> r.frmr.state field can be missed.
>>>>>>
>>>>>> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>>>>> ---
>>>>>> net/sunrpc/xprtrdma/verbs.c | 5 -----
>>>>>> 1 file changed, 5 deletions(-)
>>>>>>
>>>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c
>>>>>> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
>>>>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>>>>> @@ -873,9 +873,6 @@ retry:
>>>>>> dprintk("RPC: %s: rpcrdma_ep_disconnect"
>>>>>> " status %i\n", __func__, rc);
>>>>>>
>>>>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>>>>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>>>>>> -
>>>>>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
>>>>>> id = rpcrdma_create_id(xprt, ia,
>>>>>> (struct sockaddr *)&xprt->rx_data.addr);
>> @@ -985,8 +982,6 @@
>>>>>> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
>>>>>> *ia) {
>>>>>> int rc;
>>>>>>
>>>>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
>>>>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
>>>>>> rc = rdma_disconnect(ia->ri_id);
>>>>>> if (!rc) {
>>>>>> /* returns without wait if not connected */
>>>>>>
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe
>>>>>> linux-rdma" in the body of a message to majordomo-u79uwXL29TaqPxH82wqD4g@public.gmane.orgg
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
>>>> in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More
>> majordomo
>>>> info at http://vger.kernel.org/majordomo-info.html
>>>
>>> --
>>> Chuck Lever
>>> chuck[dot]lever[at]oracle[dot]com
>>>
>>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect
[not found] ` <8D65ABF9-DC2B-4906-BBDE-60F03FCEF990-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
@ 2014-07-03 5:33 ` Devesh Sharma
0 siblings, 0 replies; 38+ messages in thread
From: Devesh Sharma @ 2014-07-03 5:33 UTC (permalink / raw)
To: Chuck Lever
Cc: Steve Wise, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Linux NFS Mailing List
> -----Original Message-----
> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> Sent: Thursday, July 03, 2014 1:30 AM
> To: Devesh Sharma
> Cc: Steve Wise; linux-rdma@vger.kernel.org; Linux NFS Mailing List
> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> disconnect
>
>
> On Jul 2, 2014, at 3:48 PM, Devesh Sharma <Devesh.Sharma@Emulex.Com>
> wrote:
>
> >
> >
> >> -----Original Message-----
> >> From: Steve Wise [mailto:swise@opengridcomputing.com]
> >> Sent: Thursday, July 03, 2014 1:16 AM
> >> To: 'Chuck Lever'; Devesh Sharma
> >> Cc: linux-rdma@vger.kernel.org; 'Linux NFS Mailing List'
> >> Subject: RE: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> >> disconnect
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Chuck Lever [mailto:chuck.lever@oracle.com]
> >>> Sent: Wednesday, July 02, 2014 2:40 PM
> >>> To: Steve Wise; Devesh Sharma
> >>> Cc: linux-rdma@vger.kernel.org; Linux NFS Mailing List
> >>> Subject: Re: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> >>> disconnect
> >>>
> >>>
> >>> On Jul 2, 2014, at 3:28 PM, Steve Wise
> <swise@opengridcomputing.com>
> >> wrote:
> >>>
> >>>> On 7/2/2014 2:06 PM, Devesh Sharma wrote:
> >>>>> This change is very much prone to generate poll_cq errors because
> >>>>> of un-cleaned
> >>> completions which still
> >>>>> point to the non-existent QPs. On the new connection when these
> >>>>> completions are polled,
> >>> the poll_cq will
> >>>>> fail because old QP pointer is already NULL.
> >>>>> Did anyone hit this situation during their testing?
> >>>
> >>> I tested this aggressively with a fault injector that triggers
> >>> regular connection disruption.
> >>>
> >>>> Hey Devesh,
> >>>>
> >>>> iw_cxgb4 will silently toss CQEs if the QP is not active.
> >>>
> >>> xprtrdma relies on getting a completion (either successful or in
> >>> error) for every WR it has posted. The goal of this patch is to
> >>> avoid throwing away queued completions after a transport disconnect
> >>> so we don't lose track of FRMR rkey updates (FAST_REG_MR and
> >>> LOCAL_INV
> >>> completions) and we can capture all RPC replies posted before the
> >> connection was lost.
> >>>
> >>> Sounds like we also need to keep the QP around, even in error state,
> >>> until all known WRs on that QP have completed?
> >>>
> >
> > Why not poll and process every completion during rpcrdma_cq_cleanup()….
>
> Yes, I have a patch in the next version of this series that does that.
> It just calls rpcrdma_sendcq_upcall() from the connect worker. I will squash
> that change into this patch.
Cool.
>
> Maybe it needs to invoke rpcrdma_recvcq_upcall() there as well.
Yes, that would be needed.
>
>
> >
> >>
> >> Perhaps.
> >>
> >>>
> >>>>
> >>>>
> >>>>>> -----Original Message-----
> >>>>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
> >>>>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
> >>>>>> Sent: Tuesday, June 24, 2014 4:10 AM
> >>>>>> To: linux-rdma@vger.kernel.org; linux-nfs@vger.kernel.org
> >>>>>> Subject: [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport
> >>>>>> disconnect
> >>>>>>
> >>>>>> CQs are not destroyed until unmount. By draining CQs on transport
> >>>>>> disconnect, successful completions that can change the
> >>>>>> r.frmr.state field can be missed.
> >>>>>>
> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>>>> ---
> >>>>>> net/sunrpc/xprtrdma/verbs.c | 5 -----
> >>>>>> 1 file changed, 5 deletions(-)
> >>>>>>
> >>>>>> diff --git a/net/sunrpc/xprtrdma/verbs.c
> >>>>>> b/net/sunrpc/xprtrdma/verbs.c index 3c7f904..451e100 100644
> >>>>>> --- a/net/sunrpc/xprtrdma/verbs.c
> >>>>>> +++ b/net/sunrpc/xprtrdma/verbs.c
> >>>>>> @@ -873,9 +873,6 @@ retry:
> >>>>>> dprintk("RPC: %s:
> rpcrdma_ep_disconnect"
> >>>>>> " status %i\n", __func__, rc);
> >>>>>>
> >>>>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>>>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>>>>> -
> >>>>>> xprt = container_of(ia, struct rpcrdma_xprt, rx_ia);
> >>>>>> id = rpcrdma_create_id(xprt, ia,
> >>>>>> (struct sockaddr *)&xprt-
> >rx_data.addr);
> >> @@ -985,8 +982,6 @@
> >>>>>> rpcrdma_ep_disconnect(struct rpcrdma_ep *ep, struct rpcrdma_ia
> >>>>>> *ia) {
> >>>>>> int rc;
> >>>>>>
> >>>>>> - rpcrdma_clean_cq(ep->rep_attr.recv_cq);
> >>>>>> - rpcrdma_clean_cq(ep->rep_attr.send_cq);
> >>>>>> rc = rdma_disconnect(ia->ri_id);
> >>>>>> if (!rc) {
> >>>>>> /* returns without wait if not connected */
> >>>>>>
> >>>>>> --
> >>>>>> To unsubscribe from this list: send the line "unsubscribe
> >>>>>> linux-rdma" in the body of a message to
> majordomo@vger.kernel.org
> >>>>>> More majordomo info at http://vger.kernel.org/majordomo-
> info.html
> >>>>
> >>>> --
> >>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> >>>> in the body of a message to majordomo@vger.kernel.org More
> >> majordomo
> >>>> info at http://vger.kernel.org/majordomo-info.html
> >>>
> >>> --
> >>> Chuck Lever
> >>> chuck[dot]lever[at]oracle[dot]com
> >>>
> >>>
> >>
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-rdma"
> > in the body of a message to majordomo@vger.kernel.org More
> majordomo
> > info at http://vger.kernel.org/majordomo-info.html
> >
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2014-07-03 5:33 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-23 22:39 [PATCH v1 00/13] NFS/RDMA patches for 3.17 Chuck Lever
[not found] ` <20140623223201.1634.83888.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-23 22:39 ` [PATCH v1 01/13] xprtrdma: Fix panic in rpcrdma_register_frmr_external() Chuck Lever
[not found] ` <20140623223909.1634.33362.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-24 14:37 ` Or Gerlitz
2014-06-23 22:39 ` [PATCH v1 02/13] xprtrdma: Protect ->qp during FRMR deregistration Chuck Lever
2014-06-23 22:39 ` [PATCH v1 03/13] xprtrdma: Limit data payload size for ALLPHYSICAL Chuck Lever
2014-06-23 22:39 ` [PATCH v1 04/13] xprtrdma: Update rkeys after transport reconnect Chuck Lever
2014-06-23 22:39 ` [PATCH v1 05/13] xprtrdma: Don't drain CQs on transport disconnect Chuck Lever
[not found] ` <20140623223942.1634.89063.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-07-02 19:06 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C540-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-07-02 19:28 ` Steve Wise
[not found] ` <53B45D7B.4020705-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2014-07-02 19:40 ` Chuck Lever
[not found] ` <C9B761DF-7960-4346-949E-17A9BDD357DB-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-07-02 19:46 ` Steve Wise
2014-07-02 19:48 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C594-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-07-02 19:59 ` Chuck Lever
[not found] ` <8D65ABF9-DC2B-4906-BBDE-60F03FCEF990-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-07-03 5:33 ` Devesh Sharma
2014-07-02 19:42 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C57A-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-07-02 19:50 ` Steve Wise
2014-07-02 19:53 ` Devesh Sharma
[not found] ` <EE7902D3F51F404C82415C4803930ACD3FE0C5AE-DWYeeINJQrxExQ8dmkPuX0M9+F4ksjoh@public.gmane.org>
2014-07-02 19:56 ` Steve Wise
2014-07-02 19:57 ` Devesh Sharma
2014-07-02 19:56 ` Devesh Sharma
2014-06-23 22:39 ` [PATCH v1 06/13] xprtrdma: Unclutter struct rpcrdma_mr_seg Chuck Lever
2014-06-23 22:39 ` [PATCH v1 07/13] xprtrdma: Encode Work Request opcode in wc->wr_id Chuck Lever
2014-06-23 22:40 ` [PATCH v1 08/13] xprtrdma: Back off rkey when FAST_REG_MR fails Chuck Lever
[not found] ` <20140623224007.1634.55636.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-24 15:47 ` Anna Schumaker
[not found] ` <53A99DA6.90808-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2014-06-24 16:26 ` Chuck Lever
2014-06-23 22:40 ` [PATCH v1 09/13] xprtrdma: Refactor rpcrdma_buffer_put() Chuck Lever
2014-06-23 22:40 ` [PATCH v1 10/13] xprtrdma: Release FRMR segment buffers during LOCAL_INV completion Chuck Lever
[not found] ` <20140623224023.1634.67233.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-25 5:17 ` Shirley Ma
[not found] ` <53AA5B72.3010200-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-06-25 14:32 ` Chuck Lever
[not found] ` <89930B1D-AE3B-48AD-922C-6FCA754D2B01-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
2014-06-25 16:14 ` Shirley Ma
2014-06-23 22:40 ` [PATCH v1 11/13] xprtrdma: Clean up rpcrdma_ep_disconnect() Chuck Lever
2014-06-23 22:40 ` [PATCH v1 12/13] xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro Chuck Lever
2014-06-23 22:40 ` [PATCH v1 13/13] xprtrdma: Handle additional connection events Chuck Lever
[not found] ` <20140623224048.1634.23972.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-06-24 15:58 ` Anna Schumaker
2014-06-24 14:35 ` [PATCH v1 00/13] NFS/RDMA patches for 3.17 Or Gerlitz
[not found] ` <CAJZOPZ+ix6tPDHXbVrSnVzofHSbzqOoyTBvzkEo-GJpOYOaPFA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-06-24 17:07 ` Chuck Lever
2014-06-25 22:47 ` Steve Wise
2014-06-27 16:17 ` Shirley Ma