* [PATCH 00/15] RPC/RDMA patchset for next merge window
@ 2008-10-08 15:46 Tom Talpey
From: Tom Talpey @ 2008-10-08 15:46 UTC
To: linux-nfs
The following series updates the RPC/RDMA (NFS/RDMA) client to
support the new RDMA "fastreg" memory registration mode, which
fixes operation on the Chelsio cxgb3 adapter and improves safety
on other adapters.
Additionally, it fixes many smaller issues in the code, improving
its robustness and performance. Except for supporting large (>32KB)
RPCs, it addresses all known issues in the client.
It's my hope this patchset can be queued for the upcoming merge
window. It has been extensively tested with both IB and iWARP
adapters under Connectathon and heavy parallel load.
This patchset applies to the current nfs-2.6 git tree
(commit 4330ed8ed4da360ac1ca14b0fddff4c05b10de16).
---
Tom Talpey (14):
RPC/RDMA: optionally emit useful transport info upon connect/disconnect.
RPC/RDMA: reformat a debug printk to keep lines together.
RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.
RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
RPC/RDMA: fix connect/reconnect resource leak.
RPC/RDMA: return a consistent error to mount, when connect fails.
RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.
RPC/RDMA: maintain the RPC task bytes-sent statistic.
RPC/RDMA: suppress retransmit on RPC/RDMA clients.
RPC/RDMA: support FRMR client memory registration.
RPC/RDMA: check selected memory registration mode at runtime.
RPC/RDMA: add data types and new FRMR memory registration enum.
RPC/RDMA: refactor the inline memory registration code.
Tom Tucker (1):
RPC/RDMA: fix connection IRD/ORD setting
net/sunrpc/xprtrdma/rpc_rdma.c | 30 +-
net/sunrpc/xprtrdma/transport.c | 39 +-
net/sunrpc/xprtrdma/verbs.c | 737 +++++++++++++++++++++++++++------------
net/sunrpc/xprtrdma/xprt_rdma.h | 12 +
4 files changed, 570 insertions(+), 248 deletions(-)
--
Tom.
* [PATCH 01/15] RPC/RDMA: refactor the inline memory registration code.
@ 2008-10-08 15:47 Tom Talpey
From: Tom Talpey @ 2008-10-08 15:47 UTC
To: linux-nfs
Refactor the memory registration and deregistration routines.
This saves stack space, makes the code more readable and prepares
to add the new FRMR registration methods.
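As an illustration only: the FMR and default helpers factored out here (and the FRMR helper added later in this series) share one "check for holes" loop that decides how many segments a single registration may cover. The userspace C sketch below models just that loop. `struct sketch_seg`, `sketch_count_contiguous()` and the fixed 4 KiB page size are hypothetical stand-ins, not kernel APIs, and the sketch omits the first-segment page-offset adjustment that the FMR helper performs (which does not change where a segment ends).

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two rpcrdma_mr_seg fields the loop reads. */
struct sketch_seg {
	unsigned long offset;  /* plays the role of mr_offset */
	unsigned long len;     /* plays the role of mr_len */
};

/* Mirrors the offset_in_page() != 0 test used by the helpers. */
static bool sketch_unaligned(unsigned long addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) != 0;
}

/*
 * Count how many leading segments one registration can cover: stop after
 * a segment that ends mid-page, or before one that starts mid-page.
 * max_segs plays the role of RPCRDMA_MAX_DATA_SEGS.
 */
static int sketch_count_contiguous(const struct sketch_seg *seg, int nsegs,
				   int max_segs)
{
	int i = 0;

	if (nsegs > max_segs)
		nsegs = max_segs;
	while (i < nsegs) {
		++i;
		/* Check for holes, same condition as the kernel loop */
		if ((i < nsegs && sketch_unaligned(seg[i].offset)) ||
		    sketch_unaligned(seg[i - 1].offset + seg[i - 1].len))
			break;
	}
	return i;
}
```

For example, segments at offsets 0, 4096 and 8192 with lengths 4096, 4096 and 100, followed by one at 12288, give 3: the third segment ends mid-page, so the fourth needs its own registration.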
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/verbs.c | 365 ++++++++++++++++++++++++-------------------
1 files changed, 207 insertions(+), 158 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 8ea283e..d04208a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -863,6 +863,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
char *p;
size_t len;
int i, rc;
+ struct rpcrdma_mw *r;
buf->rb_max_requests = cdata->max_requests;
spin_lock_init(&buf->rb_lock);
@@ -873,7 +874,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
* 2. arrays of struct rpcrdma_req to fill in pointers
* 3. array of struct rpcrdma_rep for replies
* 4. padding, if any
- * 5. mw's, if any
+ * 5. mw's or fmr's, if any
* Send/recv buffers in req/rep need to be registered
*/
@@ -927,15 +928,13 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
* and also reduce unbind-to-bind collision.
*/
INIT_LIST_HEAD(&buf->rb_mws);
+ r = (struct rpcrdma_mw *)p;
switch (ia->ri_memreg_strategy) {
case RPCRDMA_MTHCAFMR:
- {
- struct rpcrdma_mw *r = (struct rpcrdma_mw *)p;
- struct ib_fmr_attr fa = {
- RPCRDMA_MAX_DATA_SEGS, 1, PAGE_SHIFT
- };
/* TBD we are perhaps overallocating here */
for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
+ static struct ib_fmr_attr fa =
+ { RPCRDMA_MAX_DATA_SEGS, 1, PAGE_SHIFT };
r->r.fmr = ib_alloc_fmr(ia->ri_pd,
IB_ACCESS_REMOTE_WRITE | IB_ACCESS_REMOTE_READ,
&fa);
@@ -948,12 +947,9 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
list_add(&r->mw_list, &buf->rb_mws);
++r;
}
- }
break;
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
- {
- struct rpcrdma_mw *r = (struct rpcrdma_mw *)p;
/* Allocate one extra request's worth, for full cycling */
for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
r->r.mw = ib_alloc_mw(ia->ri_pd);
@@ -966,7 +962,6 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
list_add(&r->mw_list, &buf->rb_mws);
++r;
}
- }
break;
default:
break;
@@ -1046,6 +1041,7 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
int rc, i;
struct rpcrdma_ia *ia = rdmab_to_ia(buf);
+ struct rpcrdma_mw *r;
/* clean up in reverse order from create
* 1. recv mr memory (mr free, then kfree)
@@ -1065,7 +1061,6 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
}
if (buf->rb_send_bufs && buf->rb_send_bufs[i]) {
while (!list_empty(&buf->rb_mws)) {
- struct rpcrdma_mw *r;
r = list_entry(buf->rb_mws.next,
struct rpcrdma_mw, mw_list);
list_del(&r->mw_list);
@@ -1115,6 +1110,8 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
{
struct rpcrdma_req *req;
unsigned long flags;
+ int i;
+ struct rpcrdma_mw *r;
spin_lock_irqsave(&buffers->rb_lock, flags);
if (buffers->rb_send_index == buffers->rb_max_requests) {
@@ -1135,9 +1132,8 @@ rpcrdma_buffer_get(struct rpcrdma_buffer *buffers)
}
buffers->rb_send_bufs[buffers->rb_send_index++] = NULL;
if (!list_empty(&buffers->rb_mws)) {
- int i = RPCRDMA_MAX_SEGS - 1;
+ i = RPCRDMA_MAX_SEGS - 1;
do {
- struct rpcrdma_mw *r;
r = list_entry(buffers->rb_mws.next,
struct rpcrdma_mw, mw_list);
list_del(&r->mw_list);
@@ -1329,15 +1325,202 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
seg->mr_dma, seg->mr_dmalen, seg->mr_dir);
}
+static int
+rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg,
+ int *nsegs, int writing, struct rpcrdma_ia *ia)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
+ int len, pageoff, i, rc;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
+ *nsegs = RPCRDMA_MAX_DATA_SEGS;
+ for (i = 0; i < *nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ physaddrs[i] = seg->mr_dma;
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+ rc = ib_map_phys_fmr(seg1->mr_chunk.rl_mw->r.fmr,
+ physaddrs, i, seg1->mr_dma);
+ if (rc) {
+ dprintk("RPC: %s: failed ib_map_phys_fmr "
+ "%u@0x%llx+%i (%d)... status %i\n", __func__,
+ len, (unsigned long long)seg1->mr_dma,
+ pageoff, i, rc);
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ } else {
+ seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.fmr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ }
+ *nsegs = i;
+ return rc;
+}
+
+static int
+rpcrdma_deregister_fmr_external(struct rpcrdma_mr_seg *seg,
+ struct rpcrdma_ia *ia)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ LIST_HEAD(l);
+ int rc;
+
+ list_add(&seg1->mr_chunk.rl_mw->r.fmr->list, &l);
+ rc = ib_unmap_fmr(&l);
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ if (rc)
+ dprintk("RPC: %s: failed ib_unmap_fmr,"
+ " status %i\n", __func__, rc);
+ return rc;
+}
+
+static int
+rpcrdma_register_memwin_external(struct rpcrdma_mr_seg *seg,
+ int *nsegs, int writing, struct rpcrdma_ia *ia,
+ struct rpcrdma_xprt *r_xprt)
+{
+ int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
+ IB_ACCESS_REMOTE_READ);
+ struct ib_mw_bind param;
+ int rc;
+
+ *nsegs = 1;
+ rpcrdma_map_one(ia, seg, writing);
+ param.mr = ia->ri_bind_mem;
+ param.wr_id = 0ULL; /* no send cookie */
+ param.addr = seg->mr_dma;
+ param.length = seg->mr_len;
+ param.send_flags = 0;
+ param.mw_access_flags = mem_priv;
+
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+ rc = ib_bind_mw(ia->ri_id->qp, seg->mr_chunk.rl_mw->r.mw, &param);
+ if (rc) {
+ dprintk("RPC: %s: failed ib_bind_mw "
+ "%u@0x%llx status %i\n",
+ __func__, seg->mr_len,
+ (unsigned long long)seg->mr_dma, rc);
+ rpcrdma_unmap_one(ia, seg);
+ } else {
+ seg->mr_rkey = seg->mr_chunk.rl_mw->r.mw->rkey;
+ seg->mr_base = param.addr;
+ seg->mr_nsegs = 1;
+ }
+ return rc;
+}
+
+static int
+rpcrdma_deregister_memwin_external(struct rpcrdma_mr_seg *seg,
+ struct rpcrdma_ia *ia,
+ struct rpcrdma_xprt *r_xprt, void **r)
+{
+ struct ib_mw_bind param;
+ LIST_HEAD(l);
+ int rc;
+
+ BUG_ON(seg->mr_nsegs != 1);
+ param.mr = ia->ri_bind_mem;
+ param.addr = 0ULL; /* unbind */
+ param.length = 0;
+ param.mw_access_flags = 0;
+ if (*r) {
+ param.wr_id = (u64) (unsigned long) *r;
+ param.send_flags = IB_SEND_SIGNALED;
+ INIT_CQCOUNT(&r_xprt->rx_ep);
+ } else {
+ param.wr_id = 0ULL;
+ param.send_flags = 0;
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+ }
+ rc = ib_bind_mw(ia->ri_id->qp, seg->mr_chunk.rl_mw->r.mw, &param);
+ rpcrdma_unmap_one(ia, seg);
+ if (rc)
+ dprintk("RPC: %s: failed ib_(un)bind_mw,"
+ " status %i\n", __func__, rc);
+ else
+ *r = NULL; /* will upcall on completion */
+ return rc;
+}
+
+static int
+rpcrdma_register_default_external(struct rpcrdma_mr_seg *seg,
+ int *nsegs, int writing, struct rpcrdma_ia *ia)
+{
+ int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
+ IB_ACCESS_REMOTE_READ);
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
+ int len, i, rc = 0;
+
+ if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
+ *nsegs = RPCRDMA_MAX_DATA_SEGS;
+ for (len = 0, i = 0; i < *nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ ipb[i].addr = seg->mr_dma;
+ ipb[i].size = seg->mr_len;
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset+(seg-1)->mr_len))
+ break;
+ }
+ seg1->mr_base = seg1->mr_dma;
+ seg1->mr_chunk.rl_mr = ib_reg_phys_mr(ia->ri_pd,
+ ipb, i, mem_priv, &seg1->mr_base);
+ if (IS_ERR(seg1->mr_chunk.rl_mr)) {
+ rc = PTR_ERR(seg1->mr_chunk.rl_mr);
+ dprintk("RPC: %s: failed ib_reg_phys_mr "
+ "%u@0x%llx (%d)... status %i\n",
+ __func__, len,
+ (unsigned long long)seg1->mr_dma, i, rc);
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ } else {
+ seg1->mr_rkey = seg1->mr_chunk.rl_mr->rkey;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ }
+ *nsegs = i;
+ return rc;
+}
+
+static int
+rpcrdma_deregister_default_external(struct rpcrdma_mr_seg *seg,
+ struct rpcrdma_ia *ia)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ int rc;
+
+ rc = ib_dereg_mr(seg1->mr_chunk.rl_mr);
+ seg1->mr_chunk.rl_mr = NULL;
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+ if (rc)
+ dprintk("RPC: %s: failed ib_dereg_mr,"
+ " status %i\n", __func__, rc);
+ return rc;
+}
+
int
rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
int nsegs, int writing, struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- int mem_priv = (writing ? IB_ACCESS_REMOTE_WRITE :
- IB_ACCESS_REMOTE_READ);
- struct rpcrdma_mr_seg *seg1 = seg;
- int i;
int rc = 0;
switch (ia->ri_memreg_strategy) {
@@ -1352,114 +1535,20 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
break;
#endif
- /* Registration using fast memory registration */
+ /* Registration using fmr memory registration */
case RPCRDMA_MTHCAFMR:
- {
- u64 physaddrs[RPCRDMA_MAX_DATA_SEGS];
- int len, pageoff = offset_in_page(seg->mr_offset);
- seg1->mr_offset -= pageoff; /* start of page */
- seg1->mr_len += pageoff;
- len = -pageoff;
- if (nsegs > RPCRDMA_MAX_DATA_SEGS)
- nsegs = RPCRDMA_MAX_DATA_SEGS;
- for (i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- physaddrs[i] = seg->mr_dma;
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset+(seg-1)->mr_len))
- break;
- }
- nsegs = i;
- rc = ib_map_phys_fmr(seg1->mr_chunk.rl_mw->r.fmr,
- physaddrs, nsegs, seg1->mr_dma);
- if (rc) {
- dprintk("RPC: %s: failed ib_map_phys_fmr "
- "%u@0x%llx+%i (%d)... status %i\n", __func__,
- len, (unsigned long long)seg1->mr_dma,
- pageoff, nsegs, rc);
- while (nsegs--)
- rpcrdma_unmap_one(ia, --seg);
- } else {
- seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.fmr->rkey;
- seg1->mr_base = seg1->mr_dma + pageoff;
- seg1->mr_nsegs = nsegs;
- seg1->mr_len = len;
- }
- }
+ rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
break;
/* Registration using memory windows */
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
- {
- struct ib_mw_bind param;
- rpcrdma_map_one(ia, seg, writing);
- param.mr = ia->ri_bind_mem;
- param.wr_id = 0ULL; /* no send cookie */
- param.addr = seg->mr_dma;
- param.length = seg->mr_len;
- param.send_flags = 0;
- param.mw_access_flags = mem_priv;
-
- DECR_CQCOUNT(&r_xprt->rx_ep);
- rc = ib_bind_mw(ia->ri_id->qp,
- seg->mr_chunk.rl_mw->r.mw, &param);
- if (rc) {
- dprintk("RPC: %s: failed ib_bind_mw "
- "%u@0x%llx status %i\n",
- __func__, seg->mr_len,
- (unsigned long long)seg->mr_dma, rc);
- rpcrdma_unmap_one(ia, seg);
- } else {
- seg->mr_rkey = seg->mr_chunk.rl_mw->r.mw->rkey;
- seg->mr_base = param.addr;
- seg->mr_nsegs = 1;
- nsegs = 1;
- }
- }
+ rc = rpcrdma_register_memwin_external(seg, &nsegs, writing, ia, r_xprt);
break;
/* Default registration each time */
default:
- {
- struct ib_phys_buf ipb[RPCRDMA_MAX_DATA_SEGS];
- int len = 0;
- if (nsegs > RPCRDMA_MAX_DATA_SEGS)
- nsegs = RPCRDMA_MAX_DATA_SEGS;
- for (i = 0; i < nsegs;) {
- rpcrdma_map_one(ia, seg, writing);
- ipb[i].addr = seg->mr_dma;
- ipb[i].size = seg->mr_len;
- len += seg->mr_len;
- ++seg;
- ++i;
- /* Check for holes */
- if ((i < nsegs && offset_in_page(seg->mr_offset)) ||
- offset_in_page((seg-1)->mr_offset+(seg-1)->mr_len))
- break;
- }
- nsegs = i;
- seg1->mr_base = seg1->mr_dma;
- seg1->mr_chunk.rl_mr = ib_reg_phys_mr(ia->ri_pd,
- ipb, nsegs, mem_priv, &seg1->mr_base);
- if (IS_ERR(seg1->mr_chunk.rl_mr)) {
- rc = PTR_ERR(seg1->mr_chunk.rl_mr);
- dprintk("RPC: %s: failed ib_reg_phys_mr "
- "%u@0x%llx (%d)... status %i\n",
- __func__, len,
- (unsigned long long)seg1->mr_dma, nsegs, rc);
- while (nsegs--)
- rpcrdma_unmap_one(ia, --seg);
- } else {
- seg1->mr_rkey = seg1->mr_chunk.rl_mr->rkey;
- seg1->mr_nsegs = nsegs;
- seg1->mr_len = len;
- }
- }
+ rc = rpcrdma_register_default_external(seg, &nsegs, writing, ia);
break;
}
if (rc)
@@ -1473,7 +1562,6 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
struct rpcrdma_xprt *r_xprt, void *r)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
- struct rpcrdma_mr_seg *seg1 = seg;
int nsegs = seg->mr_nsegs, rc;
switch (ia->ri_memreg_strategy) {
@@ -1487,55 +1575,16 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
#endif
case RPCRDMA_MTHCAFMR:
- {
- LIST_HEAD(l);
- list_add(&seg->mr_chunk.rl_mw->r.fmr->list, &l);
- rc = ib_unmap_fmr(&l);
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- }
- if (rc)
- dprintk("RPC: %s: failed ib_unmap_fmr,"
- " status %i\n", __func__, rc);
+ rc = rpcrdma_deregister_fmr_external(seg, ia);
break;
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
- {
- struct ib_mw_bind param;
- BUG_ON(nsegs != 1);
- param.mr = ia->ri_bind_mem;
- param.addr = 0ULL; /* unbind */
- param.length = 0;
- param.mw_access_flags = 0;
- if (r) {
- param.wr_id = (u64) (unsigned long) r;
- param.send_flags = IB_SEND_SIGNALED;
- INIT_CQCOUNT(&r_xprt->rx_ep);
- } else {
- param.wr_id = 0ULL;
- param.send_flags = 0;
- DECR_CQCOUNT(&r_xprt->rx_ep);
- }
- rc = ib_bind_mw(ia->ri_id->qp,
- seg->mr_chunk.rl_mw->r.mw, &param);
- rpcrdma_unmap_one(ia, seg);
- }
- if (rc)
- dprintk("RPC: %s: failed ib_(un)bind_mw,"
- " status %i\n", __func__, rc);
- else
- r = NULL; /* will upcall on completion */
+ rc = rpcrdma_deregister_memwin_external(seg, ia, r_xprt, &r);
break;
default:
- rc = ib_dereg_mr(seg1->mr_chunk.rl_mr);
- seg1->mr_chunk.rl_mr = NULL;
- while (seg1->mr_nsegs--)
- rpcrdma_unmap_one(ia, seg++);
- if (rc)
- dprintk("RPC: %s: failed ib_dereg_mr,"
- " status %i\n", __func__, rc);
+ rc = rpcrdma_deregister_default_external(seg, ia);
break;
}
if (r) {
* [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
@ 2008-10-08 15:47 Tom Talpey
From: Tom Talpey @ 2008-10-08 15:47 UTC
To: linux-nfs
Internal RPC/RDMA structure updates in preparation for FRMR support.
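A rough way to see what the new `frmr` member costs: every `struct rpcrdma_mw` on `rb_mws` carries this union, and later in the series the client allocates `rb_max_requests * RPCRDMA_MAX_SEGS` of them for FRMR. The sketch below uses hypothetical `sketch_` names and opaque forward declarations in place of the verbs types, and compares the union before and after the change. On common ABIs, where all object pointers have the same size, it grows from one pointer to two.

```c
/* Opaque stand-ins for the verbs objects named in the patch. */
struct ib_mw;
struct ib_fmr;
struct ib_mr;
struct ib_fast_reg_page_list;

/* Hypothetical copy of the rpcrdma_mw "r" union before this patch. */
union sketch_mw_r_old {
	struct ib_mw *mw;
	struct ib_fmr *fmr;
};

/* Hypothetical copy of the same union after adding the frmr member. */
union sketch_mw_r {
	struct ib_mw *mw;
	struct ib_fmr *fmr;
	struct {
		struct ib_fast_reg_page_list *fr_pgl;  /* page list for the WR */
		struct ib_mr *fr_mr;                   /* fast-register MR */
	} frmr;
};
```

Only one member is live at a time, selected by `ri_memreg_strategy`, so the union costs the size of its largest member rather than the sum of all three.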
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletions(-)
diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
index 4de56b1..55a5d92 100644
--- a/include/linux/sunrpc/xprtrdma.h
+++ b/include/linux/sunrpc/xprtrdma.h
@@ -78,6 +78,7 @@ enum rpcrdma_memreg {
RPCRDMA_MEMWINDOWS,
RPCRDMA_MEMWINDOWS_ASYNC,
RPCRDMA_MTHCAFMR,
+ RPCRDMA_FRMR,
RPCRDMA_ALLPHYSICAL,
RPCRDMA_LAST
};
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 2427822..05b7898 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -58,6 +58,8 @@ struct rpcrdma_ia {
struct rdma_cm_id *ri_id;
struct ib_pd *ri_pd;
struct ib_mr *ri_bind_mem;
+ u32 ri_dma_lkey;
+ int ri_have_dma_lkey;
struct completion ri_done;
int ri_async_rc;
enum rpcrdma_memreg ri_memreg_strategy;
@@ -156,6 +158,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
union {
struct ib_mw *mw;
struct ib_fmr *fmr;
+ struct {
+ struct ib_fast_reg_page_list *fr_pgl;
+ struct ib_mr *fr_mr;
+ } frmr;
} r;
struct list_head mw_list;
} *rl_mw;
@@ -198,7 +204,7 @@ struct rpcrdma_buffer {
atomic_t rb_credits; /* most recent server credits */
unsigned long rb_cwndscale; /* cached framework rpc_cwndscale */
int rb_max_requests;/* client max requests */
- struct list_head rb_mws; /* optional memory windows/fmrs */
+ struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
int rb_send_index;
struct rpcrdma_req **rb_send_bufs;
int rb_recv_index;
* [PATCH 03/15] RPC/RDMA: check selected memory registration mode at runtime.
@ 2008-10-08 15:47 Tom Talpey
From: Tom Talpey @ 2008-10-08 15:47 UTC
To: linux-nfs
At transport creation, check for, and use, any local DMA lkey.
Then, check that the selected memory registration mode is in fact
supported by the RDMA adapter selected for the mount. Fall back
to the best alternative if not.
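The fallback rule described above, for the modes this patch handles, can be modelled as a pure function. This is a sketch under stated assumptions: `enum sketch_memreg`, `struct sketch_caps` and `sketch_pick_memreg()` are hypothetical; the two booleans stand in for the `IB_DEVICE_MEM_WINDOW` capability bit and a non-NULL `alloc_fmr` device method, and `persistent_registration` stands in for the `RPCRDMA_PERSISTENT_REGISTRATION` build option.

```c
#include <stdbool.h>

/* Hypothetical mirror of the memreg modes named in the patch. */
enum sketch_memreg {
	SKETCH_REGISTER,
	SKETCH_MEMWINDOWS,
	SKETCH_MEMWINDOWS_ASYNC,
	SKETCH_MTHCAFMR,
	SKETCH_ALLPHYSICAL,
};

/* Hypothetical summary of what the adapter reports. */
struct sketch_caps {
	bool mem_window;     /* stands in for IB_DEVICE_MEM_WINDOW */
	bool has_alloc_fmr;  /* stands in for device->alloc_fmr != NULL */
};

/* Return the requested mode if supported, else the fallback mode. */
static enum sketch_memreg
sketch_pick_memreg(enum sketch_memreg requested, struct sketch_caps caps,
		   bool persistent_registration)
{
	switch (requested) {
	case SKETCH_MEMWINDOWS:
	case SKETCH_MEMWINDOWS_ASYNC:
		if (!caps.mem_window)
			return SKETCH_REGISTER;          /* "slower" fallback */
		break;
	case SKETCH_MTHCAFMR:
		if (!caps.has_alloc_fmr)
			return persistent_registration ?
				SKETCH_ALLPHYSICAL :     /* "riskier" fallback */
				SKETCH_REGISTER;
		break;
	default:
		break;
	}
	return requested;
}
```

Keeping the decision separate from the `ib_get_dma_mr()` setup, as the patch does with its two `switch` statements, means the second switch only ever sees a mode the adapter can actually support.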
Signed-off-by: Tom Talpey <talpey@netapp.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
---
net/sunrpc/xprtrdma/verbs.c | 95 ++++++++++++++++++++++++++++++++++++-------
1 files changed, 80 insertions(+), 15 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index d04208a..0f3b431 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -423,7 +423,8 @@ rpcrdma_clean_cq(struct ib_cq *cq)
int
rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
{
- int rc;
+ int rc, mem_priv;
+ struct ib_device_attr devattr;
struct rpcrdma_ia *ia = &xprt->rx_ia;
init_completion(&ia->ri_done);
@@ -443,6 +444,53 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
}
/*
+ * Query the device to determine if the requested memory
+ * registration strategy is supported. If it isn't, set the
+ * strategy to a globally supported model.
+ */
+ rc = ib_query_device(ia->ri_id->device, &devattr);
+ if (rc) {
+ dprintk("RPC: %s: ib_query_device failed %d\n",
+ __func__, rc);
+ goto out2;
+ }
+
+ if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
+ ia->ri_have_dma_lkey = 1;
+ ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
+ }
+
+ switch (memreg) {
+ case RPCRDMA_MEMWINDOWS:
+ case RPCRDMA_MEMWINDOWS_ASYNC:
+ if (!(devattr.device_cap_flags & IB_DEVICE_MEM_WINDOW)) {
+ dprintk("RPC: %s: MEMWINDOWS registration "
+ "specified but not supported by adapter, "
+ "using slower RPCRDMA_REGISTER\n",
+ __func__);
+ memreg = RPCRDMA_REGISTER;
+ }
+ break;
+ case RPCRDMA_MTHCAFMR:
+ if (!ia->ri_id->device->alloc_fmr) {
+#if RPCRDMA_PERSISTENT_REGISTRATION
+ dprintk("RPC: %s: MTHCAFMR registration "
+ "specified but not supported by adapter, "
+ "using riskier RPCRDMA_ALLPHYSICAL\n",
+ __func__);
+ memreg = RPCRDMA_ALLPHYSICAL;
+#else
+ dprintk("RPC: %s: MTHCAFMR registration "
+ "specified but not supported by adapter, "
+ "using slower RPCRDMA_REGISTER\n",
+ __func__);
+ memreg = RPCRDMA_REGISTER;
+#endif
+ }
+ break;
+ }
+
+ /*
* Optionally obtain an underlying physical identity mapping in
* order to do a memory window-based bind. This base registration
* is protected from remote access - that is enabled only by binding
@@ -450,22 +498,27 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
* revoked after the corresponding completion similar to a storage
* adapter.
*/
- if (memreg > RPCRDMA_REGISTER) {
- int mem_priv = IB_ACCESS_LOCAL_WRITE;
- switch (memreg) {
+ switch (memreg) {
+ case RPCRDMA_BOUNCEBUFFERS:
+ case RPCRDMA_REGISTER:
+ break;
#if RPCRDMA_PERSISTENT_REGISTRATION
- case RPCRDMA_ALLPHYSICAL:
- mem_priv |= IB_ACCESS_REMOTE_WRITE;
- mem_priv |= IB_ACCESS_REMOTE_READ;
- break;
+ case RPCRDMA_ALLPHYSICAL:
+ mem_priv = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_REMOTE_WRITE |
+ IB_ACCESS_REMOTE_READ;
+ goto register_setup;
#endif
- case RPCRDMA_MEMWINDOWS_ASYNC:
- case RPCRDMA_MEMWINDOWS:
- mem_priv |= IB_ACCESS_MW_BIND;
- break;
- default:
+ case RPCRDMA_MEMWINDOWS_ASYNC:
+ case RPCRDMA_MEMWINDOWS:
+ mem_priv = IB_ACCESS_LOCAL_WRITE |
+ IB_ACCESS_MW_BIND;
+ goto register_setup;
+ case RPCRDMA_MTHCAFMR:
+ if (ia->ri_have_dma_lkey)
break;
- }
+ mem_priv = IB_ACCESS_LOCAL_WRITE;
+ register_setup:
ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
if (IS_ERR(ia->ri_bind_mem)) {
printk(KERN_ALERT "%s: ib_get_dma_mr for "
@@ -475,7 +528,15 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
memreg = RPCRDMA_REGISTER;
ia->ri_bind_mem = NULL;
}
+ break;
+ default:
+ printk(KERN_ERR "%s: invalid memory registration mode %d\n",
+ __func__, memreg);
+ rc = -EINVAL;
+ goto out2;
}
+ dprintk("RPC: %s: memory registration strategy is %d\n",
+ __func__, memreg);
/* Else will do memory reg/dereg for each chunk */
ia->ri_memreg_strategy = memreg;
@@ -1248,7 +1309,11 @@ rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
va, len, DMA_BIDIRECTIONAL);
iov->length = len;
- if (ia->ri_bind_mem != NULL) {
+ if (ia->ri_have_dma_lkey) {
+ *mrp = NULL;
+ iov->lkey = ia->ri_dma_lkey;
+ return 0;
+ } else if (ia->ri_bind_mem != NULL) {
*mrp = NULL;
iov->lkey = ia->ri_bind_mem->lkey;
return 0;
* [PATCH 04/15] RPC/RDMA: support FRMR client memory registration.
@ 2008-10-08 15:47 Tom Talpey
From: Tom Talpey @ 2008-10-08 15:47 UTC
To: linux-nfs
Configure, detect and use "fastreg" support from the IB/iWARP verbs
layer to perform RPC/RDMA memory registration.
Make FRMR the default memreg mode (the client falls back to another
mode if FRMR is not supported by the selected RDMA adapter).
This allows full and optimal operation over the cxgb3 adapter and others.
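Before posting each fast-register work request, the FRMR path bumps the 8-bit key held in the low byte of the MR's rkey (via `ib_update_fast_reg_key()`), presumably so that an rkey handed out for the previous registration of the same MR no longer matches. A userspace model of that arithmetic, with a hypothetical helper name and assuming the low-byte layout used by `ib_update_fast_reg_key()`:

```c
#include <stdint.h>

/*
 * Hypothetical model of "bump the key": only the low 8 bits (the
 * consumer-owned key byte) change; the upper 24 bits are preserved.
 */
static uint32_t sketch_bump_rkey(uint32_t rkey)
{
	uint8_t key = (uint8_t)(rkey & 0x000000FF);

	++key;  /* unsigned 8-bit arithmetic: wraps from 0xFF to 0x00 */
	return (rkey & 0xFFFFFF00u) | key;
}
```

Because the key is only 8 bits, it repeats after 256 registrations of the same MR; the bump narrows, but does not eliminate, the window in which a stale rkey could match.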
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/transport.c | 6 -
net/sunrpc/xprtrdma/verbs.c | 167 +++++++++++++++++++++++++++++++++++++++
2 files changed, 167 insertions(+), 6 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index a564c1a..89970b0 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -70,11 +70,7 @@ static unsigned int xprt_rdma_slot_table_entries = RPCRDMA_DEF_SLOT_TABLE;
static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_inline_write_padding;
-#if !RPCRDMA_PERSISTENT_REGISTRATION
-static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_REGISTER; /* FMR? */
-#else
-static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_ALLPHYSICAL;
-#endif
+static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
#ifdef RPC_DEBUG
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 0f3b431..39a1652 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -488,6 +488,26 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
#endif
}
break;
+ case RPCRDMA_FRMR:
+ /* Requires both frmr reg and local dma lkey */
+ if ((devattr.device_cap_flags &
+ (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) !=
+ (IB_DEVICE_MEM_MGT_EXTENSIONS|IB_DEVICE_LOCAL_DMA_LKEY)) {
+#if RPCRDMA_PERSISTENT_REGISTRATION
+ dprintk("RPC: %s: FRMR registration "
+ "specified but not supported by adapter, "
+ "using riskier RPCRDMA_ALLPHYSICAL\n",
+ __func__);
+ memreg = RPCRDMA_ALLPHYSICAL;
+#else
+ dprintk("RPC: %s: FRMR registration "
+ "specified but not supported by adapter, "
+ "using slower RPCRDMA_REGISTER\n",
+ __func__);
+ memreg = RPCRDMA_REGISTER;
+#endif
+ }
+ break;
}
/*
@@ -501,6 +521,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
switch (memreg) {
case RPCRDMA_BOUNCEBUFFERS:
case RPCRDMA_REGISTER:
+ case RPCRDMA_FRMR:
break;
#if RPCRDMA_PERSISTENT_REGISTRATION
case RPCRDMA_ALLPHYSICAL:
@@ -602,6 +623,12 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
ep->rep_attr.srq = NULL;
ep->rep_attr.cap.max_send_wr = cdata->max_requests;
switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
+ /* Add room for frmr register and invalidate WRs */
+ ep->rep_attr.cap.max_send_wr *= 3;
+ if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr)
+ return -EINVAL;
+ break;
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
/* Add room for mw_binds+unbinds - overkill! */
@@ -684,6 +711,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
break;
case RPCRDMA_MTHCAFMR:
case RPCRDMA_REGISTER:
+ case RPCRDMA_FRMR:
ep->rep_remote_cma.responder_resources = cdata->max_requests *
(RPCRDMA_MAX_DATA_SEGS / 8);
break;
@@ -935,7 +963,7 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
* 2. arrays of struct rpcrdma_req to fill in pointers
* 3. array of struct rpcrdma_rep for replies
* 4. padding, if any
- * 5. mw's or fmr's, if any
+ * 5. mw's, fmr's or frmr's, if any
* Send/recv buffers in req/rep need to be registered
*/
@@ -943,6 +971,10 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
(sizeof(struct rpcrdma_req *) + sizeof(struct rpcrdma_rep *));
len += cdata->padding;
switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
+ len += buf->rb_max_requests * RPCRDMA_MAX_SEGS *
+ sizeof(struct rpcrdma_mw);
+ break;
case RPCRDMA_MTHCAFMR:
/* TBD we are perhaps overallocating here */
len += (buf->rb_max_requests + 1) * RPCRDMA_MAX_SEGS *
@@ -991,6 +1023,30 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
INIT_LIST_HEAD(&buf->rb_mws);
r = (struct rpcrdma_mw *)p;
switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
+ for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
+ r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
+ RPCRDMA_MAX_SEGS);
+ if (IS_ERR(r->r.frmr.fr_mr)) {
+ rc = PTR_ERR(r->r.frmr.fr_mr);
+ dprintk("RPC: %s: ib_alloc_fast_reg_mr"
+ " failed %i\n", __func__, rc);
+ goto out;
+ }
+ r->r.frmr.fr_pgl =
+ ib_alloc_fast_reg_page_list(ia->ri_id->device,
+ RPCRDMA_MAX_SEGS);
+ if (IS_ERR(r->r.frmr.fr_pgl)) {
+ rc = PTR_ERR(r->r.frmr.fr_pgl);
+ dprintk("RPC: %s: "
+ "ib_alloc_fast_reg_page_list "
+ "failed %i\n", __func__, rc);
+ goto out;
+ }
+ list_add(&r->mw_list, &buf->rb_mws);
+ ++r;
+ }
+ break;
case RPCRDMA_MTHCAFMR:
/* TBD we are perhaps overallocating here */
for (i = (buf->rb_max_requests+1) * RPCRDMA_MAX_SEGS; i; i--) {
@@ -1126,6 +1182,15 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
struct rpcrdma_mw, mw_list);
list_del(&r->mw_list);
switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
+ rc = ib_dereg_mr(r->r.frmr.fr_mr);
+ if (rc)
+ dprintk("RPC: %s:"
+ " ib_dereg_mr"
+ " failed %i\n",
+ __func__, rc);
+ ib_free_fast_reg_page_list(r->r.frmr.fr_pgl);
+ break;
case RPCRDMA_MTHCAFMR:
rc = ib_dealloc_fmr(r->r.fmr);
if (rc)
@@ -1228,6 +1293,7 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
req->rl_reply = NULL;
}
switch (ia->ri_memreg_strategy) {
+ case RPCRDMA_FRMR:
case RPCRDMA_MTHCAFMR:
case RPCRDMA_MEMWINDOWS_ASYNC:
case RPCRDMA_MEMWINDOWS:
@@ -1391,6 +1457,96 @@ rpcrdma_unmap_one(struct rpcrdma_ia *ia, struct rpcrdma_mr_seg *seg)
}
static int
+rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
+ int *nsegs, int writing, struct rpcrdma_ia *ia,
+ struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct ib_send_wr frmr_wr, *bad_wr;
+ u8 key;
+ int len, pageoff;
+ int i, rc;
+
+ pageoff = offset_in_page(seg1->mr_offset);
+ seg1->mr_offset -= pageoff; /* start of page */
+ seg1->mr_len += pageoff;
+ len = -pageoff;
+ if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
+ *nsegs = RPCRDMA_MAX_DATA_SEGS;
+ for (i = 0; i < *nsegs;) {
+ rpcrdma_map_one(ia, seg, writing);
+ seg1->mr_chunk.rl_mw->r.frmr.fr_pgl->page_list[i] = seg->mr_dma;
+ len += seg->mr_len;
+ ++seg;
+ ++i;
+ /* Check for holes */
+ if ((i < *nsegs && offset_in_page(seg->mr_offset)) ||
+ offset_in_page((seg-1)->mr_offset + (seg-1)->mr_len))
+ break;
+ }
+ dprintk("RPC: %s: Using frmr %p to map %d segments\n",
+ __func__, seg1->mr_chunk.rl_mw, i);
+
+ /* Bump the key */
+ key = (u8)(seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey & 0x000000FF);
+ ib_update_fast_reg_key(seg1->mr_chunk.rl_mw->r.frmr.fr_mr, ++key);
+
+ /* Prepare FRMR WR */
+ memset(&frmr_wr, 0, sizeof frmr_wr);
+ frmr_wr.opcode = IB_WR_FAST_REG_MR;
+ frmr_wr.send_flags = 0; /* unsignaled */
+ frmr_wr.wr.fast_reg.iova_start = (unsigned long)seg1->mr_dma;
+ frmr_wr.wr.fast_reg.page_list = seg1->mr_chunk.rl_mw->r.frmr.fr_pgl;
+ frmr_wr.wr.fast_reg.page_list_len = i;
+ frmr_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
+ frmr_wr.wr.fast_reg.length = i << PAGE_SHIFT;
+ frmr_wr.wr.fast_reg.access_flags = (writing ?
+ IB_ACCESS_REMOTE_WRITE : IB_ACCESS_REMOTE_READ);
+ frmr_wr.wr.fast_reg.rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+
+ rc = ib_post_send(ia->ri_id->qp, &frmr_wr, &bad_wr);
+
+ if (rc) {
+ dprintk("RPC: %s: failed ib_post_send for register,"
+ " status %i\n", __func__, rc);
+ while (i--)
+ rpcrdma_unmap_one(ia, --seg);
+ } else {
+ seg1->mr_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
+ seg1->mr_base = seg1->mr_dma + pageoff;
+ seg1->mr_nsegs = i;
+ seg1->mr_len = len;
+ }
+ *nsegs = i;
+ return rc;
+}
+
+static int
+rpcrdma_deregister_frmr_external(struct rpcrdma_mr_seg *seg,
+ struct rpcrdma_ia *ia, struct rpcrdma_xprt *r_xprt)
+{
+ struct rpcrdma_mr_seg *seg1 = seg;
+ struct ib_send_wr invalidate_wr, *bad_wr;
+ int rc;
+
+ while (seg1->mr_nsegs--)
+ rpcrdma_unmap_one(ia, seg++);
+
+ memset(&invalidate_wr, 0, sizeof invalidate_wr);
+ invalidate_wr.opcode = IB_WR_LOCAL_INV;
+ invalidate_wr.send_flags = 0; /* unsignaled */
+ invalidate_wr.ex.invalidate_rkey = seg1->mr_chunk.rl_mw->r.frmr.fr_mr->rkey;
+ DECR_CQCOUNT(&r_xprt->rx_ep);
+
+ rc = ib_post_send(ia->ri_id->qp, &invalidate_wr, &bad_wr);
+ if (rc)
+ dprintk("RPC: %s: failed ib_post_send for invalidate,"
+ " status %i\n", __func__, rc);
+ return rc;
+}
+
+static int
rpcrdma_register_fmr_external(struct rpcrdma_mr_seg *seg,
int *nsegs, int writing, struct rpcrdma_ia *ia)
{
@@ -1600,6 +1756,11 @@ rpcrdma_register_external(struct rpcrdma_mr_seg *seg,
break;
#endif
+ /* Registration using frmr registration */
+ case RPCRDMA_FRMR:
+ rc = rpcrdma_register_frmr_external(seg, &nsegs, writing, ia, r_xprt);
+ break;
+
/* Registration using fmr memory registration */
case RPCRDMA_MTHCAFMR:
rc = rpcrdma_register_fmr_external(seg, &nsegs, writing, ia);
@@ -1639,6 +1800,10 @@ rpcrdma_deregister_external(struct rpcrdma_mr_seg *seg,
break;
#endif
+ case RPCRDMA_FRMR:
+ rc = rpcrdma_deregister_frmr_external(seg, ia, r_xprt);
+ break;
+
case RPCRDMA_MTHCAFMR:
rc = rpcrdma_deregister_fmr_external(seg, ia);
break;
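The hole check and the rkey bump in rpcrdma_register_frmr_external() can be approximated in user space as follows. This is a sketch only. SKETCH_PAGE_SIZE, struct seg_sketch and the helper names are hypothetical stand-ins for PAGE_SIZE, offset_in_page(), struct rpcrdma_mr_seg and ib_update_fast_reg_key(). The key update assumes the verbs convention that only the low-order 8 bits of an rkey belong to the consumer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for PAGE_SIZE and offset_in_page(). */
#define SKETCH_PAGE_SIZE 4096UL
#define sketch_offset_in_page(a) ((uintptr_t)(a) & (SKETCH_PAGE_SIZE - 1))

/* Hypothetical reduced form of struct rpcrdma_mr_seg. */
struct seg_sketch {
	uintptr_t offset;	/* start address of the segment */
	size_t len;		/* length in bytes */
};

/*
 * Count how many leading segments a single page-list registration can
 * cover. Mirrors the "Check for holes" test: stop after a segment that
 * does not end on a page boundary, or when the next one does not start
 * on a page boundary.
 */
static int sketch_count_contiguous(const struct seg_sketch *seg, int nsegs)
{
	int i;

	assert(nsegs >= 0);
	for (i = 0; i < nsegs;) {
		const struct seg_sketch *prev = &seg[i];

		++i;
		if ((i < nsegs && sketch_offset_in_page(seg[i].offset)) ||
		    sketch_offset_in_page(prev->offset + prev->len))
			break;
	}
	return i;
}

/*
 * Bump the consumer-owned low byte of an rkey before each fast
 * registration. The upper 24 bits are unchanged; the low byte wraps
 * from 0xff to 0x00.
 */
static uint32_t sketch_bump_rkey(uint32_t rkey)
{
	uint8_t key = (uint8_t)(rkey & 0x000000ff);

	++key;
	return (rkey & 0xffffff00u) | key;
}
```

Changing the key on every registration means a stale rkey from a previous use of the same FRMR no longer matches, which is the point of bumping it.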
* [PATCH 05/15] RPC/RDMA: fix connection IRD/ORD setting
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (3 preceding siblings ...)
2008-10-08 15:47 ` [PATCH 04/15] RPC/RDMA: support FRMR client memory registration Tom Talpey
@ 2008-10-08 15:47 ` Tom Talpey
[not found] ` <20081008154744.1336.20909.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 15:47 ` [PATCH 06/15] RPC/RDMA: suppress retransmit on RPC/RDMA clients Tom Talpey
` (9 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:47 UTC (permalink / raw)
To: linux-nfs
From: Tom Tucker <talpey@netapp.com>
This logic sets the connection parameter that configures the local device
and informs the remote peer how many concurrent incoming RDMA_READ
requests are supported. The original logic didn't really do what was
intended for two reasons:
- The maximum number supported by the device is typically smaller than
any single factor used in the calculation, and
- The field in the connection parameter structure where the value is
stored is a u8, and the product always overflows it for the default settings.
As a result, the value actually requested for responder resources is
only the low-order 8 bits of the "desired value". If the desired value
happened to be a multiple of 256, the result was zero and the connection
could not be established at all.
Given the above, and the fact that max_requests is almost always larger
than the maximum responder resources supported by the adapter, this patch
simplifies the logic to request the maximum supported by the device,
capped at a reasonable limit of 32.
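A small user-space sketch of the truncation described above, contrasted with the replacement logic. struct conn_param_sketch only mimics the u8 fields of the connection parameter structure. The request counts and per-request factors used with it are hypothetical examples, not the kernel's actual slot-table or RPCRDMA_MAX_DATA_SEGS values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced form of the connection parameter fields involved. */
struct conn_param_sketch {
	uint8_t responder_resources;	/* IRD offered to the peer */
	uint8_t initiator_depth;	/* ORD; the client does not initiate */
};

/*
 * Old behaviour: the product is stored into a u8 (silently keeping only
 * the low 8 bits), and only then clamped to the device maximum.
 */
static uint8_t sketch_old_ird(unsigned int max_requests,
			      unsigned int per_request, int max_qp_rd_atom)
{
	struct conn_param_sketch p;

	p.responder_resources = (uint8_t)(max_requests * per_request);
	if (p.responder_resources > max_qp_rd_atom)
		p.responder_resources = (uint8_t)max_qp_rd_atom;
	return p.responder_resources;
}

/* New behaviour: request the device maximum, capped at 32 (always <= 255). */
static uint8_t sketch_new_ird(int max_qp_rd_atom, int bounce_buffers)
{
	if (bounce_buffers)
		return 0;
	if (max_qp_rd_atom > 32)
		return 32;
	return (uint8_t)max_qp_rd_atom;
}
```

For example, 32 requests times a factor of 8 gives 256, which the u8 field stores as 0, so the clamp never fires and no responder resources are offered. The new logic avoids computing a product at all.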
This bug was found by Jim Schutt at Sandia.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/verbs.c | 51 ++++++++++++-------------------------------
1 files changed, 14 insertions(+), 37 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 39a1652..e3fe905 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -705,30 +705,13 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
ep->rep_remote_cma.private_data_len = 0;
/* Client offers RDMA Read but does not initiate */
- switch (ia->ri_memreg_strategy) {
- case RPCRDMA_BOUNCEBUFFERS:
+ ep->rep_remote_cma.initiator_depth = 0;
+ if (ia->ri_memreg_strategy == RPCRDMA_BOUNCEBUFFERS)
ep->rep_remote_cma.responder_resources = 0;
- break;
- case RPCRDMA_MTHCAFMR:
- case RPCRDMA_REGISTER:
- case RPCRDMA_FRMR:
- ep->rep_remote_cma.responder_resources = cdata->max_requests *
- (RPCRDMA_MAX_DATA_SEGS / 8);
- break;
- case RPCRDMA_MEMWINDOWS:
- case RPCRDMA_MEMWINDOWS_ASYNC:
-#if RPCRDMA_PERSISTENT_REGISTRATION
- case RPCRDMA_ALLPHYSICAL:
-#endif
- ep->rep_remote_cma.responder_resources = cdata->max_requests *
- (RPCRDMA_MAX_DATA_SEGS / 2);
- break;
- default:
- break;
- }
- if (ep->rep_remote_cma.responder_resources > devattr.max_qp_rd_atom)
+ else if (devattr.max_qp_rd_atom > 32) /* arbitrary but <= 255 */
+ ep->rep_remote_cma.responder_resources = 32;
+ else
ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;
- ep->rep_remote_cma.initiator_depth = 0;
ep->rep_remote_cma.retry_count = 7;
ep->rep_remote_cma.flow_control = 0;
@@ -858,14 +841,6 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
}
}
- /* Theoretically a client initiator_depth > 0 is not needed,
- * but many peers fail to complete the connection unless they
- * == responder_resources! */
- if (ep->rep_remote_cma.initiator_depth !=
- ep->rep_remote_cma.responder_resources)
- ep->rep_remote_cma.initiator_depth =
- ep->rep_remote_cma.responder_resources;
-
ep->rep_connected = 0;
rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);
@@ -894,14 +869,16 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
if (ep->rep_connected <= 0) {
/* Sometimes, the only way to reliably connect to remote
* CMs is to use same nonzero values for ORD and IRD. */
- ep->rep_remote_cma.initiator_depth =
- ep->rep_remote_cma.responder_resources;
- if (ep->rep_remote_cma.initiator_depth == 0)
- ++ep->rep_remote_cma.initiator_depth;
- if (ep->rep_remote_cma.responder_resources == 0)
- ++ep->rep_remote_cma.responder_resources;
- if (retry_count++ == 0)
+ if (retry_count++ <= RDMA_CONNECT_RETRY_MAX + 1 &&
+ (ep->rep_remote_cma.responder_resources == 0 ||
+ ep->rep_remote_cma.initiator_depth !=
+ ep->rep_remote_cma.responder_resources)) {
+ if (ep->rep_remote_cma.responder_resources == 0)
+ ep->rep_remote_cma.responder_resources = 1;
+ ep->rep_remote_cma.initiator_depth =
+ ep->rep_remote_cma.responder_resources;
goto retry;
+ }
rc = ep->rep_connected;
} else {
dprintk("RPC: %s: connected\n", __func__);
* [PATCH 06/15] RPC/RDMA: suppress retransmit on RPC/RDMA clients.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (4 preceding siblings ...)
2008-10-08 15:47 ` [PATCH 05/15] RPC/RDMA: fix connection IRD/ORD setting Tom Talpey
@ 2008-10-08 15:47 ` Tom Talpey
2008-10-08 15:48 ` [PATCH 07/15] RPC/RDMA: maintain the RPC task bytes-sent statistic Tom Talpey
` (8 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:47 UTC (permalink / raw)
To: linux-nfs
An RPC/RDMA client cannot retransmit on an unbroken connection;
doing so violates its credit-based flow control with the server.
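The connect-cookie scheme in this patch can be modelled outside the kernel. The struct and function names below are hypothetical; they reduce struct rpc_xprt and struct rpcrdma_req to the single field each one uses here, and the connection upcall and send path to the few lines that touch it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced transport state. */
struct xprt_sketch {
	unsigned int connect_cookie;	/* advanced on every connection event */
};

/* Hypothetical reduced request state. */
struct req_sketch {
	unsigned int connect_cookie;	/* cookie at last send; 0 = never sent */
};

/* Connection upcall: advance the cookie, skipping the reserved value 0. */
static void sketch_conn_event(struct xprt_sketch *xprt)
{
	if (++xprt->connect_cookie == 0)
		++xprt->connect_cookie;
}

/* Fresh request buffer: mark it as never sent. */
static void sketch_req_init(struct req_sketch *req)
{
	req->connect_cookie = 0;
}

/*
 * Returns true if the request may be posted. A request already sent on
 * the current connection would be a retransmit, which upsets the credit
 * accounting with the server, so the caller drops the connection instead.
 */
static bool sketch_may_send(struct xprt_sketch *xprt, struct req_sketch *req)
{
	if (req->connect_cookie == xprt->connect_cookie)
		return false;
	req->connect_cookie = xprt->connect_cookie;
	return true;
}
```

Once the connection is re-established the cookie changes, so the same request becomes eligible to be sent again on the new connection.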
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 2 ++
net/sunrpc/xprtrdma/transport.c | 16 ++++++++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
3 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e55427f..721dae7 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -681,6 +681,8 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
struct rpc_xprt *xprt = ep->rep_xprt;
spin_lock_bh(&xprt->transport_lock);
+ if (++xprt->connect_cookie == 0) /* maintain a reserved value */
+ ++xprt->connect_cookie;
if (ep->rep_connected > 0) {
if (!xprt_test_and_set_connected(xprt))
xprt_wake_pending_tasks(xprt, 0);
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 89970b0..0aefc64 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -587,6 +587,7 @@ xprt_rdma_allocate(struct rpc_task *task, size_t size)
}
dprintk("RPC: %s: size %zd, request 0x%p\n", __func__, size, req);
out:
+ req->rl_connect_cookie = 0; /* our reserved value */
return req->rl_xdr_buf;
outfail:
@@ -690,13 +691,20 @@ xprt_rdma_send_request(struct rpc_task *task)
req->rl_reply->rr_xprt = xprt;
}
- if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req)) {
- xprt_disconnect_done(xprt);
- return -ENOTCONN; /* implies disconnect */
- }
+ /* Must suppress retransmit to maintain credits */
+ if (req->rl_connect_cookie == xprt->connect_cookie)
+ goto drop_connection;
+ req->rl_connect_cookie = xprt->connect_cookie;
+
+ if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
+ goto drop_connection;
rqst->rq_bytes_sent = 0;
return 0;
+
+drop_connection:
+ xprt_disconnect_done(xprt);
+ return -ENOTCONN; /* implies disconnect */
}
static void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 05b7898..2db2344 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -181,6 +181,7 @@ struct rpcrdma_req {
size_t rl_size; /* actual length of buffer */
unsigned int rl_niovs; /* 0, 2 or 4 */
unsigned int rl_nchunks; /* non-zero if chunks */
+ unsigned int rl_connect_cookie; /* retry detection */
struct rpcrdma_buffer *rl_buffer; /* home base for this structure */
struct rpcrdma_rep *rl_reply;/* holder for reply buffer */
struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];/* chunk segments */
* [PATCH 07/15] RPC/RDMA: maintain the RPC task bytes-sent statistic.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (5 preceding siblings ...)
2008-10-08 15:47 ` [PATCH 06/15] RPC/RDMA: suppress retransmit on RPC/RDMA clients Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
2008-10-08 15:48 ` [PATCH 08/15] RPC/RDMA: avoid an oops due to disconnect racing with async upcalls Tom Talpey
` (7 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/transport.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 0aefc64..ec6d1e7 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -699,6 +699,7 @@ xprt_rdma_send_request(struct rpc_task *task)
if (rpcrdma_ep_post(&r_xprt->rx_ia, &r_xprt->rx_ep, req))
goto drop_connection;
+ task->tk_bytes_sent += rqst->rq_snd_buf.len;
rqst->rq_bytes_sent = 0;
return 0;
* [PATCH 08/15] RPC/RDMA: avoid an oops due to disconnect racing with async upcalls.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (6 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 07/15] RPC/RDMA: maintain the RPC task bytes-sent statistic Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
2008-10-08 15:48 ` [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks Tom Talpey
` (6 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
RDMA disconnects yield an upcall from the RDMA connection manager,
which can race with RPC transport close, e.g. on ^C of a mount.
Ensure any RDMA cm_id and QP are fully destroyed before continuing.
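The teardown pattern adopted in rpcrdma_ia_close() (destroy the QP, then the cm_id that owns it, then clear the pointer) can be sketched with hypothetical stand-in types. This only illustrates the ordering and pointer-clearing; it does not model the concurrency with the connection manager upcall itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the queue pair and struct rdma_cm_id. */
struct qp_sketch { int unused; };
struct cm_id_sketch { struct qp_sketch *qp; };
struct ia_sketch { struct cm_id_sketch *ri_id; };

static int destroyed_ids;	/* counts teardowns, for illustration */

/*
 * Tear down in dependency order (QP before the cm_id that owns it) and
 * clear the pointer, so that a later close finds nothing left to destroy
 * instead of touching freed memory.
 */
static void sketch_ia_close(struct ia_sketch *ia)
{
	if (ia->ri_id != NULL) {
		if (ia->ri_id->qp) {
			free(ia->ri_id->qp);
			ia->ri_id->qp = NULL;
		}
		free(ia->ri_id);
		ia->ri_id = NULL;
		destroyed_ids++;
	}
}
```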
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/verbs.c | 20 +++++++++-----------
1 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e3fe905..d94f379 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -565,6 +565,7 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
return 0;
out2:
rdma_destroy_id(ia->ri_id);
+ ia->ri_id = NULL;
out1:
return rc;
}
@@ -585,15 +586,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
dprintk("RPC: %s: ib_dereg_mr returned %i\n",
__func__, rc);
}
- if (ia->ri_id != NULL && !IS_ERR(ia->ri_id) && ia->ri_id->qp)
- rdma_destroy_qp(ia->ri_id);
+ if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
+ if (ia->ri_id->qp)
+ rdma_destroy_qp(ia->ri_id);
+ rdma_destroy_id(ia->ri_id);
+ ia->ri_id = NULL;
+ }
if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
rc = ib_dealloc_pd(ia->ri_pd);
dprintk("RPC: %s: ib_dealloc_pd returned %i\n",
__func__, rc);
}
- if (ia->ri_id != NULL && !IS_ERR(ia->ri_id))
- rdma_destroy_id(ia->ri_id);
}
/*
@@ -751,21 +754,16 @@ rpcrdma_ep_destroy(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
if (rc)
dprintk("RPC: %s: rpcrdma_ep_disconnect"
" returned %i\n", __func__, rc);
+ rdma_destroy_qp(ia->ri_id);
+ ia->ri_id->qp = NULL;
}
- ep->rep_func = NULL;
-
/* padding - could be done in rpcrdma_buffer_destroy... */
if (ep->rep_pad_mr) {
rpcrdma_deregister_internal(ia, ep->rep_pad_mr, &ep->rep_pad);
ep->rep_pad_mr = NULL;
}
- if (ia->ri_id->qp) {
- rdma_destroy_qp(ia->ri_id);
- ia->ri_id->qp = NULL;
- }
-
rpcrdma_clean_cq(ep->rep_cq);
rc = ib_destroy_cq(ep->rep_cq);
if (rc)
* [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (7 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 08/15] RPC/RDMA: avoid an oops due to disconnect racing with async upcalls Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
[not found] ` <20081008154825.1336.79549.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 15:48 ` [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails Tom Talpey
` (5 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
The RPC/RDMA protocol allows clients and servers to avoid RDMA
operations for data that is purely the result of XDR padding.
On the client, automatically insert the necessary padding for such
server replies, and optionally (via the rdma_pad_optimize sysctl,
off by default) omit such trailing pad bytes when marshaling chunks.
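The reply-side arithmetic in this patch (rdmalen &= 3, then rdmalen = 4 - rdmalen) is the usual XDR round-up to a 4-byte boundary, followed by zero-filling the missing bytes. A hypothetical user-space sketch, with names that are not part of the kernel code:

```c
#include <assert.h>
#include <stddef.h>

/* Number of XDR pad bytes needed to round len up to a 4-byte boundary. */
static size_t sketch_xdr_pad(size_t len)
{
	size_t rem = len & 3;

	return rem ? 4 - rem : 0;
}

/*
 * Append the implicit zero padding after a terminal chunk whose pad bytes
 * the server did not transfer. 'buf' must have room for 'len' plus up to
 * 3 more bytes. Returns the new length.
 */
static size_t sketch_append_pad(unsigned char *buf, size_t len, size_t pad)
{
	while (pad--)
		buf[len++] = 0;
	return len;
}
```

On the send side, the patch applies the same rule in reverse: a tail shorter than 4 bytes can only be padding, so with rdma_pad_optimize set it is simply not turned into a chunk segment.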
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 22 ++++++++++++++++++++--
net/sunrpc/xprtrdma/transport.c | 9 +++++++++
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 721dae7..c4b8011 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -118,6 +118,11 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
}
if (xdrbuf->tail[0].iov_len) {
+ /* the rpcrdma protocol allows us to omit any trailing
+ * xdr pad bytes, saving the server an RDMA operation. */
+ extern int xprt_rdma_pad_optimize; /* 0 == old server compat */
+ if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
+ return n;
if (n == nsegs)
return 0;
seg[n].mr_page = NULL;
@@ -594,7 +599,7 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, unsigned int max, int wrchunk, __b
* Scatter inline received data back into provided iov's.
*/
static void
-rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
+rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
{
int i, npages, curlen, olen;
char *destp;
@@ -660,6 +665,13 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
} else
rqst->rq_rcv_buf.tail[0].iov_len = 0;
+ if (pad) {
+ /* implicit padding on terminal chunk */
+ unsigned char *p = rqst->rq_rcv_buf.tail[0].iov_base;
+ while (pad--)
+ p[rqst->rq_rcv_buf.tail[0].iov_len++] = 0;
+ }
+
if (copy_len)
dprintk("RPC: %s: %d bytes in"
" %d extra segments (%d lost)\n",
@@ -794,14 +806,20 @@ repost:
((unsigned char *)iptr - (unsigned char *)headerp);
status = rep->rr_len + rdmalen;
r_xprt->rx_stats.total_rdma_reply += rdmalen;
+ /* special case - last chunk may omit padding */
+ if (rdmalen &= 3) {
+ rdmalen = 4 - rdmalen;
+ status += rdmalen;
+ }
} else {
/* else ordinary inline */
+ rdmalen = 0;
iptr = (__be32 *)((unsigned char *)headerp + 28);
rep->rr_len -= 28; /*sizeof *headerp;*/
status = rep->rr_len;
}
/* Fix up the rpc results for upper layer */
- rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len);
+ rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len, rdmalen);
break;
case __constant_htonl(RDMA_NOMSG):
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index ec6d1e7..c7d2380 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -71,6 +71,7 @@ static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
static unsigned int xprt_rdma_inline_write_padding;
static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
+ int xprt_rdma_pad_optimize = 0;
#ifdef RPC_DEBUG
@@ -136,6 +137,14 @@ static ctl_table xr_tunables_table[] = {
.extra2 = &max_memreg,
},
{
+ .ctl_name = CTL_UNNUMBERED,
+ .procname = "rdma_pad_optimize",
+ .data = &xprt_rdma_pad_optimize,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+ {
.ctl_name = 0,
},
};
* [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (8 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
[not found] ` <20081008154835.1336.85484.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 15:48 ` [PATCH 11/15] RPC/RDMA: fix connect/reconnect resource leak Tom Talpey
` (4 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
The mount system call path does not expect errors such as ECONNREFUSED
from failed transport connection attempts; when it receives one, it
prints only "internal error". Translate all such errors from RPC/RDMA
to ENOTCONN to match the sockets behavior.
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index c4b8011..11ea8da 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -700,7 +700,7 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
xprt_wake_pending_tasks(xprt, 0);
} else {
if (xprt_test_and_clear_connected(xprt))
- xprt_wake_pending_tasks(xprt, ep->rep_connected);
+ xprt_wake_pending_tasks(xprt, -ENOTCONN);
}
spin_unlock_bh(&xprt->transport_lock);
}
* [PATCH 11/15] RPC/RDMA: fix connect/reconnect resource leak.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (9 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
2008-10-08 15:48 ` [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server Tom Talpey
` (3 subsequent siblings)
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
The RPC/RDMA code can leak RDMA connection manager endpoints in
certain connect error cases. Don't complete the pending wait on
unexpected CM events, and be certain to destroy any allocated qp.
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/verbs.c | 9 ++++-----
1 files changed, 4 insertions(+), 5 deletions(-)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index d94f379..a63d0c0 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -338,10 +338,8 @@ connected:
wake_up_all(&ep->rep_connect_wait);
break;
default:
- ia->ri_async_rc = -EINVAL;
- dprintk("RPC: %s: unexpected CM event %X\n",
+ dprintk("RPC: %s: unexpected CM event %d\n",
__func__, event->event);
- complete(&ia->ri_done);
break;
}
@@ -355,6 +353,8 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
struct rdma_cm_id *id;
int rc;
+ init_completion(&ia->ri_done);
+
id = rdma_create_id(rpcrdma_conn_upcall, xprt, RDMA_PS_TCP);
if (IS_ERR(id)) {
rc = PTR_ERR(id);
@@ -427,8 +427,6 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
struct ib_device_attr devattr;
struct rpcrdma_ia *ia = &xprt->rx_ia;
- init_completion(&ia->ri_done);
-
ia->ri_id = rpcrdma_create_id(xprt, ia, addr);
if (IS_ERR(ia->ri_id)) {
rc = PTR_ERR(ia->ri_id);
@@ -815,6 +813,7 @@ retry:
goto out;
}
/* END TEMP */
+ rdma_destroy_qp(ia->ri_id);
rdma_destroy_id(ia->ri_id);
ia->ri_id = id;
}
* [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (10 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 11/15] RPC/RDMA: fix connect/reconnect resource leak Tom Talpey
@ 2008-10-08 15:48 ` Tom Talpey
[not found] ` <20081008154856.1336.18339.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 15:49 ` [PATCH 13/15] RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls Tom Talpey
` (2 subsequent siblings)
14 siblings, 1 reply; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:48 UTC (permalink / raw)
To: linux-nfs
The RPC/RDMA code always performed a reconnect-with-backoff, even
when re-establishing a connection to a server after the RPC layer
closed it due to being idle. Report an ordinary disconnect as -EPIPE,
and reconnect immediately in that case.
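The reconnect decision in xprt_rdma_connect() after this patch reduces to a single predicate. The function name below is hypothetical. It assumes the conventions visible in the surrounding code: rep_connected is 0 before the first connect, 1 while connected, or a negative errno recorded by the connection upcall.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the xprt_rdma_connect() test. A plain disconnect
 * (now reported as -EPIPE) or a first connect proceeds at once; any other
 * recorded failure is retried after the reestablish timeout.
 */
static bool sketch_reconnect_with_backoff(int rep_connected)
{
	return rep_connected && rep_connected != -EPIPE;
}
```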
---
net/sunrpc/xprtrdma/transport.c | 5 +++--
net/sunrpc/xprtrdma/verbs.c | 2 +-
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index c7d2380..278a544 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -486,8 +486,9 @@ xprt_rdma_connect(struct rpc_task *task)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
if (!xprt_test_and_set_connecting(xprt)) {
- if (r_xprt->rx_ep.rep_connected != 0) {
- /* Reconnect */
+ if (r_xprt->rx_ep.rep_connected &&
+ r_xprt->rx_ep.rep_connected != -EPIPE) {
+ /* Reconnect with backoff */
schedule_delayed_work(&r_xprt->rdma_connect,
xprt->reestablish_timeout);
} else {
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index a63d0c0..9ef7e0d 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -317,7 +317,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
connstate = -ECONNREFUSED;
goto connected;
case RDMA_CM_EVENT_DISCONNECTED:
- connstate = -ECONNABORTED;
+ connstate = -EPIPE;
goto connected;
case RDMA_CM_EVENT_DEVICE_REMOVAL:
connstate = -ENODEV;
* [PATCH 13/15] RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (11 preceding siblings ...)
2008-10-08 15:48 ` [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server Tom Talpey
@ 2008-10-08 15:49 ` Tom Talpey
2008-10-08 15:49 ` [PATCH 14/15] RPC/RDMA: reformat a debug printk to keep lines together Tom Talpey
2008-10-08 15:49 ` [PATCH 15/15] RPC/RDMA: optionally emit useful transport info upon connect/disconnect Tom Talpey
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:49 UTC (permalink / raw)
To: linux-nfs
Add defensive timeouts to the wait_for_completion() calls in RDMA
address and route resolution, and make them interruptible. Fix the
timeout units to milliseconds (formerly jiffies) and move the
definitions to the private header.
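Two ideas from this patch can be sketched in user space: presetting the result to -ETIMEDOUT so that a missing upcall is reported as a timeout rather than as success, and converting a millisecond timeout to scheduler ticks by rounding up. SKETCH_HZ and all names below are hypothetical. For example, the old definition (5*HZ) was passed where rdma_resolve_addr() expects milliseconds, so with HZ=250 the resolution timeout would have been 1250 ms rather than 5 seconds.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical tick rate; the kernel's HZ is a build-time choice. */
#define SKETCH_HZ 250

/* Round a millisecond timeout up to whole ticks, like msecs_to_jiffies(). */
static unsigned long sketch_msecs_to_ticks(unsigned int ms)
{
	return ((unsigned long)ms * SKETCH_HZ + 999) / 1000;
}

/* Hypothetical stand-in for the ri_async_rc field. */
struct resolve_sketch {
	int async_rc;
};

/*
 * Preset the result before starting resolution. If the upcall never
 * arrives and the bounded wait expires, the caller reads -ETIMEDOUT
 * instead of a stale success value of 0.
 */
static void sketch_begin_resolve(struct resolve_sketch *r)
{
	r->async_rc = -ETIMEDOUT;
}

/* Upcall for ADDR_RESOLVED / ROUTE_RESOLVED: record success. */
static void sketch_resolved_upcall(struct resolve_sketch *r)
{
	r->async_rc = 0;
}
```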
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/verbs.c | 11 +++++++----
net/sunrpc/xprtrdma/xprt_rdma.h | 3 +++
2 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
index 55a5d92..54a379c 100644
--- a/include/linux/sunrpc/xprtrdma.h
+++ b/include/linux/sunrpc/xprtrdma.h
@@ -66,9 +66,6 @@
#define RPCRDMA_INLINE_PAD_THRESH (512)/* payload threshold to pad (bytes) */
-#define RDMA_RESOLVE_TIMEOUT (5*HZ) /* TBD 5 seconds */
-#define RDMA_CONNECT_RETRY_MAX (2) /* retries if no listener backlog */
-
/* memory registration strategies */
#define RPCRDMA_PERSISTENT_REGISTRATION (1)
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 9ef7e0d..4076773 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -284,6 +284,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
switch (event->event) {
case RDMA_CM_EVENT_ADDR_RESOLVED:
case RDMA_CM_EVENT_ROUTE_RESOLVED:
+ ia->ri_async_rc = 0;
complete(&ia->ri_done);
break;
case RDMA_CM_EVENT_ADDR_ERROR:
@@ -363,26 +364,28 @@ rpcrdma_create_id(struct rpcrdma_xprt *xprt,
return id;
}
- ia->ri_async_rc = 0;
+ ia->ri_async_rc = -ETIMEDOUT;
rc = rdma_resolve_addr(id, NULL, addr, RDMA_RESOLVE_TIMEOUT);
if (rc) {
dprintk("RPC: %s: rdma_resolve_addr() failed %i\n",
__func__, rc);
goto out;
}
- wait_for_completion(&ia->ri_done);
+ wait_for_completion_interruptible_timeout(&ia->ri_done,
+ msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
goto out;
- ia->ri_async_rc = 0;
+ ia->ri_async_rc = -ETIMEDOUT;
rc = rdma_resolve_route(id, RDMA_RESOLVE_TIMEOUT);
if (rc) {
dprintk("RPC: %s: rdma_resolve_route() failed %i\n",
__func__, rc);
goto out;
}
- wait_for_completion(&ia->ri_done);
+ wait_for_completion_interruptible_timeout(&ia->ri_done,
+ msecs_to_jiffies(RDMA_RESOLVE_TIMEOUT) + 1);
rc = ia->ri_async_rc;
if (rc)
goto out;
diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
index 2db2344..99bfbf3 100644
--- a/net/sunrpc/xprtrdma/xprt_rdma.h
+++ b/net/sunrpc/xprtrdma/xprt_rdma.h
@@ -51,6 +51,9 @@
#include <linux/sunrpc/rpc_rdma.h> /* RPC/RDMA protocol */
#include <linux/sunrpc/xprtrdma.h> /* xprt parameters */
+#define RDMA_RESOLVE_TIMEOUT (5000) /* 5 seconds */
+#define RDMA_CONNECT_RETRY_MAX (2) /* retries if no listener backlog */
+
/*
* Interface Adapter -- one per transport instance
*/
* [PATCH 14/15] RPC/RDMA: reformat a debug printk to keep lines together.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (12 preceding siblings ...)
2008-10-08 15:49 ` [PATCH 13/15] RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls Tom Talpey
@ 2008-10-08 15:49 ` Tom Talpey
2008-10-08 15:49 ` [PATCH 15/15] RPC/RDMA: optionally emit useful transport info upon connect/disconnect Tom Talpey
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:49 UTC (permalink / raw)
To: linux-nfs
The send marshaling code split a particular dprintk across two
lines, which makes it hard to extract from logfiles.
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/rpc_rdma.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index 11ea8da..0dda063 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -513,8 +513,8 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
if (hdrlen == 0)
return -1;
- dprintk("RPC: %s: %s: hdrlen %zd rpclen %zd padlen %zd\n"
- " headerp 0x%p base 0x%p lkey 0x%x\n",
+ dprintk("RPC: %s: %s: hdrlen %zd rpclen %zd padlen %zd"
+ " headerp 0x%p base 0x%p lkey 0x%x\n",
__func__, transfertypes[wtype], hdrlen, rpclen, padlen,
headerp, base, req->rl_iov.lkey);
* [PATCH 15/15] RPC/RDMA: optionally emit useful transport info upon connect/disconnect.
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
` (13 preceding siblings ...)
2008-10-08 15:49 ` [PATCH 14/15] RPC/RDMA: reformat a debug printk to keep lines together Tom Talpey
@ 2008-10-08 15:49 ` Tom Talpey
14 siblings, 0 replies; 36+ messages in thread
From: Tom Talpey @ 2008-10-08 15:49 UTC (permalink / raw)
To: linux-nfs
Signed-off-by: Tom Talpey <talpey@netapp.com>
---
net/sunrpc/xprtrdma/transport.c | 2 +-
net/sunrpc/xprtrdma/verbs.c | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+), 1 deletions(-)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 278a544..b40d0b3 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -785,7 +785,7 @@ static void __exit xprt_rdma_cleanup(void)
{
int rc;
- dprintk("RPCRDMA Module Removed, deregister RPC RDMA transport\n");
+ dprintk(KERN_INFO "RPCRDMA Module Removed, deregister RPC RDMA transport\n");
#ifdef RPC_DEBUG
if (sunrpc_table_header) {
unregister_sysctl_table(sunrpc_table_header);
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4076773..f9841b6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -344,6 +344,27 @@ connected:
break;
}
+#ifdef RPC_DEBUG
+ if (connstate == 1) {
+ int ird = attr.max_dest_rd_atomic;
+ int tird = ep->rep_remote_cma.responder_resources;
+ printk(KERN_INFO "rpcrdma: connection to %u.%u.%u.%u:%u "
+ "on %s, memreg %d slots %d ird %d%s\n",
+ NIPQUAD(addr->sin_addr.s_addr),
+ ntohs(addr->sin_port),
+ ia->ri_id->device->name,
+ ia->ri_memreg_strategy,
+ xprt->rx_buf.rb_max_requests,
+ ird, ird < 4 && ird < tird / 2 ? " (low!)" : "");
+ } else if (connstate < 0) {
+ printk(KERN_INFO "rpcrdma: connection to %u.%u.%u.%u:%u "
+ "closed (%d)\n",
+ NIPQUAD(addr->sin_addr.s_addr),
+ ntohs(addr->sin_port),
+ connstate);
+ }
+#endif
+
return 0;
}
* Re: [PATCH 03/15] RPC/RDMA: check selected memory registration mode at runtime.
[not found] ` <20081008154723.1336.57976.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
@ 2008-10-08 17:22 ` Trond Myklebust
2008-10-08 17:29 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:22 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> At transport creation, check for, and use, any local dma lkey.
> Then, check that the selected memory registration mode is in fact
> supported by the RDMA adapter selected for the mount. Fall back
> to best alternative if not.
>
> Signed-off-by: Tom Talpey <talpey@netapp.com>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
I'm confused... Who is signing off on what? AFAICS, Tom Talpey is the
author and is the one sending this patch series. Where does Tom Tucker
come into the picture?
> ---
>
> net/sunrpc/xprtrdma/verbs.c | 95 ++++++++++++++++++++++++++++++++++++-------
> 1 files changed, 80 insertions(+), 15 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index d04208a..0f3b431 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -423,7 +423,8 @@ rpcrdma_clean_cq(struct ib_cq *cq)
> int
> rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
> {
> - int rc;
> + int rc, mem_priv;
> + struct ib_device_attr devattr;
> struct rpcrdma_ia *ia = &xprt->rx_ia;
>
> init_completion(&ia->ri_done);
> @@ -443,6 +444,53 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
> }
>
> /*
> + * Query the device to determine if the requested memory
> + * registration strategy is supported. If it isn't, set the
> + * strategy to a globally supported model.
> + */
> + rc = ib_query_device(ia->ri_id->device, &devattr);
> + if (rc) {
> + dprintk("RPC: %s: ib_query_device failed %d\n",
> + __func__, rc);
> + goto out2;
> + }
> +
> + if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
> + ia->ri_have_dma_lkey = 1;
> + ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
> + }
> +
> + switch (memreg) {
> + case RPCRDMA_MEMWINDOWS:
> + case RPCRDMA_MEMWINDOWS_ASYNC:
> + if (!(devattr.device_cap_flags & IB_DEVICE_MEM_WINDOW)) {
> + dprintk("RPC: %s: MEMWINDOWS registration "
> + "specified but not supported by adapter, "
> + "using slower RPCRDMA_REGISTER\n",
> + __func__);
> + memreg = RPCRDMA_REGISTER;
> + }
> + break;
> + case RPCRDMA_MTHCAFMR:
> + if (!ia->ri_id->device->alloc_fmr) {
> +#if RPCRDMA_PERSISTENT_REGISTRATION
> + dprintk("RPC: %s: MTHCAFMR registration "
> + "specified but not supported by adapter, "
> + "using riskier RPCRDMA_ALLPHYSICAL\n",
> + __func__);
> + memreg = RPCRDMA_ALLPHYSICAL;
> +#else
> + dprintk("RPC: %s: MTHCAFMR registration "
> + "specified but not supported by adapter, "
> + "using slower RPCRDMA_REGISTER\n",
> + __func__);
> + memreg = RPCRDMA_REGISTER;
> +#endif
> + }
> + break;
> + }
> +
> + /*
> * Optionally obtain an underlying physical identity mapping in
> * order to do a memory window-based bind. This base registration
> * is protected from remote access - that is enabled only by binding
> @@ -450,22 +498,27 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
> * revoked after the corresponding completion similar to a storage
> * adapter.
> */
> - if (memreg > RPCRDMA_REGISTER) {
> - int mem_priv = IB_ACCESS_LOCAL_WRITE;
> - switch (memreg) {
> + switch (memreg) {
> + case RPCRDMA_BOUNCEBUFFERS:
> + case RPCRDMA_REGISTER:
> + break;
> #if RPCRDMA_PERSISTENT_REGISTRATION
> - case RPCRDMA_ALLPHYSICAL:
> - mem_priv |= IB_ACCESS_REMOTE_WRITE;
> - mem_priv |= IB_ACCESS_REMOTE_READ;
> - break;
> + case RPCRDMA_ALLPHYSICAL:
> + mem_priv = IB_ACCESS_LOCAL_WRITE |
> + IB_ACCESS_REMOTE_WRITE |
> + IB_ACCESS_REMOTE_READ;
> + goto register_setup;
> #endif
> - case RPCRDMA_MEMWINDOWS_ASYNC:
> - case RPCRDMA_MEMWINDOWS:
> - mem_priv |= IB_ACCESS_MW_BIND;
> - break;
> - default:
> + case RPCRDMA_MEMWINDOWS_ASYNC:
> + case RPCRDMA_MEMWINDOWS:
> + mem_priv = IB_ACCESS_LOCAL_WRITE |
> + IB_ACCESS_MW_BIND;
> + goto register_setup;
> + case RPCRDMA_MTHCAFMR:
> + if (ia->ri_have_dma_lkey)
> break;
> - }
> + mem_priv = IB_ACCESS_LOCAL_WRITE;
> + register_setup:
> ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
> if (IS_ERR(ia->ri_bind_mem)) {
> printk(KERN_ALERT "%s: ib_get_dma_mr for "
> @@ -475,7 +528,15 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
> memreg = RPCRDMA_REGISTER;
> ia->ri_bind_mem = NULL;
> }
> + break;
> + default:
> + printk(KERN_ERR "%s: invalid memory registration mode %d\n",
> + __func__, memreg);
> + rc = -EINVAL;
> + goto out2;
> }
> + dprintk("RPC: %s: memory registration strategy is %d\n",
> + __func__, memreg);
>
> /* Else will do memory reg/dereg for each chunk */
> ia->ri_memreg_strategy = memreg;
> @@ -1248,7 +1309,11 @@ rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
> va, len, DMA_BIDIRECTIONAL);
> iov->length = len;
>
> - if (ia->ri_bind_mem != NULL) {
> + if (ia->ri_have_dma_lkey) {
> + *mrp = NULL;
> + iov->lkey = ia->ri_dma_lkey;
> + return 0;
> + } else if (ia->ri_bind_mem != NULL) {
> *mrp = NULL;
> iov->lkey = ia->ri_bind_mem->lkey;
> return 0;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 36+ messages in thread
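The fallback rules in this patch can be summarised as a pure function. This is a sketch only: the `MR_*` names, `struct dev_caps` and `select_memreg()` are hypothetical, the enum ordering is assumed from the comparison against RPCRDMA_REGISTER in the removed code, and the `persistent` flag stands in for the compile-time RPCRDMA_PERSISTENT_REGISTRATION switch.

```c
#include <stdbool.h>

/* Assumed ordering; the hunks only show part of enum rpcrdma_memreg. */
enum memreg {
	MR_BOUNCEBUFFERS,
	MR_REGISTER,
	MR_MEMWINDOWS,
	MR_MEMWINDOWS_ASYNC,
	MR_MTHCAFMR,
	MR_ALLPHYSICAL,
};

/* Hypothetical stand-ins for the device checks in rpcrdma_ia_open(). */
struct dev_caps {
	bool mem_window;	/* IB_DEVICE_MEM_WINDOW in device_cap_flags */
	bool has_alloc_fmr;	/* device->alloc_fmr != NULL */
};

/*
 * Return the mode actually used for a requested mode.
 * persistent models RPCRDMA_PERSISTENT_REGISTRATION.
 */
static enum memreg select_memreg(enum memreg want,
				 const struct dev_caps *caps, bool persistent)
{
	switch (want) {
	case MR_MEMWINDOWS:
	case MR_MEMWINDOWS_ASYNC:
		if (!caps->mem_window)
			return MR_REGISTER;	/* "slower" fallback */
		break;
	case MR_MTHCAFMR:
		if (!caps->has_alloc_fmr)
			return persistent ? MR_ALLPHYSICAL /* "riskier" */
					  : MR_REGISTER;
		break;
	default:
		break;
	}
	return want;
}
```

The sketch omits the later step in rpcrdma_ia_open() that also drops back to RPCRDMA_REGISTER when ib_get_dma_mr() fails, and the new default case that rejects unlisted modes with -EINVAL.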
* Re: [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
@ 2008-10-08 17:23 ` Trond Myklebust
2008-10-08 17:30 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:23 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> Internal RPC/RDMA structure updates in preparation for FRMR support.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> Signed-off-by: Tom Talpey <talpey@netapp.com>
Shouldn't there be a
From: Tom Tucker <tom@opengridcomputing.com>
at the top of this email in order to indicate that Tom Tucker is the
author?
> ---
>
> net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
> index 4de56b1..55a5d92 100644
> --- a/include/linux/sunrpc/xprtrdma.h
> +++ b/include/linux/sunrpc/xprtrdma.h
> @@ -78,6 +78,7 @@ enum rpcrdma_memreg {
> RPCRDMA_MEMWINDOWS,
> RPCRDMA_MEMWINDOWS_ASYNC,
> RPCRDMA_MTHCAFMR,
> + RPCRDMA_FRMR,
> RPCRDMA_ALLPHYSICAL,
> RPCRDMA_LAST
> };
> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> index 2427822..05b7898 100644
> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> @@ -58,6 +58,8 @@ struct rpcrdma_ia {
> struct rdma_cm_id *ri_id;
> struct ib_pd *ri_pd;
> struct ib_mr *ri_bind_mem;
> + u32 ri_dma_lkey;
> + int ri_have_dma_lkey;
> struct completion ri_done;
> int ri_async_rc;
> enum rpcrdma_memreg ri_memreg_strategy;
> @@ -156,6 +158,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
> union {
> struct ib_mw *mw;
> struct ib_fmr *fmr;
> + struct {
> + struct ib_fast_reg_page_list *fr_pgl;
> + struct ib_mr *fr_mr;
> + } frmr;
> } r;
> struct list_head mw_list;
> } *rl_mw;
> @@ -198,7 +204,7 @@ struct rpcrdma_buffer {
> atomic_t rb_credits; /* most recent server credits */
> unsigned long rb_cwndscale; /* cached framework rpc_cwndscale */
> int rb_max_requests;/* client max requests */
> - struct list_head rb_mws; /* optional memory windows/fmrs */
> + struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
> int rb_send_index;
> struct rpcrdma_req **rb_send_bufs;
> int rb_recv_index;
>
^ permalink raw reply [flat|nested] 36+ messages in thread
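Because RPCRDMA_FRMR is inserted ahead of RPCRDMA_ALLPHYSICAL rather than appended just before RPCRDMA_LAST, every enumerator after it moves up by one. The sketch below spells out the implicit values with hypothetical `OLD_`/`NEW_` copies of the enum; it assumes the enum begins with RPCRDMA_BOUNCEBUFFERS = 0 followed by RPCRDMA_REGISTER, which the hunk does not show.

```c
/* Assumed layout before the patch (leading members inferred, not shown). */
enum memreg_before {
	OLD_BOUNCEBUFFERS,	/* 0 */
	OLD_REGISTER,		/* 1 */
	OLD_MEMWINDOWS,		/* 2 */
	OLD_MEMWINDOWS_ASYNC,	/* 3 */
	OLD_MTHCAFMR,		/* 4 */
	OLD_ALLPHYSICAL,	/* 5 */
	OLD_LAST		/* 6 */
};

/* Same assumption, after RPCRDMA_FRMR is added. */
enum memreg_after {
	NEW_BOUNCEBUFFERS,	/* 0 */
	NEW_REGISTER,		/* 1 */
	NEW_MEMWINDOWS,		/* 2 */
	NEW_MEMWINDOWS_ASYNC,	/* 3 */
	NEW_MTHCAFMR,		/* 4: unchanged */
	NEW_FRMR,		/* 5: inserted by this patch */
	NEW_ALLPHYSICAL,	/* 6: shifted up by one */
	NEW_LAST		/* 7 */
};
```

Under that assumption, anything that stores or accepts these modes as raw integers would see RPCRDMA_ALLPHYSICAL change from 5 to 6.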
* Re: [PATCH 05/15] RPC/RDMA: fix connection IRD/ORD setting
@ 2008-10-08 17:26 ` Trond Myklebust
2008-10-08 17:32 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:26 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> From: Tom Tucker <talpey@netapp.com>
Now I'm really confused!
> This logic sets the connection parameter that configures the local device
> and informs the remote peer how many concurrent incoming RDMA_READ
> requests are supported. The original logic didn't really do what was
> intended for two reasons:
>
> - The max number supported by the device is typically smaller than
> any one factor in the calculation used, and
>
> - The field in the connection parameter structure where the value is
> stored is a u8 and always overflows for the default settings.
>
> So what really happens is the value requested for responder resources
> is the left over 8 bits from the "desired value". If the desired value
> happened to be a multiple of 256, the result was zero and it wouldn't
> connect at all.
>
> Given the above and the fact that max_requests is almost always larger
> than the max responder resources supported by the adapter, this patch
> simplifies this logic and simply requests the max supported by the device,
> subject to a reasonable limit.
>
> This bug was found by Jim Schutt at Sandia.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> Signed-off-by: Tom Talpey <talpey@netapp.com>
> ---
>
> net/sunrpc/xprtrdma/verbs.c | 51 ++++++++++++-------------------------------
> 1 files changed, 14 insertions(+), 37 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 39a1652..e3fe905 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -705,30 +705,13 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
> ep->rep_remote_cma.private_data_len = 0;
>
> /* Client offers RDMA Read but does not initiate */
> - switch (ia->ri_memreg_strategy) {
> - case RPCRDMA_BOUNCEBUFFERS:
> + ep->rep_remote_cma.initiator_depth = 0;
> + if (ia->ri_memreg_strategy == RPCRDMA_BOUNCEBUFFERS)
> ep->rep_remote_cma.responder_resources = 0;
> - break;
> - case RPCRDMA_MTHCAFMR:
> - case RPCRDMA_REGISTER:
> - case RPCRDMA_FRMR:
> - ep->rep_remote_cma.responder_resources = cdata->max_requests *
> - (RPCRDMA_MAX_DATA_SEGS / 8);
> - break;
> - case RPCRDMA_MEMWINDOWS:
> - case RPCRDMA_MEMWINDOWS_ASYNC:
> -#if RPCRDMA_PERSISTENT_REGISTRATION
> - case RPCRDMA_ALLPHYSICAL:
> -#endif
> - ep->rep_remote_cma.responder_resources = cdata->max_requests *
> - (RPCRDMA_MAX_DATA_SEGS / 2);
> - break;
> - default:
> - break;
> - }
> - if (ep->rep_remote_cma.responder_resources > devattr.max_qp_rd_atom)
> + else if (devattr.max_qp_rd_atom > 32) /* arbitrary but <= 255 */
> + ep->rep_remote_cma.responder_resources = 32;
> + else
> ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;
> - ep->rep_remote_cma.initiator_depth = 0;
>
> ep->rep_remote_cma.retry_count = 7;
> ep->rep_remote_cma.flow_control = 0;
> @@ -858,14 +841,6 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
> }
> }
>
> - /* Theoretically a client initiator_depth > 0 is not needed,
> - * but many peers fail to complete the connection unless they
> - * == responder_resources! */
> - if (ep->rep_remote_cma.initiator_depth !=
> - ep->rep_remote_cma.responder_resources)
> - ep->rep_remote_cma.initiator_depth =
> - ep->rep_remote_cma.responder_resources;
> -
> ep->rep_connected = 0;
>
> rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);
> @@ -894,14 +869,16 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
> if (ep->rep_connected <= 0) {
> /* Sometimes, the only way to reliably connect to remote
> * CMs is to use same nonzero values for ORD and IRD. */
> - ep->rep_remote_cma.initiator_depth =
> - ep->rep_remote_cma.responder_resources;
> - if (ep->rep_remote_cma.initiator_depth == 0)
> - ++ep->rep_remote_cma.initiator_depth;
> - if (ep->rep_remote_cma.responder_resources == 0)
> - ++ep->rep_remote_cma.responder_resources;
> - if (retry_count++ == 0)
> + if (retry_count++ <= RDMA_CONNECT_RETRY_MAX + 1 &&
> + (ep->rep_remote_cma.responder_resources == 0 ||
> + ep->rep_remote_cma.initiator_depth !=
> + ep->rep_remote_cma.responder_resources)) {
> + if (ep->rep_remote_cma.responder_resources == 0)
> + ep->rep_remote_cma.responder_resources = 1;
> + ep->rep_remote_cma.initiator_depth =
> + ep->rep_remote_cma.responder_resources;
> goto retry;
> + }
> rc = ep->rep_connected;
> } else {
> dprintk("RPC: %s: connected\n", __func__);
>
^ permalink raw reply [flat|nested] 36+ messages in thread
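The overflow described in the commit message is easy to reproduce outside the kernel. In this sketch the helper names are hypothetical, and the inputs used below (32 requests, 8 segments each) are illustrative values chosen to land on a multiple of 256, not figures taken from the patch. The first function models only the old store into the u8 field; the second models the replacement logic.

```c
#include <stdint.h>

/*
 * Old behaviour: the product is computed as an int and then stored
 * into the u8 responder_resources field, so only the low 8 bits
 * survive (conversion to an unsigned type is modulo 256).
 */
static uint8_t old_responder_resources(int max_requests, int segs_per_req)
{
	int desired = max_requests * segs_per_req;

	return (uint8_t)desired;
}

/*
 * New behaviour: 0 for bounce buffers, otherwise the device limit
 * (max_qp_rd_atom) capped at 32, which always fits in a u8.
 */
static uint8_t new_responder_resources(int bounce_buffers, int max_qp_rd_atom)
{
	if (bounce_buffers)
		return 0;
	if (max_qp_rd_atom > 32)
		return 32;
	return (uint8_t)max_qp_rd_atom;
}
```

In the removed code the clamp against devattr.max_qp_rd_atom ran on the already-truncated u8, so a zero slipped straight through it; the new code never forms the oversized product.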
* Re: [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
@ 2008-10-08 17:29 ` Trond Myklebust
2008-10-08 17:33 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:29 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
> The RPC/RDMA protocol allows clients and servers to avoid RDMA
> operations for data which is purely the result of XDR padding.
> On the client, automatically insert the necessary padding for
> such server replies, and optionally don't marshal such chunks.
>
> Signed-off-by: Tom Talpey <talpey@netapp.com>
> ---
>
> net/sunrpc/xprtrdma/rpc_rdma.c | 22 ++++++++++++++++++++--
> net/sunrpc/xprtrdma/transport.c | 9 +++++++++
> 2 files changed, 29 insertions(+), 2 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index 721dae7..c4b8011 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -118,6 +118,11 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
> }
>
> if (xdrbuf->tail[0].iov_len) {
> + /* the rpcrdma protocol allows us to omit any trailing
> + * xdr pad bytes, saving the server an RDMA operation. */
> + extern int xprt_rdma_pad_optimize; /* 0 == old server compat */
^^^^^^^^^^^^^^^^^^^
Globals should really be declared in a header file. 'sparse' will
complain if you don't...
> + if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
> + return n;
> if (n == nsegs)
> return 0;
> seg[n].mr_page = NULL;
> @@ -594,7 +599,7 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, unsigned int max, int wrchunk, __b
> * Scatter inline received data back into provided iov's.
> */
> static void
> -rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
> +rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
> {
> int i, npages, curlen, olen;
> char *destp;
> @@ -660,6 +665,13 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
> } else
> rqst->rq_rcv_buf.tail[0].iov_len = 0;
>
> + if (pad) {
> + /* implicit padding on terminal chunk */
> + unsigned char *p = rqst->rq_rcv_buf.tail[0].iov_base;
> + while (pad--)
> + p[rqst->rq_rcv_buf.tail[0].iov_len++] = 0;
> + }
> +
> if (copy_len)
> dprintk("RPC: %s: %d bytes in"
> " %d extra segments (%d lost)\n",
> @@ -794,14 +806,20 @@ repost:
> ((unsigned char *)iptr - (unsigned char *)headerp);
> status = rep->rr_len + rdmalen;
> r_xprt->rx_stats.total_rdma_reply += rdmalen;
> + /* special case - last chunk may omit padding */
> + if (rdmalen &= 3) {
> + rdmalen = 4 - rdmalen;
> + status += rdmalen;
> + }
> } else {
> /* else ordinary inline */
> + rdmalen = 0;
> iptr = (__be32 *)((unsigned char *)headerp + 28);
> rep->rr_len -= 28; /*sizeof *headerp;*/
> status = rep->rr_len;
> }
> /* Fix up the rpc results for upper layer */
> - rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len);
> + rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len, rdmalen);
> break;
>
> case __constant_htonl(RDMA_NOMSG):
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index ec6d1e7..c7d2380 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -71,6 +71,7 @@ static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
> static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
> static unsigned int xprt_rdma_inline_write_padding;
> static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
> + int xprt_rdma_pad_optimize = 0;
>
> #ifdef RPC_DEBUG
>
> @@ -136,6 +137,14 @@ static ctl_table xr_tunables_table[] = {
> .extra2 = &max_memreg,
> },
> {
> + .ctl_name = CTL_UNNUMBERED,
> + .procname = "rdma_pad_optimize",
> + .data = &xprt_rdma_pad_optimize,
> + .maxlen = sizeof(unsigned int),
> + .mode = 0644,
> + .proc_handler = &proc_dointvec,
> + },
> + {
> .ctl_name = 0,
> },
> };
>
^ permalink raw reply [flat|nested] 36+ messages in thread
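XDR rounds each item up to a 4-byte boundary, so a trailing fragment of 1 to 3 bytes is pure padding. A minimal sketch of the arithmetic on both sides of this patch follows. All three helpers are hypothetical names: `xdr_implicit_pad()` mirrors the `rdmalen &= 3` computation in the reply handler, `append_zero_pad()` mirrors the new loop in rpcrdma_inline_fixup(), and `tail_can_be_omitted()` mirrors the send-side test on xprt_rdma_pad_optimize.

```c
#include <stddef.h>

/* Number of zero bytes needed to round len up to a multiple of 4. */
static unsigned int xdr_implicit_pad(unsigned int len)
{
	unsigned int frag = len & 3;

	return frag ? 4 - frag : 0;
}

/*
 * Append pad zero bytes to a tail buffer and return its new length.
 * The caller must guarantee room for pad more bytes.
 */
static size_t append_zero_pad(unsigned char *tail, size_t tail_len,
			      unsigned int pad)
{
	while (pad--)
		tail[tail_len++] = 0;
	return tail_len;
}

/* Send side: a non-empty tail shorter than 4 bytes is only XDR padding. */
static int tail_can_be_omitted(size_t tail_len, int pad_optimize)
{
	return tail_len < 4 && pad_optimize;
}
```

For a reply chunk of 5 bytes, for example, the client adds 3 zero bytes and counts them in the reply length.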
* Re: [PATCH 03/15] RPC/RDMA: check selected memory registration mode at runtime.
2008-10-08 17:22 ` Trond Myklebust
@ 2008-10-08 17:29 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:29 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:22 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
>> At transport creation, check for, and use, any local dma lkey.
>> Then, check that the selected memory registration mode is in fact
>> supported by the RDMA adapter selected for the mount. Fall back
>> to best alternative if not.
>>
>> Signed-off-by: Tom Talpey <talpey@netapp.com>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
>I'm confused... Who is signing off on what? AFAICS, Tom Talpey is the
>author and is the one sending this patch series. Where does Tom Tucker
>come into the picture?
Tom Tucker wrote the initial version of some of the FRMR series; he
had emailed them to linux-nfs a while back. I left his sign-off in place
on those, as a co-author, and then signed for my own contributions.
Tom.
>
>> ---
>>
>> net/sunrpc/xprtrdma/verbs.c | 95 ++++++++++++++++++++++++++++++++++++-------
>> 1 files changed, 80 insertions(+), 15 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index d04208a..0f3b431 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -423,7 +423,8 @@ rpcrdma_clean_cq(struct ib_cq *cq)
>> int
>> rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>> {
>> - int rc;
>> + int rc, mem_priv;
>> + struct ib_device_attr devattr;
>> struct rpcrdma_ia *ia = &xprt->rx_ia;
>>
>> init_completion(&ia->ri_done);
>> @@ -443,6 +444,53 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>> }
>>
>> /*
>> + * Query the device to determine if the requested memory
>> + * registration strategy is supported. If it isn't, set the
>> + * strategy to a globally supported model.
>> + */
>> + rc = ib_query_device(ia->ri_id->device, &devattr);
>> + if (rc) {
>> + dprintk("RPC: %s: ib_query_device failed %d\n",
>> + __func__, rc);
>> + goto out2;
>> + }
>> +
>> + if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY) {
>> + ia->ri_have_dma_lkey = 1;
>> + ia->ri_dma_lkey = ia->ri_id->device->local_dma_lkey;
>> + }
>> +
>> + switch (memreg) {
>> + case RPCRDMA_MEMWINDOWS:
>> + case RPCRDMA_MEMWINDOWS_ASYNC:
>> + if (!(devattr.device_cap_flags & IB_DEVICE_MEM_WINDOW)) {
>> + dprintk("RPC: %s: MEMWINDOWS registration "
>> + "specified but not supported by adapter, "
>> + "using slower RPCRDMA_REGISTER\n",
>> + __func__);
>> + memreg = RPCRDMA_REGISTER;
>> + }
>> + break;
>> + case RPCRDMA_MTHCAFMR:
>> + if (!ia->ri_id->device->alloc_fmr) {
>> +#if RPCRDMA_PERSISTENT_REGISTRATION
>> + dprintk("RPC: %s: MTHCAFMR registration "
>> + "specified but not supported by adapter, "
>> + "using riskier RPCRDMA_ALLPHYSICAL\n",
>> + __func__);
>> + memreg = RPCRDMA_ALLPHYSICAL;
>> +#else
>> + dprintk("RPC: %s: MTHCAFMR registration "
>> + "specified but not supported by adapter, "
>> + "using slower RPCRDMA_REGISTER\n",
>> + __func__);
>> + memreg = RPCRDMA_REGISTER;
>> +#endif
>> + }
>> + break;
>> + }
>> +
>> + /*
>> * Optionally obtain an underlying physical identity mapping in
>> * order to do a memory window-based bind. This base registration
>> * is protected from remote access - that is enabled only by binding
>> @@ -450,22 +498,27 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>> * revoked after the corresponding completion similar to a storage
>> * adapter.
>> */
>> - if (memreg > RPCRDMA_REGISTER) {
>> - int mem_priv = IB_ACCESS_LOCAL_WRITE;
>> - switch (memreg) {
>> + switch (memreg) {
>> + case RPCRDMA_BOUNCEBUFFERS:
>> + case RPCRDMA_REGISTER:
>> + break;
>> #if RPCRDMA_PERSISTENT_REGISTRATION
>> - case RPCRDMA_ALLPHYSICAL:
>> - mem_priv |= IB_ACCESS_REMOTE_WRITE;
>> - mem_priv |= IB_ACCESS_REMOTE_READ;
>> - break;
>> + case RPCRDMA_ALLPHYSICAL:
>> + mem_priv = IB_ACCESS_LOCAL_WRITE |
>> + IB_ACCESS_REMOTE_WRITE |
>> + IB_ACCESS_REMOTE_READ;
>> + goto register_setup;
>> #endif
>> - case RPCRDMA_MEMWINDOWS_ASYNC:
>> - case RPCRDMA_MEMWINDOWS:
>> - mem_priv |= IB_ACCESS_MW_BIND;
>> - break;
>> - default:
>> + case RPCRDMA_MEMWINDOWS_ASYNC:
>> + case RPCRDMA_MEMWINDOWS:
>> + mem_priv = IB_ACCESS_LOCAL_WRITE |
>> + IB_ACCESS_MW_BIND;
>> + goto register_setup;
>> + case RPCRDMA_MTHCAFMR:
>> + if (ia->ri_have_dma_lkey)
>> break;
>> - }
>> + mem_priv = IB_ACCESS_LOCAL_WRITE;
>> + register_setup:
>> ia->ri_bind_mem = ib_get_dma_mr(ia->ri_pd, mem_priv);
>> if (IS_ERR(ia->ri_bind_mem)) {
>> printk(KERN_ALERT "%s: ib_get_dma_mr for "
>> @@ -475,7 +528,15 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>> memreg = RPCRDMA_REGISTER;
>> ia->ri_bind_mem = NULL;
>> }
>> + break;
>> + default:
>> + printk(KERN_ERR "%s: invalid memory registration mode %d\n",
>> + __func__, memreg);
>> + rc = -EINVAL;
>> + goto out2;
>> }
>> + dprintk("RPC: %s: memory registration strategy is %d\n",
>> + __func__, memreg);
>>
>> /* Else will do memory reg/dereg for each chunk */
>> ia->ri_memreg_strategy = memreg;
>> @@ -1248,7 +1309,11 @@ rpcrdma_register_internal(struct rpcrdma_ia *ia, void *va, int len,
>> va, len, DMA_BIDIRECTIONAL);
>> iov->length = len;
>>
>> - if (ia->ri_bind_mem != NULL) {
>> + if (ia->ri_have_dma_lkey) {
>> + *mrp = NULL;
>> + iov->lkey = ia->ri_dma_lkey;
>> + return 0;
>> + } else if (ia->ri_bind_mem != NULL) {
>> *mrp = NULL;
>> iov->lkey = ia->ri_bind_mem->lkey;
>> return 0;
>>
^ permalink raw reply [flat|nested] 36+ messages in thread
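With this patch, rpcrdma_register_internal() picks an lkey in a fixed order: the device-wide local DMA lkey if the adapter advertises IB_DEVICE_LOCAL_DMA_LKEY, then the lkey of the ri_bind_mem region, and otherwise whatever per-buffer registration follows the hunk (not shown). The same preference is why the MTHCAFMR case can skip ib_get_dma_mr() when a DMA lkey exists. A sketch of that order, using a hypothetical `struct ia_model` in place of struct rpcrdma_ia:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant struct rpcrdma_ia fields. */
struct ia_model {
	bool have_dma_lkey;	/* ri_have_dma_lkey */
	uint32_t dma_lkey;	/* ri_dma_lkey */
	bool have_bind_mem;	/* ri_bind_mem != NULL */
	uint32_t bind_mem_lkey;	/* ri_bind_mem->lkey */
};

/*
 * Return true and store an lkey when no per-buffer registration is
 * needed; false means the caller must register the buffer itself.
 */
static bool pick_internal_lkey(const struct ia_model *ia, uint32_t *lkey)
{
	if (ia->have_dma_lkey) {
		*lkey = ia->dma_lkey;		/* preferred: device DMA lkey */
		return true;
	}
	if (ia->have_bind_mem) {
		*lkey = ia->bind_mem_lkey;	/* from ib_get_dma_mr() */
		return true;
	}
	return false;
}
```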
* Re: [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
2008-10-08 17:23 ` Trond Myklebust
@ 2008-10-08 17:30 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:30 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:23 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
>> Internal RPC/RDMA structure updates in preparation for FRMR support.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>> Signed-off-by: Tom Talpey <talpey@netapp.com>
>
>Shouldn't there be a
>
>From: Tom Tucker <tom@opengridcomputing.com>
>
>at the top of this email in order to indicate that Tom Tucker is the
>author?
Co-author. Should it have two From lines?
Tom.
>
>> ---
>>
>> net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++++-
>> 1 files changed, 7 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
>> index 4de56b1..55a5d92 100644
>> --- a/include/linux/sunrpc/xprtrdma.h
>> +++ b/include/linux/sunrpc/xprtrdma.h
>> @@ -78,6 +78,7 @@ enum rpcrdma_memreg {
>> RPCRDMA_MEMWINDOWS,
>> RPCRDMA_MEMWINDOWS_ASYNC,
>> RPCRDMA_MTHCAFMR,
>> + RPCRDMA_FRMR,
>> RPCRDMA_ALLPHYSICAL,
>> RPCRDMA_LAST
>> };
>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>> index 2427822..05b7898 100644
>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> @@ -58,6 +58,8 @@ struct rpcrdma_ia {
>> struct rdma_cm_id *ri_id;
>> struct ib_pd *ri_pd;
>> struct ib_mr *ri_bind_mem;
>> + u32 ri_dma_lkey;
>> + int ri_have_dma_lkey;
>> struct completion ri_done;
>> int ri_async_rc;
>> enum rpcrdma_memreg ri_memreg_strategy;
>> @@ -156,6 +158,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
>> union {
>> struct ib_mw *mw;
>> struct ib_fmr *fmr;
>> + struct {
>> + struct ib_fast_reg_page_list *fr_pgl;
>> + struct ib_mr *fr_mr;
>> + } frmr;
>> } r;
>> struct list_head mw_list;
>> } *rl_mw;
>> @@ -198,7 +204,7 @@ struct rpcrdma_buffer {
>> atomic_t rb_credits; /* most recent server credits */
>> unsigned long rb_cwndscale; /* cached framework rpc_cwndscale */
>> int rb_max_requests;/* client max requests */
>> - struct list_head rb_mws; /* optional memory windows/fmrs */
>> + struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
>> int rb_send_index;
>> struct rpcrdma_req **rb_send_bufs;
>> int rb_recv_index;
>>
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails.
@ 2008-10-08 17:31 ` Trond Myklebust
2008-10-08 17:40 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:31 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
> The mount system call path does not expect such errors as ECONNREFUSED
> to be returned from failed transport connection attempts, otherwise it
> prints simply "internal error". Translate all such errors to ENOTCONN
> from RPC/RDMA to match sockets behavior.
Hmm... Shouldn't we be passing the ECONNREFUSED error here, and rather
fix the downstream error paths?
> Signed-off-by: Tom Talpey <talpey@netapp.com>
> ---
>
> net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index c4b8011..11ea8da 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -700,7 +700,7 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
> xprt_wake_pending_tasks(xprt, 0);
> } else {
> if (xprt_test_and_clear_connected(xprt))
> - xprt_wake_pending_tasks(xprt, ep->rep_connected);
> + xprt_wake_pending_tasks(xprt, -ENOTCONN);
> }
> spin_unlock_bh(&xprt->transport_lock);
> }
>
^ permalink raw reply [flat|nested] 36+ messages in thread
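The behavioural change is small: on a failed connection, the status passed to xprt_wake_pending_tasks() no longer depends on ep->rep_connected. A sketch, assuming the success branch above the hunk tests `ep->rep_connected > 0`, and ignoring the xprt_test_and_clear_connected() gate on the failure wake-up; `conn_wake_status()` is a hypothetical name:

```c
#include <errno.h>

/*
 * Status used to wake pending tasks after this patch: 0 on success,
 * and -ENOTCONN for every failure, whatever errno caused it.
 */
static int conn_wake_status(int rep_connected)
{
	return rep_connected > 0 ? 0 : -ENOTCONN;
}
```

Trond's alternative would amount to passing `rep_connected` itself on the failure path (for example -ECONNREFUSED) and fixing the mount error paths to handle it.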
* Re: [PATCH 05/15] RPC/RDMA: fix connection IRD/ORD setting
2008-10-08 17:26 ` Trond Myklebust
@ 2008-10-08 17:32 ` Talpey, Thomas
0 siblings, 0 replies; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:32 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:26 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
>> From: Tom Tucker <talpey@netapp.com>
>
>Now I'm really confused!
Ok, that's a bug. :-)
Will fix, after figuring out the format you prefer for joint authors.
Tom.
>
>> This logic sets the connection parameter that configures the local device
>> and informs the remote peer how many concurrent incoming RDMA_READ
>> requests are supported. The original logic didn't really do what was
>> intended for two reasons:
>>
>> - The max number supported by the device is typically smaller than
>> any one factor in the calculation used, and
>>
>> - The field in the connection parameter structure where the value is
>> stored is a u8 and always overflows for the default settings.
>>
>> So what really happens is the value requested for responder resources
>> is the left over 8 bits from the "desired value". If the desired value
>> happened to be a multiple of 256, the result was zero and it wouldn't
>> connect at all.
>>
>> Given the above and the fact that max_requests is almost always larger
>> than the max responder resources supported by the adapter, this patch
>> simplifies this logic and simply requests the max supported by the device,
>> subject to a reasonable limit.
>>
>> This bug was found by Jim Schutt at Sandia.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>> Signed-off-by: Tom Talpey <talpey@netapp.com>
>> ---
>>
>> net/sunrpc/xprtrdma/verbs.c | 51 ++++++++++++-------------------------------
>> 1 files changed, 14 insertions(+), 37 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 39a1652..e3fe905 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -705,30 +705,13 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>> ep->rep_remote_cma.private_data_len = 0;
>>
>> /* Client offers RDMA Read but does not initiate */
>> - switch (ia->ri_memreg_strategy) {
>> - case RPCRDMA_BOUNCEBUFFERS:
>> + ep->rep_remote_cma.initiator_depth = 0;
>> + if (ia->ri_memreg_strategy == RPCRDMA_BOUNCEBUFFERS)
>> ep->rep_remote_cma.responder_resources = 0;
>> - break;
>> - case RPCRDMA_MTHCAFMR:
>> - case RPCRDMA_REGISTER:
>> - case RPCRDMA_FRMR:
>> - ep->rep_remote_cma.responder_resources = cdata->max_requests *
>> - (RPCRDMA_MAX_DATA_SEGS / 8);
>> - break;
>> - case RPCRDMA_MEMWINDOWS:
>> - case RPCRDMA_MEMWINDOWS_ASYNC:
>> -#if RPCRDMA_PERSISTENT_REGISTRATION
>> - case RPCRDMA_ALLPHYSICAL:
>> -#endif
>> - ep->rep_remote_cma.responder_resources = cdata->max_requests *
>> - (RPCRDMA_MAX_DATA_SEGS / 2);
>> - break;
>> - default:
>> - break;
>> - }
>> - if (ep->rep_remote_cma.responder_resources > devattr.max_qp_rd_atom)
>> + else if (devattr.max_qp_rd_atom > 32) /* arbitrary but <= 255 */
>> + ep->rep_remote_cma.responder_resources = 32;
>> + else
>> ep->rep_remote_cma.responder_resources = devattr.max_qp_rd_atom;
>> - ep->rep_remote_cma.initiator_depth = 0;
>>
>> ep->rep_remote_cma.retry_count = 7;
>> ep->rep_remote_cma.flow_control = 0;
>> @@ -858,14 +841,6 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
>> }
>> }
>>
>> - /* Theoretically a client initiator_depth > 0 is not needed,
>> - * but many peers fail to complete the connection unless they
>> - * == responder_resources! */
>> - if (ep->rep_remote_cma.initiator_depth !=
>> - ep->rep_remote_cma.responder_resources)
>> - ep->rep_remote_cma.initiator_depth =
>> - ep->rep_remote_cma.responder_resources;
>> -
>> ep->rep_connected = 0;
>>
>> rc = rdma_connect(ia->ri_id, &ep->rep_remote_cma);
>> @@ -894,14 +869,16 @@ if (strnicmp(ia->ri_id->device->dma_device->bus->name, "pci", 3) == 0) {
>> if (ep->rep_connected <= 0) {
>> /* Sometimes, the only way to reliably connect to remote
>> * CMs is to use same nonzero values for ORD and IRD. */
>> - ep->rep_remote_cma.initiator_depth =
>> - ep->rep_remote_cma.responder_resources;
>> - if (ep->rep_remote_cma.initiator_depth == 0)
>> - ++ep->rep_remote_cma.initiator_depth;
>> - if (ep->rep_remote_cma.responder_resources == 0)
>> - ++ep->rep_remote_cma.responder_resources;
>> - if (retry_count++ == 0)
>> + if (retry_count++ <= RDMA_CONNECT_RETRY_MAX + 1 &&
>> + (ep->rep_remote_cma.responder_resources == 0 ||
>> + ep->rep_remote_cma.initiator_depth !=
>> + ep->rep_remote_cma.responder_resources)) {
>> + if (ep->rep_remote_cma.responder_resources == 0)
>> + ep->rep_remote_cma.responder_resources = 1;
>> + ep->rep_remote_cma.initiator_depth =
>> + ep->rep_remote_cma.responder_resources;
>> goto retry;
>> + }
>> rc = ep->rep_connected;
>> } else {
>> dprintk("RPC: %s: connected\n", __func__);
>>
^ permalink raw reply [flat|nested] 36+ messages in thread
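The new retry condition on the connect failure path can be modelled in isolation. In this sketch `struct ird_ord` and `adjust_for_retry()` are hypothetical, `retry_max` stands in for RDMA_CONNECT_RETRY_MAX (whose value is not shown here), and it assumes nothing on the retry path resets the two fields. Once IRD and ORD are equal and nonzero, a further failure is returned to the caller rather than retried.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two u8 fields in the connection params. */
struct ird_ord {
	uint8_t responder_resources;	/* IRD */
	uint8_t initiator_depth;	/* ORD */
};

/*
 * Retry only while the budget lasts and IRD/ORD are not yet equal and
 * nonzero; before retrying, force IRD >= 1 and set ORD = IRD.
 * Returns true where the kernel code would "goto retry".
 */
static bool adjust_for_retry(struct ird_ord *p, int *retry_count, int retry_max)
{
	if ((*retry_count)++ <= retry_max + 1 &&
	    (p->responder_resources == 0 ||
	     p->initiator_depth != p->responder_resources)) {
		if (p->responder_resources == 0)
			p->responder_resources = 1;
		p->initiator_depth = p->responder_resources;
		return true;
	}
	return false;	/* give up: rc = ep->rep_connected */
}
```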
* Re: [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks.
2008-10-08 17:29 ` Trond Myklebust
@ 2008-10-08 17:33 ` Talpey, Thomas
0 siblings, 0 replies; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:33 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:29 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
>> The RPC/RDMA protocol allows clients and servers to avoid RDMA
>> operations for data which is purely the result of XDR padding.
>> On the client, automatically insert the necessary padding for
>> such server replies, and optionally don't marshal such chunks.
>>
>> Signed-off-by: Tom Talpey <talpey@netapp.com>
>> ---
>>
>> net/sunrpc/xprtrdma/rpc_rdma.c | 22 ++++++++++++++++++++--
>> net/sunrpc/xprtrdma/transport.c | 9 +++++++++
>> 2 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index 721dae7..c4b8011 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -118,6 +118,11 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
>> }
>>
>> if (xdrbuf->tail[0].iov_len) {
>> + /* the rpcrdma protocol allows us to omit any trailing
>> + * xdr pad bytes, saving the server an RDMA operation. */
>> + extern int xprt_rdma_pad_optimize; /* 0 == old server compat */
> ^^^^^^^^^^^^^^^^^^^
>Globals should really be declared in a header file. 'sparse' will
>complain if you don't...
Nope, sparse was silent. I did get a lot of noise from some __cold__ attributes
in global header files (at least doing make C=1), that's it.
I'll move the extern.
Tom.
>
>> + if (xdrbuf->tail[0].iov_len < 4 && xprt_rdma_pad_optimize)
>> + return n;
>> if (n == nsegs)
>> return 0;
>> seg[n].mr_page = NULL;
>> @@ -594,7 +599,7 @@ rpcrdma_count_chunks(struct rpcrdma_rep *rep, unsigned int max, int wrchunk, __b
>> * Scatter inline received data back into provided iov's.
>> */
>> static void
>> -rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
>> +rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
>> {
>> int i, npages, curlen, olen;
>> char *destp;
>> @@ -660,6 +665,13 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len)
>> } else
>> rqst->rq_rcv_buf.tail[0].iov_len = 0;
>>
>> + if (pad) {
>> + /* implicit padding on terminal chunk */
>> + unsigned char *p = rqst->rq_rcv_buf.tail[0].iov_base;
>> + while (pad--)
>> + p[rqst->rq_rcv_buf.tail[0].iov_len++] = 0;
>> + }
>> +
>> if (copy_len)
>> dprintk("RPC: %s: %d bytes in"
>> " %d extra segments (%d lost)\n",
>> @@ -794,14 +806,20 @@ repost:
>> ((unsigned char *)iptr - (unsigned char *)headerp);
>> status = rep->rr_len + rdmalen;
>> r_xprt->rx_stats.total_rdma_reply += rdmalen;
>> + /* special case - last chunk may omit padding */
>> + if (rdmalen &= 3) {
>> + rdmalen = 4 - rdmalen;
>> + status += rdmalen;
>> + }
>> } else {
>> /* else ordinary inline */
>> + rdmalen = 0;
>> iptr = (__be32 *)((unsigned char *)headerp + 28);
>> rep->rr_len -= 28; /*sizeof *headerp;*/
>> status = rep->rr_len;
>> }
>> /* Fix up the rpc results for upper layer */
>> - rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len);
>> + rpcrdma_inline_fixup(rqst, (char *)iptr, rep->rr_len, rdmalen);
>> break;
>>
>> case __constant_htonl(RDMA_NOMSG):
>> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
>> index ec6d1e7..c7d2380 100644
>> --- a/net/sunrpc/xprtrdma/transport.c
>> +++ b/net/sunrpc/xprtrdma/transport.c
>> @@ -71,6 +71,7 @@ static unsigned int xprt_rdma_max_inline_read = RPCRDMA_DEF_INLINE;
>> static unsigned int xprt_rdma_max_inline_write = RPCRDMA_DEF_INLINE;
>> static unsigned int xprt_rdma_inline_write_padding;
>> static unsigned int xprt_rdma_memreg_strategy = RPCRDMA_FRMR;
>> + int xprt_rdma_pad_optimize = 0;
>>
>> #ifdef RPC_DEBUG
>>
>> @@ -136,6 +137,14 @@ static ctl_table xr_tunables_table[] = {
>> .extra2 = &max_memreg,
>> },
>> {
>> + .ctl_name = CTL_UNNUMBERED,
>> + .procname = "rdma_pad_optimize",
>> + .data = &xprt_rdma_pad_optimize,
>> + .maxlen = sizeof(unsigned int),
>> + .mode = 0644,
>> + .proc_handler = &proc_dointvec,
>> + },
>> + {
>> .ctl_name = 0,
>> },
>> };
>>
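XDR rounds every item up to a multiple of 4 bytes, and that round-up is the padding this patch lets client and server leave out of RDMA transfers. The sketch below models the three pieces of arithmetic in the hunks above as standalone functions; the `demo_` names are hypothetical, and `demo_append_pad()` assumes, as the patched fixup code appears to, that the tail buffer has room for up to 3 extra bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Mirrors the "rdmalen &= 3" step in the reply-handling hunk: the number
 * of XDR pad bytes implied by an RDMA'd length that the server did not
 * round up to a 4-byte boundary.
 */
static unsigned int demo_implicit_pad(unsigned int rdmalen)
{
	unsigned int rem = rdmalen & 3;	/* bytes past the last 4-byte boundary */

	return rem ? 4 - rem : 0;
}

/*
 * Mirrors the new block in rpcrdma_inline_fixup(): append 'pad' zero
 * bytes to a tail whose current length is *tail_len.  Assumes the
 * caller guarantees room for them.
 */
static void demo_append_pad(unsigned char *tail, size_t *tail_len,
			    unsigned int pad)
{
	while (pad--)
		tail[(*tail_len)++] = 0;
}

/*
 * Mirrors the new check in rpcrdma_convert_iovs(): with pad optimization
 * enabled, a nonzero tail shorter than 4 bytes is treated as pure XDR
 * padding and no chunk segment is built for it.  (The enclosing test in
 * the hunk already requires a nonzero tail.)
 */
static int demo_skip_tail_chunk(size_t tail_len, int pad_optimize)
{
	return tail_len != 0 && tail_len < 4 && pad_optimize;
}
```

With `xprt_rdma_pad_optimize` left at its default of 0, the client keeps marshaling the pad chunk, which matches the "0 == old server compat" comment in the hunk.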
* Re: [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
[not found] ` <20081008154856.1336.18339.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
@ 2008-10-08 17:35 ` Trond Myklebust
2008-10-08 17:51 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:35 UTC (permalink / raw)
To: Tom Talpey; +Cc: linux-nfs
On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
> The RPC/RDMA code always performed a reconnect-with-backoff, even
> when re-establishing a connection to a server after the RPC layer
> closed it due to being idle.
> ---
>
> net/sunrpc/xprtrdma/transport.c | 5 +++--
> net/sunrpc/xprtrdma/verbs.c | 2 +-
> 2 files changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index c7d2380..278a544 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -486,8 +486,9 @@ xprt_rdma_connect(struct rpc_task *task)
> struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
>
> if (!xprt_test_and_set_connecting(xprt)) {
> - if (r_xprt->rx_ep.rep_connected != 0) {
> - /* Reconnect */
> + if (r_xprt->rx_ep.rep_connected &&
> + r_xprt->rx_ep.rep_connected != -EPIPE) {
> + /* Reconnect with backoff */
> schedule_delayed_work(&r_xprt->rdma_connect,
> xprt->reestablish_timeout);
> } else {
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index a63d0c0..9ef7e0d 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -317,7 +317,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
> connstate = -ECONNREFUSED;
> goto connected;
> case RDMA_CM_EVENT_DISCONNECTED:
> - connstate = -ECONNABORTED;
> + connstate = -EPIPE;
> goto connected;
> case RDMA_CM_EVENT_DEVICE_REMOVAL:
> connstate = -ENODEV;
Hmm... Why not rather do the same as the socket code: have the
disconnect handler paths that don't require exponential backoff just
reset xprt->reestablish_timeout to 0?
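For comparison with Trond's suggestion, the patched test in xprt_rdma_connect() reduces to the decision below. This is a sketch: `demo_connect_delay()` is hypothetical, and returning 0 for the non-backoff case assumes the elided `else` branch connects without waiting.

```c
#include <assert.h>
#include <errno.h>

/*
 * Mirrors the patched test: a transport that has never connected
 * (rep_connected == 0), or whose last connection ended in an orderly
 * disconnect (-EPIPE, which the upcall now records for
 * RDMA_CM_EVENT_DISCONNECTED), gets no delay; any other error state
 * waits for reestablish_timeout first.
 */
static unsigned long demo_connect_delay(int rep_connected,
					 unsigned long reestablish_timeout)
{
	if (rep_connected && rep_connected != -EPIPE)
		return reestablish_timeout;	/* "Reconnect with backoff" */
	return 0;				/* assumed: connect at once */
}
```

Trond's alternative would keep the original `rep_connected != 0` test and instead have the idle-disconnect path set `xprt->reestablish_timeout` to 0, which gives the same zero delay for that case without overloading an errno value.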
* Re: [PATCH 03/15] RPC/RDMA: check selected memory registration mode at runtime.
[not found] ` <RTPCLUEXC1-PRD8yfog00000071-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-08 17:40 ` Trond Myklebust
0 siblings, 0 replies; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:40 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: linux-nfs
On Wed, 2008-10-08 at 13:29 -0400, Talpey, Thomas wrote:
> At 01:22 PM 10/8/2008, Trond Myklebust wrote:
> >On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> >> At transport creation, check for, and use, any local dma lkey.
> >> Then, check that the selected memory registration mode is in fact
> >> supported by the RDMA adapter selected for the mount. Fall back
> >> to best alternative if not.
> >>
> >> Signed-off-by: Tom Talpey <talpey@netapp.com>
> >> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> >
> >I'm confused... Who is signing off on what? AFAICS, Tom Talpey is the
> >author and is the one sending this patch series. Where does Tom Tucker
> >come into the picture?
>
> Tom Tucker wrote the initial version of some of the FRMR series, he
> had emailed them to linux-nfs a while back. I left his sign-off in place
> on those, as a co-author, and then signed for my own contributions.
Hmm... Tricky...
I'd do something along the lines of
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
[Heavily rewritten by Tom Talpey <talpey@netapp.com>]
Signed-off-by: Tom Talpey <talpey@netapp.com>
Maybe with an extra description of why you rewrote stuff in between the
square brackets.
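The patch description quoted above (use a local DMA lkey if present, then verify the requested registration mode and fall back if the adapter lacks it) follows a common selection pattern. The sketch below is illustrative only: the `demo_` modes, the capability bits, and the fallback order are all hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical modes and capability bits; not the kernel's definitions. */
enum demo_memreg { DEMO_REGISTER, DEMO_MTHCAFMR, DEMO_FRMR, DEMO_ALLPHYSICAL };

#define DEMO_CAP_FMR	(1u << 0)	/* adapter can allocate FMRs */
#define DEMO_CAP_FRMR	(1u << 1)	/* adapter supports fast-reg MRs */

static int demo_supported(enum demo_memreg mode, unsigned int caps)
{
	switch (mode) {
	case DEMO_FRMR:
		return (caps & DEMO_CAP_FRMR) != 0;
	case DEMO_MTHCAFMR:
		return (caps & DEMO_CAP_FMR) != 0;
	default:
		return 1;	/* assumed always available in this sketch */
	}
}

/*
 * Keep the requested mode if the adapter supports it; otherwise take the
 * first supported entry of a hypothetical preference order.
 */
static enum demo_memreg demo_pick_memreg(enum demo_memreg wanted,
					 unsigned int caps)
{
	static const enum demo_memreg order[] = {
		DEMO_FRMR, DEMO_MTHCAFMR, DEMO_REGISTER,
	};
	size_t i;

	if (demo_supported(wanted, caps))
		return wanted;
	for (i = 0; i < sizeof(order) / sizeof(order[0]); i++)
		if (demo_supported(order[i], caps))
			return order[i];
	return DEMO_REGISTER;	/* not reached: DEMO_REGISTER is always supported */
}
```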
* Re: [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
[not found] ` <RTPCLUEXC1-PRDmcarc00000072-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-08 17:40 ` Trond Myklebust
2008-10-08 17:55 ` J. Bruce Fields
1 sibling, 0 replies; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:40 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: linux-nfs
On Wed, 2008-10-08 at 13:30 -0400, Talpey, Thomas wrote:
> At 01:23 PM 10/8/2008, Trond Myklebust wrote:
> >On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> >> Internal RPC/RDMA structure updates in preparation for FRMR support.
> >>
> >> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> >> Signed-off-by: Tom Talpey <talpey@netapp.com>
> >
> >Shouldn't there be a
> >
> >From: Tom Tucker <tom@opengridcomputing.com>
> >
> >at the top of this email in order to indicate that Tom Tucker is the
> >author?
>
> Co-author. Should it have two From lines?
No. You can't do that... See previous suggestion.
* Re: [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails.
2008-10-08 17:31 ` Trond Myklebust
@ 2008-10-08 17:40 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDbpH7100000075-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
0 siblings, 1 reply; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:40 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:31 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
>> The mount system call path does not expect such errors as ECONNREFUSED
>> to be returned from failed transport connection attempts, otherwise it
>> prints simply "internal error". Translate all such errors to ENOTCONN
>> from RPC/RDMA to match sockets behavior.
>
>Hmm... Shouldn't we be passing the ECONNREFUSED error here, and rather
>fix the downstream error paths?
That means fixing /sbin/mount.nfs, and an earlier conversation concluded that
"doing what TCP does" was preferred. The error path from NFS and RPC is,
frankly, more than a little tortuous. The error is translated and filtered in
both layers, after being returned from the transport. Then, the mount command
makes up its own diagnostic from what comes back from the syscall. Well beyond
the scope of RDMA.
Your call. As proposed, it is more compatible with current practice, IMO.
Tom.
>> Signed-off-by: Tom Talpey <talpey@netapp.com>
>> ---
>>
>> net/sunrpc/xprtrdma/rpc_rdma.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>> index c4b8011..11ea8da 100644
>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>> @@ -700,7 +700,7 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
>> xprt_wake_pending_tasks(xprt, 0);
>> } else {
>> if (xprt_test_and_clear_connected(xprt))
>> - xprt_wake_pending_tasks(xprt, ep->rep_connected);
>> + xprt_wake_pending_tasks(xprt, -ENOTCONN);
>> }
>> spin_unlock_bh(&xprt->transport_lock);
>> }
>>
* Re: [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails.
[not found] ` <RTPCLUEXC1-PRDbpH7100000075-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-08 17:43 ` Trond Myklebust
2008-10-08 19:56 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 17:43 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: linux-nfs
On Wed, 2008-10-08 at 13:40 -0400, Talpey, Thomas wrote:
> At 01:31 PM 10/8/2008, Trond Myklebust wrote:
> >On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
> >> The mount system call path does not expect such errors as ECONNREFUSED
> >> to be returned from failed transport connection attempts, otherwise it
> >> prints simply "internal error". Translate all such errors to ENOTCONN
> >> from RPC/RDMA to match sockets behavior.
> >
> >Hmm... Shouldn't we be passing the ECONNREFUSED error here, and rather
> >fix the downstream error paths?
>
> That means fixing /sbin/mount.nfs, and an earlier conversation concluded that
> "doing what TCP does" was preferred. The error path from NFS and RPC is,
> frankly, more than a little tortuous. The error is translated and filtered in
> both layers, after being returned from the transport. Then, the mount command
> makes up its own diagnostic from what comes back from the syscall. Well beyond
> the scope of RDMA.
>
> Your call. As proposed, it is more compatible with current practice, IMO.
Are you saying that mount.nfs translates 'ECONNREFUSED' as 'internal
error'? That would be a bug...
Trond
* Re: [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
2008-10-08 17:35 ` Trond Myklebust
@ 2008-10-08 17:51 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDjbDt300000076-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
0 siblings, 1 reply; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:51 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Tom Talpey, linux-nfs
At 01:35 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
>> The RPC/RDMA code always performed a reconnect-with-backoff, even
>> when re-establishing a connection to a server after the RPC layer
>> closed it due to being idle.
>> ---
>>
>> net/sunrpc/xprtrdma/transport.c | 5 +++--
>> net/sunrpc/xprtrdma/verbs.c | 2 +-
>> 2 files changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
>> index c7d2380..278a544 100644
>> --- a/net/sunrpc/xprtrdma/transport.c
>> +++ b/net/sunrpc/xprtrdma/transport.c
>> @@ -486,8 +486,9 @@ xprt_rdma_connect(struct rpc_task *task)
>> struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
>>
>> if (!xprt_test_and_set_connecting(xprt)) {
>> - if (r_xprt->rx_ep.rep_connected != 0) {
>> - /* Reconnect */
>> + if (r_xprt->rx_ep.rep_connected &&
>> + r_xprt->rx_ep.rep_connected != -EPIPE) {
>> + /* Reconnect with backoff */
>> schedule_delayed_work(&r_xprt->rdma_connect,
>> xprt->reestablish_timeout);
>> } else {
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index a63d0c0..9ef7e0d 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -317,7 +317,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
>> connstate = -ECONNREFUSED;
>> goto connected;
>> case RDMA_CM_EVENT_DISCONNECTED:
>> - connstate = -ECONNABORTED;
>> + connstate = -EPIPE;
>> goto connected;
>> case RDMA_CM_EVENT_DEVICE_REMOVAL:
>> connstate = -ENODEV;
>
>Hmm... Why not rather do the same as the socket code: have the
>disconnect handler paths that don't require exponential backoff just
>reset xprt->reestablish_timeout to 0?
Because we do want a non-zero reestablishment timeout in general, and
the RDMA client has not implemented a connection backoff. So in effect
the value is constant for this code, and I thought treating it as such is
the safer fix.
I'm not 100% convinced the TCP code is correct, btw. It appears to
zero out the reestablish timeout on idle-disconnect, but it's not obvious
to me where it sets it back to a non-zero value. It does try to double
it in xs_connect() though! :-)
I have that issue on my list to look into, of course, but I think it was
out of scope for RDMA.
Tom.
>
* Re: [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
[not found] ` <RTPCLUEXC1-PRDmcarc00000072-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-08 17:40 ` Trond Myklebust
@ 2008-10-08 17:55 ` J. Bruce Fields
2008-10-08 17:58 ` Talpey, Thomas
1 sibling, 1 reply; 36+ messages in thread
From: J. Bruce Fields @ 2008-10-08 17:55 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: Trond Myklebust, Tom Talpey, linux-nfs
On Wed, Oct 08, 2008 at 01:30:56PM -0400, Talpey, Thomas wrote:
> At 01:23 PM 10/8/2008, Trond Myklebust wrote:
> >On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
> >> Internal RPC/RDMA structure updates in preparation for FRMR support.
> >>
> >> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
> >> Signed-off-by: Tom Talpey <talpey@netapp.com>
> >
> >Shouldn't there be a
> >
> >From: Tom Tucker <tom@opengridcomputing.com>
> >
> >at the top of this email in order to indicate that Tom Tucker is the
> >author?
>
> Co-author. Should it have two From lines?
Gotta pick one. I tend to leave whoever got there first as the author.
If it was a pretty involved collaboration I suppose you could even do
something cheesy like assigning half the series to one person and half
to the other.
--b.
>
> Tom.
>
> >
> >> ---
> >>
> >> net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++++-
> >> 1 files changed, 7 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
> >> index 4de56b1..55a5d92 100644
> >> --- a/include/linux/sunrpc/xprtrdma.h
> >> +++ b/include/linux/sunrpc/xprtrdma.h
> >> @@ -78,6 +78,7 @@ enum rpcrdma_memreg {
> >> RPCRDMA_MEMWINDOWS,
> >> RPCRDMA_MEMWINDOWS_ASYNC,
> >> RPCRDMA_MTHCAFMR,
> >> + RPCRDMA_FRMR,
> >> RPCRDMA_ALLPHYSICAL,
> >> RPCRDMA_LAST
> >> };
> >> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
> >> index 2427822..05b7898 100644
> >> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
> >> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
> >> @@ -58,6 +58,8 @@ struct rpcrdma_ia {
> >> struct rdma_cm_id *ri_id;
> >> struct ib_pd *ri_pd;
> >> struct ib_mr *ri_bind_mem;
> >> + u32 ri_dma_lkey;
> >> + int ri_have_dma_lkey;
> >> struct completion ri_done;
> >> int ri_async_rc;
> >> enum rpcrdma_memreg ri_memreg_strategy;
> >> @@ -156,6 +158,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
> >> union {
> >> struct ib_mw *mw;
> >> struct ib_fmr *fmr;
> >> + struct {
> >> + struct ib_fast_reg_page_list *fr_pgl;
> >> + struct ib_mr *fr_mr;
> >> + } frmr;
> >> } r;
> >> struct list_head mw_list;
> >> } *rl_mw;
> >> @@ -198,7 +204,7 @@ struct rpcrdma_buffer {
> >> atomic_t rb_credits; /* most recent server credits */
> >> unsigned long rb_cwndscale; /* cached framework rpc_cwndscale */
> >> int rb_max_requests;/* client max requests */
> >> - struct list_head rb_mws; /* optional memory windows/fmrs */
> >> + struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
> >> int rb_send_index;
> >> struct rpcrdma_req **rb_send_bufs;
> >> int rb_recv_index;
> >>
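One detail of the enum hunk is worth spelling out: `RPCRDMA_FRMR` is inserted before `RPCRDMA_ALLPHYSICAL` rather than appended, so every later member shifts up by one. The sketch mirrors only the members visible in the hunk (the real enum has earlier members that the hunk elides), so its absolute values are not the kernel's; only the relative shift is meaningful.

```c
#include <assert.h>

/* Members visible in the hunk, before the patch (absolute values are
 * NOT the kernel's, because earlier members are elided). */
enum demo_memreg_before {
	B_MEMWINDOWS,
	B_MEMWINDOWS_ASYNC,
	B_MTHCAFMR,
	B_ALLPHYSICAL,
	B_LAST
};

/* The same members after the patch inserts FRMR mid-list. */
enum demo_memreg_after {
	A_MEMWINDOWS,
	A_MEMWINDOWS_ASYNC,
	A_MTHCAFMR,
	A_FRMR,		/* new member */
	A_ALLPHYSICAL,	/* shifted up by one */
	A_LAST		/* shifted up by one */
};
```

If the strategy is set numerically (patch 09's transport.c hunk shows `xprt_rdma_memreg_strategy` with a `max_memreg` bound in what appears to be its sysctl table), a number chosen before this patch may select a different mode after it.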
* Re: [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum.
2008-10-08 17:55 ` J. Bruce Fields
@ 2008-10-08 17:58 ` Talpey, Thomas
0 siblings, 0 replies; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 17:58 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Talpey, Thomas, Trond Myklebust, linux-nfs
At 01:55 PM 10/8/2008, J. Bruce Fields wrote:
>On Wed, Oct 08, 2008 at 01:30:56PM -0400, Talpey, Thomas wrote:
>> At 01:23 PM 10/8/2008, Trond Myklebust wrote:
>> >On Wed, 2008-10-08 at 11:47 -0400, Tom Talpey wrote:
>> >> Internal RPC/RDMA structure updates in preparation for FRMR support.
>> >>
>> >> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>> >> Signed-off-by: Tom Talpey <talpey@netapp.com>
>> >
>> >Shouldn't there be a
>> >
>> >From: Tom Tucker <tom@opengridcomputing.com>
>> >
>> >at the top of this email in order to indicate that Tom Tucker is the
>> >author?
>>
>> Co-author. Should it have two From lines?
>
>Gotta pick one. I tend to leave whoever got there first as the author.
Well, I threw out more than half of Tom's code, so I took over as primary. :-)
Tom.
>If it was a pretty involved collaboration I suppose you could even do
>something cheesy like assigning half the series to one person and half
>to the other.
>
>--b.
>
>>
>> Tom.
>>
>> >
>> >> ---
>> >>
>> >> net/sunrpc/xprtrdma/xprt_rdma.h | 8 +++++++-
>> >> 1 files changed, 7 insertions(+), 1 deletions(-)
>> >>
>> >> diff --git a/include/linux/sunrpc/xprtrdma.h b/include/linux/sunrpc/xprtrdma.h
>> >> index 4de56b1..55a5d92 100644
>> >> --- a/include/linux/sunrpc/xprtrdma.h
>> >> +++ b/include/linux/sunrpc/xprtrdma.h
>> >> @@ -78,6 +78,7 @@ enum rpcrdma_memreg {
>> >> RPCRDMA_MEMWINDOWS,
>> >> RPCRDMA_MEMWINDOWS_ASYNC,
>> >> RPCRDMA_MTHCAFMR,
>> >> + RPCRDMA_FRMR,
>> >> RPCRDMA_ALLPHYSICAL,
>> >> RPCRDMA_LAST
>> >> };
>> >> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>> >> index 2427822..05b7898 100644
>> >> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>> >> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>> >> @@ -58,6 +58,8 @@ struct rpcrdma_ia {
>> >> struct rdma_cm_id *ri_id;
>> >> struct ib_pd *ri_pd;
>> >> struct ib_mr *ri_bind_mem;
>> >> + u32 ri_dma_lkey;
>> >> + int ri_have_dma_lkey;
>> >> struct completion ri_done;
>> >> int ri_async_rc;
>> >> enum rpcrdma_memreg ri_memreg_strategy;
>> >> @@ -156,6 +158,10 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
>> >> union {
>> >> struct ib_mw *mw;
>> >> struct ib_fmr *fmr;
>> >> + struct {
>> >> + struct ib_fast_reg_page_list *fr_pgl;
>> >> + struct ib_mr *fr_mr;
>> >> + } frmr;
>> >> } r;
>> >> struct list_head mw_list;
>> >> } *rl_mw;
>> >> @@ -198,7 +204,7 @@ struct rpcrdma_buffer {
>> >> atomic_t rb_credits; /* most recent server credits */
>> >> unsigned long rb_cwndscale; /* cached framework rpc_cwndscale */
>> >> int rb_max_requests;/* client max requests */
>> >> - struct list_head rb_mws; /* optional memory windows/fmrs */
>> >> + struct list_head rb_mws; /* optional memory windows/fmrs/frmrs */
>> >> int rb_send_index;
>> >> struct rpcrdma_req **rb_send_bufs;
>> >> int rb_recv_index;
>> >>
* Re: [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
[not found] ` <RTPCLUEXC1-PRDjbDt300000076-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-08 18:04 ` Trond Myklebust
2008-10-08 19:05 ` Talpey, Thomas
0 siblings, 1 reply; 36+ messages in thread
From: Trond Myklebust @ 2008-10-08 18:04 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: linux-nfs
On Wed, 2008-10-08 at 13:51 -0400, Talpey, Thomas wrote:
> At 01:35 PM 10/8/2008, Trond Myklebust wrote:
> >Hmm... Why not rather do the same as the socket code: have the
> >disconnect handler paths that don't require exponential backoff just
> >reset xprt->reestablish_timeout to 0?
>
> Because we do want a non-zero reestablishment timeout in general, and
> the RDMA client has not implemented a connection backoff. So in effect
> the value is constant for this code, and I thought treating it as such is
> the safer fix.
>
> I'm not 100% convinced the TCP code is correct, btw. It appears to
> zero out the reestablish timeout on idle-disconnect, but it's not obvious
> to me where it sets it back to a non-zero value. It does try to double
> it in xs_connect() though! :-)
The TCP code sets the xprt->reestablish_timeout to a non-zero value
whenever the _server_ closes the connection (i.e. if ever we enter a
SYN_SENT state followed by a reset, a CLOSE_WAIT state or a CLOSING
state).
Why would the RDMA client want to do anything different?
* Re: [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server.
2008-10-08 18:04 ` Trond Myklebust
@ 2008-10-08 19:05 ` Talpey, Thomas
0 siblings, 0 replies; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 19:05 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Talpey, Thomas, linux-nfs
At 02:04 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 13:51 -0400, Talpey, Thomas wrote:
>> At 01:35 PM 10/8/2008, Trond Myklebust wrote:
>> >Hmm... Why not rather do the same as the socket code: have the
>> >disconnect handler paths that don't require exponential backoff just
>> >reset xprt->reestablish_timeout to 0?
>>
>> Because we do want a non-zero reestablishment timeout in general, and
>> the RDMA client has not implemented a connection backoff. So in effect
>> the value is constant for this code, and I thought treating it as such is
>> the safer fix.
>>
>> I'm not 100% convinced the TCP code is correct, btw. It appears to
>> zero out the reestablish timeout on idle-disconnect, but it's not obvious
>> to me where it sets it back to a non-zero value. It does try to double
>> it in xs_connect() though! :-)
>
>The TCP code sets the xprt->reestablish_timeout to a non-zero value
>whenever the _server_ closes the connection (i.e. if ever we enter a
>SYN_SENT state followed by a reset, a CLOSE_WAIT state or a CLOSING
>state).
Hmm, I guess. It's driven off the TCP state machine so it doesn't translate
well to the RDMA layer, which doesn't give the same granularity of upcall
event (all we see is success/fail). I think I can get close though.
>Why would the RDMA client want to do anything different?
It wouldn't, but the RDMA layer isn't the same as TCP. For example,
on InfiniBand, which is a nearly lossless local medium, there are fewer
reasons to back off. And even over iWARP, the NIC's TCP stack is
handling all the recovery and retry, so it's generally better to not
overthink it.
The constant-backoff approach in the current RDMA client, however,
takes in the issue when the upper layer (nfsd) is involved. So, "going
exponential" isn't a big change, and worthwhile.
New patch on the way.
Tom.
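The exponential scheme discussed above, modelled on Trond's description of the TCP transport, can be sketched as follows. The constants, `struct demo_xprt`, and all three helpers are hypothetical; the sketch only shows the intended shape: a server-initiated close arms the timeout, an idle close clears it, and each connect attempt waits the current value and then doubles it up to a cap.

```c
#include <assert.h>

/* Hypothetical bounds in arbitrary ticks; not the kernel's constants. */
#define DEMO_INIT_REEST_TO	3UL
#define DEMO_MAX_REEST_TO	300UL

struct demo_xprt {
	unsigned long reestablish_timeout;
};

/* Server closed or refused the connection: ensure a nonzero backoff. */
static void demo_server_closed(struct demo_xprt *x)
{
	if (x->reestablish_timeout < DEMO_INIT_REEST_TO)
		x->reestablish_timeout = DEMO_INIT_REEST_TO;
}

/* Client closed an idle connection: the next connect should not wait. */
static void demo_idle_closed(struct demo_xprt *x)
{
	x->reestablish_timeout = 0;
}

/*
 * On a connect request: return the delay to wait now, then double the
 * stored timeout (capped) for the next attempt.
 */
static unsigned long demo_next_delay(struct demo_xprt *x)
{
	unsigned long delay = x->reestablish_timeout;

	if (delay) {
		x->reestablish_timeout <<= 1;
		if (x->reestablish_timeout > DEMO_MAX_REEST_TO)
			x->reestablish_timeout = DEMO_MAX_REEST_TO;
	}
	return delay;
}
```

For example, after `demo_server_closed()` successive attempts wait 3, 6, then 12 ticks, while an intervening `demo_idle_closed()` makes the next attempt immediate.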
* Re: [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails.
2008-10-08 17:43 ` Trond Myklebust
@ 2008-10-08 19:56 ` Talpey, Thomas
0 siblings, 0 replies; 36+ messages in thread
From: Talpey, Thomas @ 2008-10-08 19:56 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Talpey, Thomas, linux-nfs
At 01:43 PM 10/8/2008, Trond Myklebust wrote:
>On Wed, 2008-10-08 at 13:40 -0400, Talpey, Thomas wrote:
>> At 01:31 PM 10/8/2008, Trond Myklebust wrote:
>> >On Wed, 2008-10-08 at 11:48 -0400, Tom Talpey wrote:
>> >> The mount system call path does not expect such errors as ECONNREFUSED
>> >> to be returned from failed transport connection attempts, otherwise it
>> >> prints simply "internal error". Translate all such errors to ENOTCONN
>> >> from RPC/RDMA to match sockets behavior.
>> >
>> >Hmm... Shouldn't we be passing the ECONNREFUSED error here, and rather
>> >fix the downstream error paths?
>>
>> That means fixing /sbin/mount.nfs, and an earlier conversation concluded that
>> "doing what TCP does" was preferred. The error path from NFS and RPC is,
>> frankly, more than a little tortuous. The error is translated and filtered in
>> both layers, after being returned from the transport. Then, the mount command
>> makes up its own diagnostic from what comes back from the syscall. Well beyond
>> the scope of RDMA.
>>
>> Your call. As proposed, it is more compatible with current practice, IMO.
>
>Are you saying that mount.nfs translates 'ECONNREFUSED' as 'internal
>error'? That would be a bug...
No, unfortunately it's a good bit more complicated than that. Sorry for
oversimplifying. Both mount.nfs and the kernel would need to change.
What happens is, the XYZ transport returns a connect status, which xprt.c's
xprt_connect_status() looks at and if non-zero decides what to dprintk, and
whether to retry.
xprt_connect_status() only parses two errors: ENOTCONN and ETIMEDOUT.
These result in various attempts to rebind and retry, as appropriate.
If any other error is returned, the status is changed to EIO and the call
is aborted. When the caller is mount, this results in EIO popping out of
the kernel as the return of sys_mount().
The EIO is then handled by mount.nfs in various unhelpful ways. Mount
pretty much never sees ECONNREFUSED from this call (though its userspace
stuff such as looking up ports and pinging servers does).
So, I just decided to return ENOTCONN like the other transports. I could
add new error arms to this code, but IMO they'd be unnecessary, for TCP
and UDP anyway.
Tom.
>
>Trond
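The control flow Tom describes for xprt_connect_status() can be modelled roughly as below. This is a simplified sketch of that description, not the kernel function: `demo_connect_status()` and `enum demo_action` are hypothetical, and the real code also chooses what to dprintk and how to rebind.

```c
#include <assert.h>
#include <errno.h>

enum demo_action { DEMO_OK, DEMO_RETRY, DEMO_ABORT };

/*
 * Per the description above: only -ENOTCONN and -ETIMEDOUT lead to a
 * rebind/retry; any other nonzero status is rewritten to -EIO and the
 * call is aborted.  *status is updated in place.
 */
static enum demo_action demo_connect_status(int *status)
{
	switch (*status) {
	case 0:
		return DEMO_OK;
	case -ENOTCONN:
	case -ETIMEDOUT:
		return DEMO_RETRY;
	default:
		*status = -EIO;
		return DEMO_ABORT;
	}
}
```

Under this model, an -ECONNREFUSED from the RDMA transport is rewritten to -EIO before mount sees it, whereas -ENOTCONN, the value patch 10 substitutes, takes the retry path as it does for the other transports.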
end of thread, other threads:[~2008-10-08 19:57 UTC | newest]
Thread overview: 36+ messages
2008-10-08 15:46 [PATCH 00/15] RPC/RDMA patchset for next merge window Tom Talpey
[not found] ` <20081008154506.1336.59892.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 15:47 ` [PATCH 01/15] RPC/RDMA: refactor the inline memory registration code Tom Talpey
2008-10-08 15:47 ` [PATCH 02/15] RPC/RDMA: add data types and new FRMR memory registration enum Tom Talpey
[not found] ` <20081008154713.1336.41538.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:23 ` Trond Myklebust
2008-10-08 17:30 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDmcarc00000072-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-08 17:40 ` Trond Myklebust
2008-10-08 17:55 ` J. Bruce Fields
2008-10-08 17:58 ` Talpey, Thomas
2008-10-08 15:47 ` [PATCH 03/15] RPC/RDMA: check selected memory registration mode at runtime Tom Talpey
[not found] ` <20081008154723.1336.57976.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:22 ` Trond Myklebust
2008-10-08 17:29 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRD8yfog00000071-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-08 17:40 ` Trond Myklebust
2008-10-08 15:47 ` [PATCH 04/15] RPC/RDMA: support FRMR client memory registration Tom Talpey
2008-10-08 15:47 ` [PATCH 05/15] RPC/RDMA: fix connection IRD/ORD setting Tom Talpey
[not found] ` <20081008154744.1336.20909.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:26 ` Trond Myklebust
2008-10-08 17:32 ` Talpey, Thomas
2008-10-08 15:47 ` [PATCH 06/15] RPC/RDMA: suppress retransmit on RPC/RDMA clients Tom Talpey
2008-10-08 15:48 ` [PATCH 07/15] RPC/RDMA: maintain the RPC task bytes-sent statistic Tom Talpey
2008-10-08 15:48 ` [PATCH 08/15] RPC/RDMA: avoid an oops due to disconnect racing with async upcalls Tom Talpey
2008-10-08 15:48 ` [PATCH 09/15] RPC/RDMA: adhere to protocol for unpadded client trailing write chunks Tom Talpey
[not found] ` <20081008154825.1336.79549.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:29 ` Trond Myklebust
2008-10-08 17:33 ` Talpey, Thomas
2008-10-08 15:48 ` [PATCH 10/15] RPC/RDMA: return a consistent error to mount, when connect fails Tom Talpey
[not found] ` <20081008154835.1336.85484.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:31 ` Trond Myklebust
2008-10-08 17:40 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDbpH7100000075-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-08 17:43 ` Trond Myklebust
2008-10-08 19:56 ` Talpey, Thomas
2008-10-08 15:48 ` [PATCH 11/15] RPC/RDMA: fix connect/reconnect resource leak Tom Talpey
2008-10-08 15:48 ` [PATCH 12/15] RPC/RDMA: correct a 5 second pause on reconnecting to an idle server Tom Talpey
[not found] ` <20081008154856.1336.18339.stgit-pfX4bTJKMULWwzOYslWYilaTQe2KTcn/@public.gmane.org>
2008-10-08 17:35 ` Trond Myklebust
2008-10-08 17:51 ` Talpey, Thomas
[not found] ` <RTPCLUEXC1-PRDjbDt300000076-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-10-08 18:04 ` Trond Myklebust
2008-10-08 19:05 ` Talpey, Thomas
2008-10-08 15:49 ` [PATCH 13/15] RPC/RDMA: harden connection logic against missing/late rdma_cm upcalls Tom Talpey
2008-10-08 15:49 ` [PATCH 14/15] RPC/RDMA: reformat a debug printk to keep lines together Tom Talpey
2008-10-08 15:49 ` [PATCH 15/15] RPC/RDMA: optionally emit useful transport info upon connect/disconnect Tom Talpey