* Re: [PATCH 01/10] svcrdma: Add Fast Reg MR Data Types
From: J. Bruce Fields @ 2008-09-24 19:11 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:30AM -0500, Tom Tucker wrote:
> Add data types to track Fast Reg Memory Regions. The core data type is
> svc_rdma_fastreg_mr that associates a device MR with a host kva and page
> list. A field is added to the WR context to keep track of the FRMR
> used to map the local memory for an RPC.
>
> An FRMR list and spin lock are added to the transport instance to keep
> track of all FRMR allocated for the transport. Also added are device
> capability flags to indicate what the memory registration
> capabilities are for the underlying device and whether or not fast
> memory registration is supported.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> include/linux/sunrpc/svc_rdma.h | 27 ++++++++++++++++++++++++++-
> 1 files changed, 26 insertions(+), 1 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index dc05b54..295ebbc 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -72,6 +72,7 @@ extern atomic_t rdma_stat_sq_prod;
> */
> struct svc_rdma_op_ctxt {
> struct svc_rdma_op_ctxt *read_hdr;
> + struct svc_rdma_fastreg_mr *frmr;
> int hdr_count;
> struct xdr_buf arg;
> struct list_head dto_q;
> @@ -103,16 +104,35 @@ struct svc_rdma_chunk_sge {
> int start; /* sge no for this chunk */
> int count; /* sge count for this chunk */
> };
> +struct svc_rdma_fastreg_mr {
> + struct ib_mr *mr;
> + void *kva;
> + struct ib_fast_reg_page_list *page_list;
> + int page_list_len;
> + unsigned long access_flags;
> + unsigned long map_len;
> + enum dma_data_direction direction;
> + struct list_head frmr_list;
> +};
> struct svc_rdma_req_map {
> + struct svc_rdma_fastreg_mr *frmr;
> unsigned long count;
> union {
> struct kvec sge[RPCSVC_MAXPAGES];
> struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
> };
> };
> -
> +#define RDMACTXT_F_FAST_UNREG 1
> #define RDMACTXT_F_LAST_CTXT 2
>
> +enum svcrdma_dev_cap { /* device supports: */
> + SVCRDMA_DEVCAP_FAST_REG = 1, /* - fast mr registration */
> + SVCRDMA_DEVCAP_READ_W_INV = 2, /* - read w/ invalidate */
> + SVCRDMA_DEVCAP_LCL_DMA_MR = 4, /* - lcl dma lkey */
> + SVCRDMA_DEVCAP_LCL_WR_REQ = 8, /* - data sink requires remote */
> + /* write */
> +};
Nit: I suppose that works, but I thought the idea of an enum was that it
enumerated every possible value of the given type? If you're expecting
to "or" these together, I would have expected just some kind of unsigned
int with some #define's.
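Something like this, say (same values, untested; sc_dev_caps would
become a plain unsigned int bitmask):

	#define SVCRDMA_DEVCAP_FAST_REG		1	/* fast mr registration */
	#define SVCRDMA_DEVCAP_READ_W_INV	2	/* read w/ invalidate */
	#define SVCRDMA_DEVCAP_LCL_DMA_MR	4	/* lcl dma lkey */
	#define SVCRDMA_DEVCAP_LCL_WR_REQ	8	/* data sink requires
							 * remote write */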
--b.
> +
> struct svcxprt_rdma {
> struct svc_xprt sc_xprt; /* SVC transport structure */
> struct rdma_cm_id *sc_cm_id; /* RDMA connection id */
> @@ -136,6 +156,11 @@ struct svcxprt_rdma {
> struct ib_cq *sc_rq_cq;
> struct ib_cq *sc_sq_cq;
> struct ib_mr *sc_phys_mr; /* MR for server memory */
> + enum svcrdma_dev_cap sc_dev_caps; /* distilled device caps */
> + u32 sc_dma_lkey; /* local dma key */
> + unsigned int sc_frmr_pg_list_len;
> + struct list_head sc_frmr_q;
> + spinlock_t sc_frmr_q_lock;
>
> spinlock_t sc_lock; /* transport lock */
>
* Re: [PATCH 02/10] svcrdma: Add FRMR get/put services
From: J. Bruce Fields @ 2008-09-24 19:45 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:31AM -0500, Tom Tucker wrote:
> Add services for the allocating, freeing, and unmapping Fast Reg MR. These
> services will be used by the transport connection setup, send and receive
> routines.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> include/linux/sunrpc/svc_rdma.h | 3 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 125 ++++++++++++++++++++++++++++-
> 2 files changed, 123 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 295ebbc..100754e 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -219,6 +219,9 @@ extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
> extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
> extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
> extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
> +extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
> +extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
> + struct svc_rdma_fastreg_mr *);
> extern void svc_sq_reap(struct svcxprt_rdma *);
> extern void svc_rq_reap(struct svcxprt_rdma *);
> extern struct svc_xprt_class svc_rdma_class;
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 900cb69..f200345 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -100,6 +100,7 @@ struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
> ctxt->xprt = xprt;
> INIT_LIST_HEAD(&ctxt->dto_q);
> ctxt->count = 0;
> + ctxt->frmr = NULL;
> atomic_inc(&xprt->sc_ctxt_used);
> return ctxt;
> }
> @@ -109,11 +110,19 @@ static void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
> struct svcxprt_rdma *xprt = ctxt->xprt;
> int i;
> for (i = 0; i < ctxt->count && ctxt->sge[i].length; i++) {
> - atomic_dec(&xprt->sc_dma_used);
> - ib_dma_unmap_single(xprt->sc_cm_id->device,
> - ctxt->sge[i].addr,
> - ctxt->sge[i].length,
> - ctxt->direction);
> + /*
> + * Unmap the DMA addr in the SGE if the lkey matches
> + * the sc_dma_lkey, otherwise, ignore it since it is
> + * an FRMR lkey and will be unmapped later when the
> + * last WR that uses it completes.
I kinda wish at the start of this that we'd made a rule that any acronym
that doesn't have a definition in the first page of its google results
should have at least its expansion added to a glossary someplace in
Documentation/ or the C source. Maybe it's too late now.
> + */
> + if (ctxt->sge[i].lkey == xprt->sc_dma_lkey) {
I'd probably have done

	if (ctxt->sge[i].lkey != xprt->sc_dma_lkey)
		continue;

and left the rest alone; but OK.
(By the way, is it ever possible this test will return different results
on different passes through the loop, or could we just as well break out
the first time we see this?:
	if (ctxt->sge[i].lkey != xprt->sc_dma_lkey)
		break;
)
> + atomic_dec(&xprt->sc_dma_used);
> + ib_dma_unmap_single(xprt->sc_cm_id->device,
> + ctxt->sge[i].addr,
> + ctxt->sge[i].length,
> + ctxt->direction);
> + }
> }
> }
>
> @@ -150,6 +159,7 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
> schedule_timeout_uninterruptible(msecs_to_jiffies(500));
> }
> map->count = 0;
> + map->frmr = NULL;
> return map;
> }
>
> @@ -425,10 +435,12 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
> INIT_LIST_HEAD(&cma_xprt->sc_dto_q);
> INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
> INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
> + INIT_LIST_HEAD(&cma_xprt->sc_frmr_q);
> init_waitqueue_head(&cma_xprt->sc_send_wait);
>
> spin_lock_init(&cma_xprt->sc_lock);
> spin_lock_init(&cma_xprt->sc_rq_dto_lock);
> + spin_lock_init(&cma_xprt->sc_frmr_q_lock);
>
> cma_xprt->sc_ord = svcrdma_ord;
>
> @@ -686,6 +698,106 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
> return ERR_PTR(ret);
> }
>
> +static int rdma_alloc_frmr(struct svcxprt_rdma *xprt)
> +{
> + struct ib_mr *mr;
> + struct ib_fast_reg_page_list *pl;
> + struct svc_rdma_fastreg_mr *frmr;
> +
> + mr = ib_alloc_fast_reg_mr(xprt->sc_pd, RPCSVC_MAXPAGES);
> + if (!mr)
> + goto errout;
> + pl = ib_alloc_fast_reg_page_list(xprt->sc_cm_id->device,
> + RPCSVC_MAXPAGES);
> + if (!pl) {
> + ib_dereg_mr(mr);
> + goto errout;
> + }
> + frmr = kmalloc(sizeof(*frmr), GFP_KERNEL);
> + if (!frmr) {
> + ib_dereg_mr(mr);
> + ib_free_fast_reg_page_list(pl);
> + goto errout;
> + }
> + frmr->mr = mr;
> + frmr->page_list = pl;
> + INIT_LIST_HEAD(&frmr->frmr_list);
This INIT_LIST_HEAD() is redundant given the following list_add(), which
unconditionally overwrites the entry's next/prev pointers anyway.
> + spin_lock_bh(&xprt->sc_frmr_q_lock);
> + list_add(&frmr->frmr_list, &xprt->sc_frmr_q);
> + spin_unlock_bh(&xprt->sc_frmr_q_lock);
> +
> + return 0;
> +
> + errout:
> + return -ENOMEM;
Just a style nit: I like the exit-in-one-place thing, but if it's just a
bare "return" I don't see much advantage over replacing each goto by a
bare return. The standard kernel style here would be:
	...
	if (!mr)
		goto errout;
	...
	if (!pl)
		goto err_free_mr;
	...
	if (!frmr)
		goto err_free_frmr;
	...
	return 0;
err_free_frmr:
	ib_free_fast_reg_page_list(pl);
err_free_mr:
	ib_dereg_mr(mr);
errout:
	return -ENOMEM;
}
> +}
> +
> +static void rdma_dealloc_frmr_q(struct svcxprt_rdma *xprt)
> +{
> + struct svc_rdma_fastreg_mr *frmr;
> +
> + while (!list_empty(&xprt->sc_frmr_q)) {
> + frmr = list_entry(xprt->sc_frmr_q.next,
> + struct svc_rdma_fastreg_mr, frmr_list);
> + list_del_init(&frmr->frmr_list);
> + ib_dereg_mr(frmr->mr);
> + ib_free_fast_reg_page_list(frmr->page_list);
> + kfree(frmr);
> + }
> +}
> +
> +struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
> +{
> + struct svc_rdma_fastreg_mr *frmr = NULL;
> +
> + while (1) {
> + spin_lock_bh(&rdma->sc_frmr_q_lock);
> + if (!list_empty(&rdma->sc_frmr_q)) {
> + frmr = list_entry(rdma->sc_frmr_q.next,
> + struct svc_rdma_fastreg_mr, frmr_list);
> + list_del_init(&frmr->frmr_list);
> + }
> + spin_unlock_bh(&rdma->sc_frmr_q_lock);
> + if (frmr)
> + break;
> + if (rdma_alloc_frmr(rdma))
> + return ERR_PTR(-ENOMEM);
This business with allocating stuff and adding it to the list, then
looking for it next time around, seems a little more clever than
necessary. Why not ditch the loop, take the list_add() out of
rdma_alloc_frmr() and have it return the newly allocated frmr or NULL
instead, and then do something like this?:
	spin_lock_bh(&rdma->sc_frmr_q_lock);
	if (!list_empty(&rdma->sc_frmr_q)) {
		frmr = list_entry(rdma->sc_frmr_q.next,
				  struct svc_rdma_fastreg_mr, frmr_list);
		list_del_init(&frmr->frmr_list);
	}
	spin_unlock_bh(&rdma->sc_frmr_q_lock);
	if (!frmr) {
		frmr = rdma_alloc_frmr(rdma);
		if (!frmr)
			return ERR_PTR(-ENOMEM);
	}
	frmr->map_len = 0;
	...
There doesn't seem to be much point to adding to sc_frmr_q just to take
it right back off again immediately. Maybe I'm missing something.
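(Roughly, rdma_alloc_frmr() would then end with something like

	frmr->mr = mr;
	frmr->page_list = pl;
	INIT_LIST_HEAD(&frmr->frmr_list);
	return frmr;

with the error paths returning NULL instead of -ENOMEM. The
INIT_LIST_HEAD() stops being redundant in that scheme, since the frmr no
longer goes straight onto the list and the BUG_ON() in
svc_rdma_put_frmr() checks list_empty().)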
> + }
> + frmr->map_len = 0;
> + frmr->page_list_len = 0;
> +
> + return frmr;
> +}
> +
> +static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
> + struct svc_rdma_fastreg_mr *frmr)
> +{
> + int page_no;
> + dprintk("svcrdma:%s: xprt %p page_list_len %d\n",
> + __func__, xprt, frmr->page_list_len);
> + for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
> + dma_addr_t addr = frmr->page_list->page_list[page_no];
> + dprintk("svcrdma: %08x %llx\n", frmr->mr->lkey, addr);
Are these dprintk's going to be useful for debugging user issues
remotely, or were they just for your personal use while writing the
code?
We saw recently that we may already have too many dprintk's for them to
be useful in production, and the above seem likely to be rather
frequent.
--b.
> + if (ib_dma_mapping_error(frmr->mr->device, addr))
> + continue;
> + atomic_dec(&xprt->sc_dma_used);
> + ib_dma_unmap_single(frmr->mr->device, addr, PAGE_SIZE,
> + frmr->direction);
> + }
> +}
> +
> +void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
> + struct svc_rdma_fastreg_mr *frmr)
> +{
> + if (frmr) {
> + frmr_unmap_dma(rdma, frmr);
> + spin_lock_bh(&rdma->sc_frmr_q_lock);
> + BUG_ON(!list_empty(&frmr->frmr_list));
> + list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
> + spin_unlock_bh(&rdma->sc_frmr_q_lock);
> + }
> +}
> +
> /*
> * This is the xpo_recvfrom function for listening endpoints. Its
> * purpose is to accept incoming connections. The CMA callback handler
> @@ -961,6 +1073,9 @@ static void __svc_rdma_free(struct work_struct *work)
> WARN_ON(atomic_read(&rdma->sc_ctxt_used) != 0);
> WARN_ON(atomic_read(&rdma->sc_dma_used) != 0);
>
> + /* De-allocate fastreg mr */
> + rdma_dealloc_frmr_q(rdma);
> +
> /* Destroy the QP if present (not a listener) */
> if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
> ib_destroy_qp(rdma->sc_qp);
* Re: [PATCH 03/10] svcrdma: Query device for Fast Reg support during connection setup
From: J. Bruce Fields @ 2008-09-24 20:10 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:32AM -0500, Tom Tucker wrote:
> Query the device capabilities in the svc_rdma_accept function to determine
> what advanced memory management capabilities are supported by the device.
> Based on the query, select the most secure model available given the
> requirements of the transport and capabilities of the adapter.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 86 +++++++++++++++++++++++++++--
> 1 files changed, 80 insertions(+), 6 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index f200345..8586c7d 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -816,6 +816,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> struct rdma_conn_param conn_param;
> struct ib_qp_init_attr qp_attr;
> struct ib_device_attr devattr;
> + int dma_mr_acc;
> int ret;
> int i;
>
> @@ -931,15 +932,88 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> }
> newxprt->sc_qp = newxprt->sc_cm_id->qp;
>
> - /* Register all of physical memory */
> - newxprt->sc_phys_mr = ib_get_dma_mr(newxprt->sc_pd,
> - IB_ACCESS_LOCAL_WRITE |
> - IB_ACCESS_REMOTE_WRITE);
> - if (IS_ERR(newxprt->sc_phys_mr)) {
> - dprintk("svcrdma: Failed to create DMA MR ret=%d\n", ret);
> + /*
> + * Use the most secure set of MR resources based on the
> + * transport type and available memory management features in
> + * the device. Here's the table implemented below:
> + *
> + * Fast Global DMA Remote WR
> + * Reg LKEY MR Access
> + * Sup'd Sup'd Needed Needed
> + *
> + * IWARP N N Y Y
> + * N Y Y Y
> + * Y N Y N
> + * Y Y N -
> + *
> + * IB N N Y N
> + * N Y N -
> + * Y N Y N
> + * Y Y N -
> + *
> + * NB: iWARP requires remote write access for the data sink
> + * of an RDMA_READ. IB does not.
> + */
> + if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
> + newxprt->sc_frmr_pg_list_len =
> + devattr.max_fast_reg_page_list_len;
> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
> + BUG_ON(!newxprt->sc_frmr_pg_list_len);
> + }
> +
> + /*
> + * Determine if a DMA MR is required and if so, what privs are required
> + */
> + switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
> + case RDMA_TRANSPORT_IWARP:
> + if (0 == (newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
> + dma_mr_acc =
> + (IB_ACCESS_LOCAL_WRITE |
> + IB_ACCESS_REMOTE_WRITE);
> + } else if (0 == (devattr.device_cap_flags &
> + IB_DEVICE_LOCAL_DMA_LKEY)) {
> + dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
> + } else {
> + dma_mr_acc = 0;
> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
> + }
> + break;
> + case RDMA_TRANSPORT_IB:
> + if ((0 == (newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) &&
> + (0 == (devattr.device_cap_flags &
> + IB_DEVICE_LOCAL_DMA_LKEY)))
> + dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
> + else
> + dma_mr_acc = 0;
This doesn't seem to agree exactly with the table above, e.g. in the IB
"Y N Y N" case? The table says that row still needs a DMA MR, but the
code only sets dma_mr_acc when fast reg and the global LKEY are both
absent. Of course, I may be misreading the table....
> + break;
> + default:
> goto errout;
> }
>
> + /*
> + * If we have a global dma lkey, use it for local access
> + */
> + if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
> + newxprt->sc_dma_lkey =
> + newxprt->sc_cm_id->device->local_dma_lkey;
> +
> + /*
> + * If local write was required for local access, the lcl dma
> + * lkey can't be used.
> + */
> + if (dma_mr_acc) {
> + /* Register all of physical memory */
> + newxprt->sc_phys_mr =
> + ib_get_dma_mr(newxprt->sc_pd, dma_mr_acc);
> + if (IS_ERR(newxprt->sc_phys_mr)) {
> + dprintk("svcrdma: Failed to create DMA MR ret=%d\n",
> + ret);
> + goto errout;
> + }
> + newxprt->sc_dma_lkey = newxprt->sc_phys_mr->lkey;
> + }
Is it possible for neither of the above conditions to be true, and if so
is it a problem if newxprt->sc_dma_lkey doesn't get initialized?
--b.
> +
> /* Post receive buffers */
> for (i = 0; i < newxprt->sc_max_requests; i++) {
> ret = svc_rdma_post_recv(newxprt);
* Re: [PATCH 04/10] svcrdma: Add a service to register a Fast Reg MR with the device
From: J. Bruce Fields @ 2008-09-24 20:25 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:33AM -0500, Tom Tucker wrote:
> Fast Reg MR introduces a new WR type. Add a service to register the
> region with the adapter and update the completion handling to support
> completions with a NULL WR context.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> include/linux/sunrpc/svc_rdma.h | 1 +
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 53 ++++++++++++++++++++++++++---
> 2 files changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
> index 100754e..6899b71 100644
> --- a/include/linux/sunrpc/svc_rdma.h
> +++ b/include/linux/sunrpc/svc_rdma.h
> @@ -219,6 +219,7 @@ extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
> extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
> extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
> extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
> +extern int svc_rdma_fastreg(struct svcxprt_rdma *, struct svc_rdma_fastreg_mr *);
> extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
> extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
> struct svc_rdma_fastreg_mr *);
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 8586c7d..b8c642d 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -344,10 +344,6 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
> ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);
> atomic_inc(&rdma_stat_sq_poll);
> while ((ret = ib_poll_cq(cq, 1, &wc)) > 0) {
> - ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
> - xprt = ctxt->xprt;
So this assignment to xprt was always unnecessary?
> -
> - svc_rdma_unmap_dma(ctxt);
> if (wc.status != IB_WC_SUCCESS)
> /* Close the transport */
> set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
> @@ -356,6 +352,11 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
> atomic_dec(&xprt->sc_sq_count);
> wake_up(&xprt->sc_send_wait);
>
> + ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
> + if (!ctxt)
> + goto skip_it;
> + svc_rdma_unmap_dma(ctxt);
> +
> switch (ctxt->wr_op) {
> case IB_WR_SEND:
> svc_rdma_put_context(ctxt, 1);
> @@ -385,6 +386,7 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
> wc.opcode, wc.status);
> break;
> }
> + skip_it:
That big switch we just goto'd over might be a candidate for
encapsulating in a separate function.
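Something like this, maybe (the name is just a placeholder, untested):

	static void process_context(struct svcxprt_rdma *xprt,
				    struct svc_rdma_op_ctxt *ctxt)
	{
		svc_rdma_unmap_dma(ctxt);
		switch (ctxt->wr_op) {
		case IB_WR_SEND:
			svc_rdma_put_context(ctxt, 1);
			break;
		...
		}
	}

so the loop body just becomes

	ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
	if (ctxt)
		process_context(xprt, ctxt);
	svc_xprt_put(&xprt->sc_xprt);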
--b.
> svc_xprt_put(&xprt->sc_xprt);
> }
>
> @@ -1203,6 +1205,47 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt)
> return 1;
> }
>
> +/*
> + * Attempt to register the kvec representing the RPC memory with the
> + * device.
> + *
> + * Returns:
> + * NULL : The device does not support fastreg or there were no more
> + * fastreg mr.
> + * frmr : The kvec register request was successfully posted.
> + * <0 : An error was encountered attempting to register the kvec.
> + */
> +int svc_rdma_fastreg(struct svcxprt_rdma *xprt,
> + struct svc_rdma_fastreg_mr *frmr)
> +{
> + struct ib_send_wr fastreg_wr;
> + u8 key;
> + int ret;
> +
> + /* Bump the key */
> + key = (u8)(frmr->mr->lkey & 0x000000FF);
> + ib_update_fast_reg_key(frmr->mr, ++key);
> +
> + /* Prepare FASTREG WR */
> + memset(&fastreg_wr, 0, sizeof fastreg_wr);
> + fastreg_wr.opcode = IB_WR_FAST_REG_MR;
> + fastreg_wr.send_flags = IB_SEND_SIGNALED;
> + fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
> + fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
> + fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
> + fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
> + fastreg_wr.wr.fast_reg.length = frmr->map_len;
> + fastreg_wr.wr.fast_reg.access_flags = frmr->access_flags;
> + fastreg_wr.wr.fast_reg.rkey = frmr->mr->lkey;
> + ret = svc_rdma_send(xprt, &fastreg_wr);
> + dprintk("svcrdma:%s: reg lkey %08x kva %p mrlen %lu pages %d ret %d\n",
> + __func__, frmr->mr->lkey, frmr->kva, frmr->map_len,
> + frmr->page_list_len,
> + ret);
> +
> + return ret;
> +}
> +
> int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
> {
> struct ib_send_wr *bad_wr;
> @@ -1212,8 +1255,6 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
> return -ENOTCONN;
>
> BUG_ON(wr->send_flags != IB_SEND_SIGNALED);
> - BUG_ON(((struct svc_rdma_op_ctxt *)(unsigned long)wr->wr_id)->wr_op !=
> - wr->opcode);
> /* If the SQ is full, wait until an SQ entry is available */
> while (1) {
> spin_lock_bh(&xprt->sc_lock);
* Re: [PATCH 05/10] svcrdma: Modify post recv path to use local dma key
From: J. Bruce Fields @ 2008-09-24 20:31 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:34AM -0500, Tom Tucker wrote:
> Update the svc_rdma_post_recv routine to use the adapter's global LKEY
> instead of sc_phys_mr which is only valid when using a DMA MR.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index b8c642d..a61caa7 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -476,7 +476,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> struct ib_recv_wr recv_wr, *bad_recv_wr;
> struct svc_rdma_op_ctxt *ctxt;
> struct page *page;
> - unsigned long pa;
> + dma_addr_t pa;
> int sge_no;
> int buflen;
> int ret;
> @@ -488,13 +488,17 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
> BUG_ON(sge_no >= xprt->sc_max_sge);
> page = svc_rdma_get_page();
> ctxt->pages[sge_no] = page;
> - atomic_inc(&xprt->sc_dma_used);
> pa = ib_dma_map_page(xprt->sc_cm_id->device,
> page, 0, PAGE_SIZE,
> DMA_FROM_DEVICE);
> + if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa)) {
> + svc_rdma_put_context(ctxt, 1);
> + return -ENOMEM;
> + }
Might be a tad nicer to do a "goto out_put_ctx" here and consolidate the
cleanup with the similar cleanup done after the later ib_post_recv
failure.
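I.e., assuming the later failure path also just puts the context,
something like (sketch, untested):

	if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa)) {
		ret = -ENOMEM;
		goto out_put_ctx;
	}
	...
	ret = ib_post_recv(xprt->sc_qp, &recv_wr, &bad_recv_wr);
	if (ret)
		goto out_put_ctx;
	return 0;
out_put_ctx:
	svc_rdma_put_context(ctxt, 1);
	return ret;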
--b.
> + atomic_inc(&xprt->sc_dma_used);
> ctxt->sge[sge_no].addr = pa;
> ctxt->sge[sge_no].length = PAGE_SIZE;
> - ctxt->sge[sge_no].lkey = xprt->sc_phys_mr->lkey;
> + ctxt->sge[sge_no].lkey = xprt->sc_dma_lkey;
> buflen += PAGE_SIZE;
> }
> ctxt->count = sge_no;
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
From: J. Bruce Fields @ 2008-09-24 21:21 UTC
To: Tom Tucker; +Cc: linux-nfs
On Tue, Sep 16, 2008 at 06:34:39AM -0500, Tom Tucker wrote:
> This patch adds security related documentation to the nfs-rdma.txt file
> that describes the memory registration model, the potential security
> exploits, and compares these exploits to a similar threat when using TCP
> as the transport.
Thanks for doing this.
>
> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>
> ---
> Documentation/filesystems/nfs-rdma.txt | 66 ++++++++++++++++++++++++++++++++
> 1 files changed, 66 insertions(+), 0 deletions(-)
>
> diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
> index 44bd766..41f0fb2 100644
> --- a/Documentation/filesystems/nfs-rdma.txt
> +++ b/Documentation/filesystems/nfs-rdma.txt
> @@ -269,3 +269,69 @@ NFS/RDMA Setup
> the "proto" field for the given mount.
>
> Congratulations! You're using NFS/RDMA!
> +
> +Security
> +--------
> +
> + NFSRDMA exploits the RDMA capabilities of the IB and iWARP
> + transports to more efficiently exchange RPC data between the client
> + and the server. This section discusses the security implications of
> + the exchange of memory information on the wire when the wire may be
> + monitorable by an untrusted application. The identifier that
> + encapsulates this memory information is called an RKEY.
> +
> + A principal exploit is that a node listening on a mirror port of a
> + switch
There are probably always other ways to trick the switch into sending
an attacker some of the traffic. It might be simpler just to say "a
node on the local network".
> + could snoop RDMA packets containing RKEY and then forge a
> + packet with this RKEY to write and/or read the memory of the peer to
> + which the RKEY referred.
> +
> + The NFSRDMA protocol is defined such that a) only the server
> + initiates RDMA, and b) only the client's memory is exposed via
> + RKEY. This is why the server reads to fetch RPC data from the client
> + even though it would be more efficient for the client to write the
> + data to the server's memory. This design goal is not entirely
> + realized with iWARP, however, because the RKEY (called an STag on
> + iWARP) for the data sink of an RDMA_READ is actually placed on the
> + wire, and this RKEY has Remote Write permission. This means that the
> + server's memory is exposed by virtue of having placed the RKEY for
> + it's local memory on the wire in order to receive the result of the
s/it's/its/
> + RDMA_READ.
> +
> + By contrast, IB uses an opaque transaction ID# to associate the
> + READ_RPL with the READ_REQ and the data sink of an READ_REQ does not
> + require remote access. That said, the byzantine node in question
> + could forge a packet with this transaction ID and corrupt the target
> + memory, however, the scope of the exploit is bounded to the lifetime
> + of this single RDMA_READ request and to the memory mapped by the
> + data sink of the READ_REQ.
> +
> + The newer RDMA adapters (both iWARP and IB) support "Fast Memory
> + Registration". This capability allows memory to be quickly
> + registered (i.e. made available for remote access) and de-registered
> + by submitting WR on the SQ. These capabilities provide a mechanism
> + to reduce the exposure discused above by limiting the scope of the
> + exploit. The idea is to create an RKEY that only maps the single RPC
> + and whose effective lifetime is only the exchange of this single
> + RPC. This is the default memory model that is employed by the server
> + when supported by the adapter and by the client when the
> + rdma_memreg_strategy is set to 6. Note that the client and server
> + may use different memory registration strategies, however,
> + performance is better when both the client and server use the
> + FastReg memory registration strategy.
> +
> + This approach has two benefits, a) it restricts the domain of the
> + exploit to the memory of a single RPC, and b) it limits the duration
> + of the exploit to the time it takes to satisfy the RDMA_READ.
> +
> + It is arguable that a one-shot STag/RKEY is no less secure than RPC
> + on the TCP transport. Consider that the exact same byzantine
> + application could more easily corrupt TCP RPC payload by simply
> + forging a packet with the correct TCP sequence number -- in fact
> + it's easier than the RDMA exploit because the RDMA exploit requires
> + that you correctly forge both the TCP packet and the RDMA
> + payload. In addition the duration of the TCP exploit is the lifetime
> + of the connection, not the lifetime of a single WR/RPC data transfer.
> +
> + So if you buy the argument above, RDMA on IB or iWARP using Fast Reg
> + is no less secure than TCP.
I'd leave out the first seven words of that last sentence on the grounds
that it's implicit....
This explanation is helpful, thanks. It would also be helpful if we
could boil down the advice to just a sentence or two for the busy admin.
Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
rdma on a network where you cannot trust every machine....
And better at some point might be to allow nfs-utils to automatically
check for that situation, and/or just to drop support for anything that
can't provide at least a tcp/auth_unix-like security model.
--b.
* Re: [PATCH 04/10] svcrdma: Add a service to register a Fast Reg MR with the device
From: Tom Tucker @ 2008-09-25 13:31 UTC
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:33AM -0500, Tom Tucker wrote:
>> Fast Reg MR introduces a new WR type. Add a service to register the
>> region with the adapter and update the completion handling to support
>> completions with a NULL WR context.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> include/linux/sunrpc/svc_rdma.h | 1 +
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 53 ++++++++++++++++++++++++++---
>> 2 files changed, 48 insertions(+), 6 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index 100754e..6899b71 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -219,6 +219,7 @@ extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
>> extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
>> extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
>> extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
>> +extern int svc_rdma_fastreg(struct svcxprt_rdma *, struct svc_rdma_fastreg_mr *);
>> extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
>> extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
>> struct svc_rdma_fastreg_mr *);
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 8586c7d..b8c642d 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -344,10 +344,6 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
>> ib_req_notify_cq(xprt->sc_sq_cq, IB_CQ_NEXT_COMP);
>> atomic_inc(&rdma_stat_sq_poll);
>> while ((ret = ib_poll_cq(cq, 1, &wc)) > 0) {
>> - ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
>> - xprt = ctxt->xprt;
>
> So this assignment to xprt was always unnecessary?
>
Yes, the correct xprt is passed in as an argument, so
BUG_ON(ctxt->xprt != xprt) would hold here.
>> -
>> - svc_rdma_unmap_dma(ctxt);
>> if (wc.status != IB_WC_SUCCESS)
>> /* Close the transport */
>> set_bit(XPT_CLOSE, &xprt->sc_xprt.xpt_flags);
>> @@ -356,6 +352,11 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
>> atomic_dec(&xprt->sc_sq_count);
>> wake_up(&xprt->sc_send_wait);
>>
>> + ctxt = (struct svc_rdma_op_ctxt *)(unsigned long)wc.wr_id;
>> + if (!ctxt)
>> + goto skip_it;
>> + svc_rdma_unmap_dma(ctxt);
>> +
>> switch (ctxt->wr_op) {
>> case IB_WR_SEND:
>> svc_rdma_put_context(ctxt, 1);
>> @@ -385,6 +386,7 @@ static void sq_cq_reap(struct svcxprt_rdma *xprt)
>> wc.opcode, wc.status);
>> break;
>> }
>> + skip_it:
>
> That big switch we just goto'd over might be a candidate for
> encapsulating in a separate function.
Yes, you're right, that would look better.
>
> --b.
>
>> svc_xprt_put(&xprt->sc_xprt);
>> }
>>
>> @@ -1203,6 +1205,47 @@ static int svc_rdma_has_wspace(struct svc_xprt *xprt)
>> return 1;
>> }
>>
>> +/*
>> + * Attempt to register the kvec representing the RPC memory with the
>> + * device.
>> + *
>> + * Returns:
>> + * NULL : The device does not support fastreg or there were no more
>> + * fastreg mr.
>> + * frmr : The kvec register request was successfully posted.
>> + * <0 : An error was encountered attempting to register the kvec.
>> + */
>> +int svc_rdma_fastreg(struct svcxprt_rdma *xprt,
>> + struct svc_rdma_fastreg_mr *frmr)
>> +{
>> + struct ib_send_wr fastreg_wr;
>> + u8 key;
>> + int ret;
>> +
>> + /* Bump the key */
>> + key = (u8)(frmr->mr->lkey & 0x000000FF);
>> + ib_update_fast_reg_key(frmr->mr, ++key);
>> +
>> + /* Prepare FASTREG WR */
>> + memset(&fastreg_wr, 0, sizeof fastreg_wr);
>> + fastreg_wr.opcode = IB_WR_FAST_REG_MR;
>> + fastreg_wr.send_flags = IB_SEND_SIGNALED;
>> + fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
>> + fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
>> + fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
>> + fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
>> + fastreg_wr.wr.fast_reg.length = frmr->map_len;
>> + fastreg_wr.wr.fast_reg.access_flags = frmr->access_flags;
>> + fastreg_wr.wr.fast_reg.rkey = frmr->mr->lkey;
>> + ret = svc_rdma_send(xprt, &fastreg_wr);
>> + dprintk("svcrdma:%s: reg lkey %08x kva %p mrlen %lu pages %d ret %d\n",
>> + __func__, frmr->mr->lkey, frmr->kva, frmr->map_len,
>> + frmr->page_list_len,
>> + ret);
>> +
>> + return ret;
>> +}
>> +
>> int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
>> {
>> struct ib_send_wr *bad_wr;
>> @@ -1212,8 +1255,6 @@ int svc_rdma_send(struct svcxprt_rdma *xprt, struct ib_send_wr *wr)
>> return -ENOTCONN;
>>
>> BUG_ON(wr->send_flags != IB_SEND_SIGNALED);
>> - BUG_ON(((struct svc_rdma_op_ctxt *)(unsigned long)wr->wr_id)->wr_op !=
>> - wr->opcode);
>> /* If the SQ is full, wait until an SQ entry is available */
>> while (1) {
>> spin_lock_bh(&xprt->sc_lock);
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
From: Tom Tucker @ 2008-09-25 13:35 UTC
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:39AM -0500, Tom Tucker wrote:
>> This patch adds security related documentation to the nfs-rdma.txt file
>> that describes the memory registration model, the potential security
>> exploits, and compares these exploits to a similar threat when using TCP
>> as the transport.
>
> Thanks for doing this.
>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> Documentation/filesystems/nfs-rdma.txt | 66 ++++++++++++++++++++++++++++++++
>> 1 files changed, 66 insertions(+), 0 deletions(-)
>>
>> diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
>> index 44bd766..41f0fb2 100644
>> --- a/Documentation/filesystems/nfs-rdma.txt
>> +++ b/Documentation/filesystems/nfs-rdma.txt
>> @@ -269,3 +269,69 @@ NFS/RDMA Setup
>> the "proto" field for the given mount.
>>
>> Congratulations! You're using NFS/RDMA!
>> +
>> +Security
>> +--------
>> +
>> + NFSRDMA exploits the RDMA capabilities of the IB and iWARP
>> + transports to more efficiently exchange RPC data between the client
>> + and the server. This section discusses the security implications of
>> + the exchange of memory information on the wire when the wire may be
>> + monitorable by an untrusted application. The identifier that
>> + encapsulates this memory information is called an RKEY.
>> +
>> + A principal exploit is that a node listening on a mirror port of a
>> + switch
>
> There are probably always other ways to trick the switch into sending
> an attacker some of the traffic. It might be simpler just to say "a
> node on the local network".
Ok.
>
>> + could snoop RDMA packets containing RKEY and then forge a
>> + packet with this RKEY to write and/or read the memory of the peer to
>> + which the RKEY referred.
>> +
>> + The NFSRDMA protocol is defined such that a) only the server
>> + initiates RDMA, and b) only the client's memory is exposed via
>> + RKEY. This is why the server reads to fetch RPC data from the client
>> + even though it would be more efficient for the client to write the
>> + data to the server's memory. This design goal is not entirely
>> + realized with iWARP, however, because the RKEY (called an STag on
>> + iWARP) for the data sink of an RDMA_READ is actually placed on the
>> + wire, and this RKEY has Remote Write permission. This means that the
>> + server's memory is exposed by virtue of having placed the RKEY for
>> + it's local memory on the wire in order to receive the result of the
>
> s/it's/its/
>
Yes, erf.
>> + RDMA_READ.
>> +
>> + By contrast, IB uses an opaque transaction ID# to associate the
>> + READ_RPL with the READ_REQ and the data sink of an READ_REQ does not
>> + require remote access. That said, the byzantine node in question
>> + could forge a packet with this transaction ID and corrupt the target
>> + memory, however, the scope of the exploit is bounded to the lifetime
>> + of this single RDMA_READ request and to the memory mapped by the
>> + data sink of the READ_REQ.
>> +
>> + The newer RDMA adapters (both iWARP and IB) support "Fast Memory
>> + Registration". This capability allows memory to be quickly
>> + registered (i.e. made available for remote access) and de-registered
>> + by submitting WR on the SQ. These capabilities provide a mechanism
>> + to reduce the exposure discused above by limiting the scope of the
>> + exploit. The idea is to create an RKEY that only maps the single RPC
>> + and whose effective lifetime is only the exchange of this single
>> + RPC. This is the default memory model that is employed by the server
>> + when supported by the adapter and by the client when the
>> + rdma_memreg_strategy is set to 6. Note that the client and server
>> + may use different memory registration strategies, however,
>> + performance is better when both the client and server use the
>> + FastReg memory registration strategy.
>> +
>> + This approach has two benefits, a) it restricts the domain of the
>> + exploit to the memory of a single RPC, and b) it limits the duration
>> + of the exploit to the time it takes to satisfy the RDMA_READ.
>> +
>> + It is arguable that a one-shot STag/RKEY is no less secure than RPC
>> + on the TCP transport. Consider that the exact same byzantine
>> + application could more easily corrupt TCP RPC payload by simply
>> + forging a packet with the correct TCP sequence number -- in fact
>> + it's easier than the RDMA exploit because the RDMA exploit requires
>> + that you correctly forge both the TCP packet and the RDMA
>> + payload. In addition the duration of the TCP exploit is the lifetime
>> + of the connection, not the lifetime of a single WR/RPC data transfer.
>> +
>> + So if you buy the argument above, RDMA on IB or iWARP using Fast Reg
>> + is no less secure than TCP.
>
> I'd leave out the first seven words of that last sentence on the grounds
> that it's implicit....
Agreed.
>
> This explanation is helpful, thanks. It would also be helpful if we
> could boil down the advice to just a sentence or two for the busy admin.
> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
> rdma on a network where you cannot trust every machine....
Would it be better to say, "Do not use RDMA on a network where your
policy requires a security model stronger than tcp/auth_unix"?
>
> And better at some point might be to allow nfs-utils to automatically
> check for that situation, and/or just to drop support for anything that
> can't provide at least a tcp/auth_unix-like security model.
>
> --b.
* Re: [PATCH 05/10] svcrdma: Modify post recv path to use local dma key
From: Tom Tucker @ 2008-09-25 13:36 UTC
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:34AM -0500, Tom Tucker wrote:
>> Update the svc_rdma_post_recv routine to use the adapter's global LKEY
>> instead of sc_phys_mr which is only valid when using a DMA MR.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 10 +++++++---
>> 1 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index b8c642d..a61caa7 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -476,7 +476,7 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
>> struct ib_recv_wr recv_wr, *bad_recv_wr;
>> struct svc_rdma_op_ctxt *ctxt;
>> struct page *page;
>> - unsigned long pa;
>> + dma_addr_t pa;
>> int sge_no;
>> int buflen;
>> int ret;
>> @@ -488,13 +488,17 @@ int svc_rdma_post_recv(struct svcxprt_rdma *xprt)
>> BUG_ON(sge_no >= xprt->sc_max_sge);
>> page = svc_rdma_get_page();
>> ctxt->pages[sge_no] = page;
>> - atomic_inc(&xprt->sc_dma_used);
>> pa = ib_dma_map_page(xprt->sc_cm_id->device,
>> page, 0, PAGE_SIZE,
>> DMA_FROM_DEVICE);
>> + if (ib_dma_mapping_error(xprt->sc_cm_id->device, pa)) {
>> + svc_rdma_put_context(ctxt, 1);
>> + return -ENOMEM;
>> + }
>
> Might be a tad nicer to do a "goto out_put_ctx" here and consolidate the
> cleanup with the similar cleanup done after the later ib_post_recv
> failure.
>
Yes it would.
> --b.
>
>> + atomic_inc(&xprt->sc_dma_used);
>> ctxt->sge[sge_no].addr = pa;
>> ctxt->sge[sge_no].length = PAGE_SIZE;
>> - ctxt->sge[sge_no].lkey = xprt->sc_phys_mr->lkey;
>> + ctxt->sge[sge_no].lkey = xprt->sc_dma_lkey;
>> buflen += PAGE_SIZE;
>> }
>> ctxt->count = sge_no;
* Re: [PATCH 03/10] svcrdma: Query device for Fast Reg support during connection setup
From: Tom Tucker @ 2008-09-25 14:08 UTC
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:32AM -0500, Tom Tucker wrote:
>> Query the device capabilities in the svc_rdma_accept function to determine
>> what advanced memory management capabilities are supported by the device.
>> Based on the query, select the most secure model available given the
>> requirements of the transport and capabilities of the adapter.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 86 +++++++++++++++++++++++++++--
>> 1 files changed, 80 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index f200345..8586c7d 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -816,6 +816,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>> struct rdma_conn_param conn_param;
>> struct ib_qp_init_attr qp_attr;
>> struct ib_device_attr devattr;
>> + int dma_mr_acc;
>> int ret;
>> int i;
>>
>> @@ -931,15 +932,88 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>> }
>> newxprt->sc_qp = newxprt->sc_cm_id->qp;
>>
>> - /* Register all of physical memory */
>> - newxprt->sc_phys_mr = ib_get_dma_mr(newxprt->sc_pd,
>> - IB_ACCESS_LOCAL_WRITE |
>> - IB_ACCESS_REMOTE_WRITE);
>> - if (IS_ERR(newxprt->sc_phys_mr)) {
>> - dprintk("svcrdma: Failed to create DMA MR ret=%d\n", ret);
>> + /*
>> + * Use the most secure set of MR resources based on the
>> + * transport type and available memory management features in
>> + * the device. Here's the table implemented below:
>> + *
>> + * Fast Global DMA Remote WR
>> + * Reg LKEY MR Access
>> + * Sup'd Sup'd Needed Needed
>> + *
>> + * IWARP N N Y Y
>> + * N Y Y Y
>> + * Y N Y N
>> + * Y Y N -
>> + *
>> + * IB N N Y N
>> + * N Y N -
>> + * Y N Y N
>> + * Y Y N -
>> + *
>> + * NB: iWARP requires remote write access for the data sink
>> + * of an RDMA_READ. IB does not.
>> + */
>> + if (devattr.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS) {
>> + newxprt->sc_frmr_pg_list_len =
>> + devattr.max_fast_reg_page_list_len;
>> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_FAST_REG;
>> + BUG_ON(!newxprt->sc_frmr_pg_list_len);
>> + }
>> +
>> + /*
>> + * Determine if a DMA MR is required and if so, what privs are required
>> + */
>> + switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
>> + case RDMA_TRANSPORT_IWARP:
>> + if (0 == (newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
>> + dma_mr_acc =
>> + (IB_ACCESS_LOCAL_WRITE |
>> + IB_ACCESS_REMOTE_WRITE);
>> + } else if (0 == (devattr.device_cap_flags &
>> + IB_DEVICE_LOCAL_DMA_LKEY)) {
>> + dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
>> + } else {
>> + dma_mr_acc = 0;
>> + newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
>> + }
>> + break;
>> + case RDMA_TRANSPORT_IB:
>> + if ((0 == (newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) &&
>> + (0 == (devattr.device_cap_flags &
>> + IB_DEVICE_LOCAL_DMA_LKEY)))
>> + dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>> + else
>> + dma_mr_acc = 0;
>
> This doesn't seem to agree exactly with the table above, e.g. in the IB
> "Y N Y N" case? The table says that row still needs a DMA MR, but the
> code only sets dma_mr_acc when fast reg and the global LKEY are both
> absent. Of course, I may be misreading the table....
>
Yes, the code only implements N N and Y Y. It's a latent bug because
that's what's currently out there. I'll fix it. Nice catch, Bruce.
>> + break;
>> + default:
>> goto errout;
>> }
>>
>> + /*
>> + * If we have a global dma lkey, use it for local access
>> + */
>> + if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
>> + newxprt->sc_dma_lkey =
>> + newxprt->sc_cm_id->device->local_dma_lkey;
>> +
>> + /*
>> + * If local write was required for local access, the lcl dma
>> + * lkey can't be used.
>> + */
>> + if (dma_mr_acc) {
>> + /* Register all of physical memory */
>> + newxprt->sc_phys_mr =
>> + ib_get_dma_mr(newxprt->sc_pd, dma_mr_acc);
>> + if (IS_ERR(newxprt->sc_phys_mr)) {
>> + dprintk("svcrdma: Failed to create DMA MR ret=%d\n",
>> + ret);
>> + goto errout;
>> + }
>> + newxprt->sc_dma_lkey = newxprt->sc_phys_mr->lkey;
>> + }
>
> Is it possible for neither of the above conditions to be true, and if so
> is it a problem if newxprt->sc_dma_lkey doesn't get initialized?
>
See above; dma_mr_acc is overloaded to mean "dma mr required when
anything other than local read is required". It's just bad code. I'll fix it.
> --b.
>
>> +
>> /* Post receive buffers */
>> for (i = 0; i < newxprt->sc_max_requests; i++) {
>> ret = svc_rdma_post_recv(newxprt);
* Re: [PATCH 02/10] svcrdma: Add FRMR get/put services
From: Tom Tucker @ 2008-09-25 14:25 UTC
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:31AM -0500, Tom Tucker wrote:
>> Add services for the allocating, freeing, and unmapping Fast Reg MR. These
>> services will be used by the transport connection setup, send and receive
>> routines.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> include/linux/sunrpc/svc_rdma.h | 3 +
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 125 ++++++++++++++++++++++++++++-
>> 2 files changed, 123 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index 295ebbc..100754e 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -219,6 +219,9 @@ extern struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *);
>> extern void svc_rdma_put_context(struct svc_rdma_op_ctxt *, int);
>> extern struct svc_rdma_req_map *svc_rdma_get_req_map(void);
>> extern void svc_rdma_put_req_map(struct svc_rdma_req_map *);
>> +extern struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *);
>> +extern void svc_rdma_put_frmr(struct svcxprt_rdma *,
>> + struct svc_rdma_fastreg_mr *);
>> extern void svc_sq_reap(struct svcxprt_rdma *);
>> extern void svc_rq_reap(struct svcxprt_rdma *);
>> extern struct svc_xprt_class svc_rdma_class;
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 900cb69..f200345 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -100,6 +100,7 @@ struct svc_rdma_op_ctxt *svc_rdma_get_context(struct svcxprt_rdma *xprt)
>> ctxt->xprt = xprt;
>> INIT_LIST_HEAD(&ctxt->dto_q);
>> ctxt->count = 0;
>> + ctxt->frmr = NULL;
>> atomic_inc(&xprt->sc_ctxt_used);
>> return ctxt;
>> }
>> @@ -109,11 +110,19 @@ static void svc_rdma_unmap_dma(struct svc_rdma_op_ctxt *ctxt)
>> struct svcxprt_rdma *xprt = ctxt->xprt;
>> int i;
>> for (i = 0; i < ctxt->count && ctxt->sge[i].length; i++) {
>> - atomic_dec(&xprt->sc_dma_used);
>> - ib_dma_unmap_single(xprt->sc_cm_id->device,
>> - ctxt->sge[i].addr,
>> - ctxt->sge[i].length,
>> - ctxt->direction);
>> + /*
>> + * Unmap the DMA addr in the SGE if the lkey matches
>> + * the sc_dma_lkey, otherwise, ignore it since it is
>> + * an FRMR lkey and will be unmapped later when the
>> + * last WR that uses it completes.
>
> I kinda wish at the start of this that we'd made a rule that any acronym
> that doesn't have a definition in the first page of its google results
> should have at least its expansion added to a glossary someplace in
> Documentation/ or the C source. Maybe it's too late now.
>
I suppose I could add a glossary to the doc; something along these
lines, maybe:
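	MR	- Memory Region
	FRMR	- Fast Registration Memory Region
	WR	- Work Request
	SQ	- Send Queue
	RKEY/LKEY - remote/local memory protection keys

for starters.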
>> + */
>> + if (ctxt->sge[i].lkey == xprt->sc_dma_lkey) {
>
> I'd probably have done
>
> 	if (ctxt->sge[i].lkey != xprt->sc_dma_lkey)
> 		continue;
I'm game; it reduces the indent level.
>
> and left the rest alone; but OK.
>
> (By the way, is it ever possible this test will return different results
> on different passes through the loop, or could we just as well break out
> the first time we see this?:
>
> 	if (ctxt->sge[i].lkey != xprt->sc_dma_lkey)
> 		break;
>
I believe you could have a head and a tail that use the dma lkey, while
the page_list is fetched with RDMA and was fastreg'd. Mapping that to an
SGE list would give you a mix.
> )
>
>> + atomic_dec(&xprt->sc_dma_used);
>> + ib_dma_unmap_single(xprt->sc_cm_id->device,
>> + ctxt->sge[i].addr,
>> + ctxt->sge[i].length,
>> + ctxt->direction);
>> + }
>> }
>> }
>>
>> @@ -150,6 +159,7 @@ struct svc_rdma_req_map *svc_rdma_get_req_map(void)
>> schedule_timeout_uninterruptible(msecs_to_jiffies(500));
>> }
>> map->count = 0;
>> + map->frmr = NULL;
>> return map;
>> }
>>
>> @@ -425,10 +435,12 @@ static struct svcxprt_rdma *rdma_create_xprt(struct svc_serv *serv,
>> INIT_LIST_HEAD(&cma_xprt->sc_dto_q);
>> INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
>> INIT_LIST_HEAD(&cma_xprt->sc_read_complete_q);
>> + INIT_LIST_HEAD(&cma_xprt->sc_frmr_q);
>> init_waitqueue_head(&cma_xprt->sc_send_wait);
>>
>> spin_lock_init(&cma_xprt->sc_lock);
>> spin_lock_init(&cma_xprt->sc_rq_dto_lock);
>> + spin_lock_init(&cma_xprt->sc_frmr_q_lock);
>>
>> cma_xprt->sc_ord = svcrdma_ord;
>>
>> @@ -686,6 +698,106 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
>> return ERR_PTR(ret);
>> }
>>
>> +static int rdma_alloc_frmr(struct svcxprt_rdma *xprt)
>> +{
>> + struct ib_mr *mr;
>> + struct ib_fast_reg_page_list *pl;
>> + struct svc_rdma_fastreg_mr *frmr;
>> +
>> + mr = ib_alloc_fast_reg_mr(xprt->sc_pd, RPCSVC_MAXPAGES);
>> + if (!mr)
>> + goto errout;
>> + pl = ib_alloc_fast_reg_page_list(xprt->sc_cm_id->device,
>> + RPCSVC_MAXPAGES);
>> + if (!pl) {
>> + ib_dereg_mr(mr);
>> + goto errout;
>> + }
>> + frmr = kmalloc(sizeof(*frmr), GFP_KERNEL);
>> + if (!frmr) {
>> + ib_dereg_mr(mr);
>> + ib_free_fast_reg_page_list(pl);
>> + goto errout;
>> + }
>> + frmr->mr = mr;
>> + frmr->page_list = pl;
>> + INIT_LIST_HEAD(&frmr->frmr_list);
>
> This INIT_LIST_HEAD() is redundant given the following list_add(), which
> unconditionally overwrites the entry's next/prev pointers anyway.
>
>> + spin_lock_bh(&xprt->sc_frmr_q_lock);
>> + list_add(&frmr->frmr_list, &xprt->sc_frmr_q);
>> + spin_unlock_bh(&xprt->sc_frmr_q_lock);
>> +
>> + return 0;
>> +
>> + errout:
>> + return -ENOMEM;
>
> Just a style nit: I like the exit-in-one-place thing, but if it's just a
> bare "return" I don't see much advantage over replacing each goto by a
> bare return. The standard kernel style here would be:
>
> 	...
> 	if (!mr)
> 		goto errout;
> 	...
> 	if (!pl)
> 		goto err_free_mr;
> 	...
> 	if (!frmr)
> 		goto err_free_frmr;
> 	...
> 	return 0;
> err_free_frmr:
> 	ib_free_fast_reg_page_list(pl);
> err_free_mr:
> 	ib_dereg_mr(mr);
> errout:
> 	return -ENOMEM;
> }
>
agreed.
>> +}
>> +
>> +static void rdma_dealloc_frmr_q(struct svcxprt_rdma *xprt)
>> +{
>> + struct svc_rdma_fastreg_mr *frmr;
>> +
>> + while (!list_empty(&xprt->sc_frmr_q)) {
>> + frmr = list_entry(xprt->sc_frmr_q.next,
>> + struct svc_rdma_fastreg_mr, frmr_list);
>> + list_del_init(&frmr->frmr_list);
>> + ib_dereg_mr(frmr->mr);
>> + ib_free_fast_reg_page_list(frmr->page_list);
>> + kfree(frmr);
>> + }
>> +}
>> +
>> +struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
>> +{
>> + struct svc_rdma_fastreg_mr *frmr = NULL;
>> +
>> + while (1) {
>> + spin_lock_bh(&rdma->sc_frmr_q_lock);
>> + if (!list_empty(&rdma->sc_frmr_q)) {
>> + frmr = list_entry(rdma->sc_frmr_q.next,
>> + struct svc_rdma_fastreg_mr, frmr_list);
>> + list_del_init(&frmr->frmr_list);
>> + }
>> + spin_unlock_bh(&rdma->sc_frmr_q_lock);
>> + if (frmr)
>> + break;
>> + if (rdma_alloc_frmr(rdma))
>> + return ERR_PTR(-ENOMEM);
>
> This business with allocating stuff and adding it to the list, then
> looking for it next time around, seems a little more clever than
> necessary.
Are you sure "clever" is the word you're looking for here :-)
> ... Why not ditch the loop, take the list_add() out of
> rdma_alloc_frmr() and have it return the newly allocated frmr or NULL
> instead, and then do something like this?:
>
> 	spin_lock_bh(&rdma->sc_frmr_q_lock);
> 	if (!list_empty(&rdma->sc_frmr_q)) {
> 		frmr = list_entry(rdma->sc_frmr_q.next,
> 				  struct svc_rdma_fastreg_mr, frmr_list);
> 		list_del_init(&frmr->frmr_list);
> 	}
> 	spin_unlock_bh(&rdma->sc_frmr_q_lock);
> 	if (!frmr) {
> 		frmr = rdma_alloc_frmr(rdma);
> 		if (!frmr)
> 			return ERR_PTR(-ENOMEM);
> 	}
> 	frmr->map_len = 0;
> 	...
>
>
> There doesn't seem to be much point to adding to sc_frmr_q just to take
> it right back off again immediately. Maybe I'm missing something.
>
Umm. No. This code used to preallocate some entries and then grow. I
removed that figuring that the frmr cache would very quickly reach its
natural size auto-magically. This "some other thread scammed my frmr"
loop is a leftover.
Nice clean-up.
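A minimal sketch of the resulting shape, assuming rdma_alloc_frmr() is
changed to return the new frmr (or NULL) and to leave sc_frmr_q alone:

struct svc_rdma_fastreg_mr *svc_rdma_get_frmr(struct svcxprt_rdma *rdma)
{
        struct svc_rdma_fastreg_mr *frmr = NULL;

        spin_lock_bh(&rdma->sc_frmr_q_lock);
        if (!list_empty(&rdma->sc_frmr_q)) {
                frmr = list_entry(rdma->sc_frmr_q.next,
                                  struct svc_rdma_fastreg_mr, frmr_list);
                list_del_init(&frmr->frmr_list);
        }
        spin_unlock_bh(&rdma->sc_frmr_q_lock);

        if (!frmr) {
                /* Nothing cached; allocate a fresh one for this caller. */
                frmr = rdma_alloc_frmr(rdma);
                if (!frmr)
                        return ERR_PTR(-ENOMEM);
        }
        frmr->map_len = 0;
        frmr->page_list_len = 0;
        return frmr;
}

The cache still grows to its natural size: svc_rdma_put_frmr() returns each
frmr to sc_frmr_q, so the first put after a fresh allocation seeds the queue.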
>> + }
>> + frmr->map_len = 0;
>> + frmr->page_list_len = 0;
>> +
>> + return frmr;
>> +}
>> +
>> +static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
>> + struct svc_rdma_fastreg_mr *frmr)
>> +{
>> + int page_no;
>> + dprintk("svcrdma:%s: xprt %p page_list_len %d\n",
>> + __func__, xprt, frmr->page_list_len);
>> + for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
>> + dma_addr_t addr = frmr->page_list->page_list[page_no];
>> + dprintk("svcrdma: %08x %llx\n", frmr->mr->lkey, addr);
>
> Are these dprintk's going to be useful for debugging user issues
> remotely, or were they just for your personal use while writing the
> code?
>
> We saw recently that we may already have too many dprintk's for them to
> be useful in production, and the above seem likely to be rather
> frequent.
Agreed. It needs to go.
>
> --b.
>
>> + if (ib_dma_mapping_error(frmr->mr->device, addr))
>> + continue;
>> + atomic_dec(&xprt->sc_dma_used);
>> + ib_dma_unmap_single(frmr->mr->device, addr, PAGE_SIZE,
>> + frmr->direction);
>> + }
>> +}
>> +
>> +void svc_rdma_put_frmr(struct svcxprt_rdma *rdma,
>> + struct svc_rdma_fastreg_mr *frmr)
>> +{
>> + if (frmr) {
>> + frmr_unmap_dma(rdma, frmr);
>> + spin_lock_bh(&rdma->sc_frmr_q_lock);
>> + BUG_ON(!list_empty(&frmr->frmr_list));
>> + list_add(&frmr->frmr_list, &rdma->sc_frmr_q);
>> + spin_unlock_bh(&rdma->sc_frmr_q_lock);
>> + }
>> +}
>> +
>> /*
>> * This is the xpo_recvfrom function for listening endpoints. Its
>> * purpose is to accept incoming connections. The CMA callback handler
>> @@ -961,6 +1073,9 @@ static void __svc_rdma_free(struct work_struct *work)
>> WARN_ON(atomic_read(&rdma->sc_ctxt_used) != 0);
>> WARN_ON(atomic_read(&rdma->sc_dma_used) != 0);
>>
>> + /* De-allocate fastreg mr */
>> + rdma_dealloc_frmr_q(rdma);
>> +
>> /* Destroy the QP if present (not a listener) */
>> if (rdma->sc_qp && !IS_ERR(rdma->sc_qp))
>> ib_destroy_qp(rdma->sc_qp);
* Re: [PATCH 01/10] svcrdma: Add Fast Reg MR Data Types
2008-09-24 19:11 ` [PATCH 01/10] svcrdma: Add Fast Reg MR Data Types J. Bruce Fields
@ 2008-09-25 14:27 ` Tom Tucker
0 siblings, 0 replies; 28+ messages in thread
From: Tom Tucker @ 2008-09-25 14:27 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Tue, Sep 16, 2008 at 06:34:30AM -0500, Tom Tucker wrote:
>> Add data types to track Fast Reg Memory Regions. The core data type is
>> svc_rdma_fastreg_mr that associates a device MR with a host kva and page
>> list. A field is added to the WR context to keep track of the FRMR
>> used to map the local memory for an RPC.
>>
>> An FRMR list and spin lock are added to the transport instance to keep
>> track of all FRMR allocated for the transport. Also added are device
>> capability flags to indicate what the memory registration
>> capabilities are for the underlying device and whether or not fast
>> memory registration is supported.
>>
>> Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
>>
>> ---
>> include/linux/sunrpc/svc_rdma.h | 27 ++++++++++++++++++++++++++-
>> 1 files changed, 26 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
>> index dc05b54..295ebbc 100644
>> --- a/include/linux/sunrpc/svc_rdma.h
>> +++ b/include/linux/sunrpc/svc_rdma.h
>> @@ -72,6 +72,7 @@ extern atomic_t rdma_stat_sq_prod;
>> */
>> struct svc_rdma_op_ctxt {
>> struct svc_rdma_op_ctxt *read_hdr;
>> + struct svc_rdma_fastreg_mr *frmr;
>> int hdr_count;
>> struct xdr_buf arg;
>> struct list_head dto_q;
>> @@ -103,16 +104,35 @@ struct svc_rdma_chunk_sge {
>> int start; /* sge no for this chunk */
>> int count; /* sge count for this chunk */
>> };
>> +struct svc_rdma_fastreg_mr {
>> + struct ib_mr *mr;
>> + void *kva;
>> + struct ib_fast_reg_page_list *page_list;
>> + int page_list_len;
>> + unsigned long access_flags;
>> + unsigned long map_len;
>> + enum dma_data_direction direction;
>> + struct list_head frmr_list;
>> +};
>> struct svc_rdma_req_map {
>> + struct svc_rdma_fastreg_mr *frmr;
>> unsigned long count;
>> union {
>> struct kvec sge[RPCSVC_MAXPAGES];
>> struct svc_rdma_chunk_sge ch[RPCSVC_MAXPAGES];
>> };
>> };
>> -
>> +#define RDMACTXT_F_FAST_UNREG 1
>> #define RDMACTXT_F_LAST_CTXT 2
>>
>> +enum svcrdma_dev_cap { /* device supports: */
>> + SVCRDMA_DEVCAP_FAST_REG = 1, /* - fast mr registration */
>> + SVCRDMA_DEVCAP_READ_W_INV = 2, /* - read w/ invalidate */
>> + SVCRDMA_DEVCAP_LCL_DMA_MR = 4, /* - lcl dma lkey */
>> + SVCRDMA_DEVCAP_LCL_WR_REQ = 8, /* - data sink requires remote */
>> + /* write */
>> +};
>
> Nit: I suppose that works, but I thought the idea of an enum was that it
> enumerated every possible value of the given type? If you're expecting
> to "or" these together I would have expected just some kind of unsigned
> int with some #define's.
>
Agreed.
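The #define form being agreed to might look like this (a sketch; values and
comments unchanged from the patch):

#define SVCRDMA_DEVCAP_FAST_REG		1	/* fast mr registration */
#define SVCRDMA_DEVCAP_READ_W_INV	2	/* read w/ invalidate */
#define SVCRDMA_DEVCAP_LCL_DMA_MR	4	/* lcl dma lkey */
#define SVCRDMA_DEVCAP_LCL_WR_REQ	8	/* data sink requires remote write */

with the transport field declared as a plain integer that holds the or'd
flags:

	u32 sc_dev_caps;	/* distilled device caps */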
> --b.
>
>> +
>> struct svcxprt_rdma {
>> struct svc_xprt sc_xprt; /* SVC transport structure */
>> struct rdma_cm_id *sc_cm_id; /* RDMA connection id */
>> @@ -136,6 +156,11 @@ struct svcxprt_rdma {
>> struct ib_cq *sc_rq_cq;
>> struct ib_cq *sc_sq_cq;
>> struct ib_mr *sc_phys_mr; /* MR for server memory */
>> + enum svcrdma_dev_cap sc_dev_caps; /* distilled device caps */
>> + u32 sc_dma_lkey; /* local dma key */
>> + unsigned int sc_frmr_pg_list_len;
>> + struct list_head sc_frmr_q;
>> + spinlock_t sc_frmr_q_lock;
>>
>> spinlock_t sc_lock; /* transport lock */
>>
* Re: [PATCH 02/10] svcrdma: Add FRMR get/put services
2008-09-25 14:25 ` Tom Tucker
@ 2008-09-25 14:44 ` J. Bruce Fields
2008-09-25 20:31 ` Tom Tucker
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-09-25 14:44 UTC (permalink / raw)
To: Tom Tucker; +Cc: linux-nfs
On Thu, Sep 25, 2008 at 09:25:30AM -0500, Tom Tucker wrote:
> J. Bruce Fields wrote:
>> On Tue, Sep 16, 2008 at 06:34:31AM -0500, Tom Tucker wrote:
>>> + }
>>> + frmr->map_len = 0;
>>> + frmr->page_list_len = 0;
>>> +
>>> + return frmr;
>>> +}
>>> +
>>> +static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
>>> + struct svc_rdma_fastreg_mr *frmr)
>>> +{
>>> + int page_no;
>>> + dprintk("svcrdma:%s: xprt %p page_list_len %d\n",
>>> + __func__, xprt, frmr->page_list_len);
>>> + for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
>>> + dma_addr_t addr = frmr->page_list->page_list[page_no];
>>> + dprintk("svcrdma: %08x %llx\n", frmr->mr->lkey, addr);
>>
>> Are these dprintk's going to be useful for debugging user issues
>> remotely, or were they just for your personal use while writing the
>> code?
>>
>> We saw recently that we may already have too many dprintk's for them to
>> be useful in production, and the above seem likely to be rather
>> frequent.
>
> Agreed. It needs to go.
OK! That being the case: I ignored other additions of dprintk's in this
patch set; would you mind scanning through them to see if there's others
that should also go? Stuff to look for, off the top of my head:
- How frequently is a dprintk going to be called?
- Is this dprintk redundant with some other dprintk? (E.g. are
a function and its caller both dprintk'ing the same
information?)
- Is this going to help debug problems with users in the
field?
--b.
* Re: [PATCH 02/10] svcrdma: Add FRMR get/put services
2008-09-25 14:44 ` J. Bruce Fields
@ 2008-09-25 20:31 ` Tom Tucker
0 siblings, 0 replies; 28+ messages in thread
From: Tom Tucker @ 2008-09-25 20:31 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
J. Bruce Fields wrote:
> On Thu, Sep 25, 2008 at 09:25:30AM -0500, Tom Tucker wrote:
>> J. Bruce Fields wrote:
>>> On Tue, Sep 16, 2008 at 06:34:31AM -0500, Tom Tucker wrote:
>>>> + }
>>>> + frmr->map_len = 0;
>>>> + frmr->page_list_len = 0;
>>>> +
>>>> + return frmr;
>>>> +}
>>>> +
>>>> +static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
>>>> + struct svc_rdma_fastreg_mr *frmr)
>>>> +{
>>>> + int page_no;
>>>> + dprintk("svcrdma:%s: xprt %p page_list_len %d\n",
>>>> + __func__, xprt, frmr->page_list_len);
>>>> + for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
>>>> + dma_addr_t addr = frmr->page_list->page_list[page_no];
>>>> + dprintk("svcrdma: %08x %llx\n", frmr->mr->lkey, addr);
>>> Are these dprintk's going to be useful for debugging user issues
>>> remotely, or were they just for your personal use while writing the
>>> code?
>>>
>>> We saw recently that we may already have too many dprintk's for them to
>>> be useful in production, and the above seem likely to be rather
>>> frequent.
>> Agreed. It needs to go.
>
> OK! That being the case: I ignored other additions of dprintk's in this
> patch set; would you mind scanning through them to see if there's others
> that should also go? Stuff to look for, off the top of my head:
>
> - How frequently is a dprintk going to be called?
> - Is this dprintk redundant with some other dprintk? (E.g. are
> a function and its caller both dprintk'ing the same
> information?)
> - Is this going to help debug problems with users in the
> field?
>
Yes, I will remove all pointless chatter :-)
> --b.
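For illustration, one way to keep a single low-frequency trace while dropping
the per-page chatter -- a sketch only, not the final patch:

static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
                           struct svc_rdma_fastreg_mr *frmr)
{
        int page_no;

        for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
                dma_addr_t addr = frmr->page_list->page_list[page_no];

                if (ib_dma_mapping_error(frmr->mr->device, addr))
                        continue;
                atomic_dec(&xprt->sc_dma_used);
                ib_dma_unmap_single(frmr->mr->device, addr, PAGE_SIZE,
                                    frmr->direction);
        }
        /* One summary line per unmap instead of one line per page. */
        dprintk("svcrdma: %s: unmapped %d pages for xprt %p\n",
                __func__, frmr->page_list_len, xprt);
}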
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-25 13:35 ` Tom Tucker
@ 2008-09-26 16:01 ` Talpey, Thomas
[not found] ` <RTPCLUEXC2-PRDGryWt0000003c-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
2008-09-26 23:40 ` J. Bruce Fields
1 sibling, 1 reply; 28+ messages in thread
From: Talpey, Thomas @ 2008-09-26 16:01 UTC (permalink / raw)
To: Tom Tucker; +Cc: J. Bruce Fields, linux-nfs
At 09:35 AM 9/25/2008, Tom Tucker wrote:
>> This explanation is helpful, thanks. It would also be helpful if we
>> could boil down the advice to just a sentence or two for the busy admin.
>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>> rdma on a network where you cannot trust every machine....
>
>
>Would it be better to say, "Do not use RDMA on a network where your
>policy requires a security model stronger than tcp/auth_unix."
No! This would confuse integrity and privacy concerns (the root of the
RDMA attack you describe) with authentication. While it's true there are
different attacks with a different transport, they do not in any way
contravene the protections in the RPC and NFS layers.
In fact, I believe the text is unfairly portraying a vulnerability in iWARP
as residing in NFS/RDMA, which it isn't.
While many of today's adapters allow so-called "type 2" RKEYs, the
protocol does not encourage them, and their use introduces these
risks. The risks are avoidable. The IETF RFCs describe these in detail,
for both RDDP and NFS/RPC/RDMA.
Tom.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-25 13:35 ` Tom Tucker
2008-09-26 16:01 ` Talpey, Thomas
@ 2008-09-26 23:40 ` J. Bruce Fields
2008-09-30 3:07 ` Tom Tucker
1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-09-26 23:40 UTC (permalink / raw)
To: Tom Tucker; +Cc: linux-nfs
On Thu, Sep 25, 2008 at 08:35:26AM -0500, Tom Tucker wrote:
> J. Bruce Fields wrote:
>> This explanation is helpful, thanks. It would also be helpful if we
>> could boil down the advice to just a sentence or two for the busy admin.
>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>> rdma on a network where you cannot trust every machine....
>
>
> Would it be better to say, "Do not use RDMA on a network where your
> policy requires a security model stronger than tcp/auth_unix."
I'm not worried about the case where the security provided is roughly
equivalent to that provided by tcp/auth_unix.
I'm worried about the non-"Fast Reg" case where I thought you were
saying that the network could access memory other than that meant to
hold rpc data.
--b.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
[not found] ` <RTPCLUEXC2-PRDGryWt0000003c-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-09-30 2:59 ` Tom Tucker
0 siblings, 0 replies; 28+ messages in thread
From: Tom Tucker @ 2008-09-30 2:59 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: J. Bruce Fields, linux-nfs
Talpey, Thomas wrote:
> At 09:35 AM 9/25/2008, Tom Tucker wrote:
>>> This explanation is helpful, thanks. It would also be helpful if we
>>> could boil down the advice to just a sentence or two for the busy admin.
>>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>>> rdma on a network where you cannot trust every machine....
>>
>> Would it be better to say, "Do not use RDMA on a network where your
>> policy requires a security model stronger than tcp/auth_unix."
>
> No! This would confuse integrity and privacy concerns (the root of the
> RDMA attack you describe) with authentication. While it's true there are
> different attacks with a different transport, they do not in any way
> contravene the protections in the RPC and NFS layers.
>
> In fact, I believe the text is unfairly portraying a vulnerability in iWARP
> as residing in NFS/RDMA, which it isn't.
>
> While many of today's adapters allow so-called "type 2" RKEYs, the
> protocol does not encourage them, and their use introduces these
> risks. The risks are avoidable. The IETF RFCs describe these in detail,
> for both RDDP and NFS/RPC/RDMA.
>
Ok, but I need some text that correctly represents the guidance to the
naive administrator. I think Bruce's goal is a good one, but I thought
his text was only "point in time" relevant.
I'm open to suggestions for specific wording!
Tom
> Tom.
>
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-26 23:40 ` J. Bruce Fields
@ 2008-09-30 3:07 ` Tom Tucker
2008-09-30 18:44 ` J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: Tom Tucker @ 2008-09-30 3:07 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Tom Talpey
J. Bruce Fields wrote:
> On Thu, Sep 25, 2008 at 08:35:26AM -0500, Tom Tucker wrote:
>> J. Bruce Fields wrote:
>>> This explanation is helpful, thanks. It would also be helpful if we
>>> could boil down the advice to just a sentence or two for the busy admin.
>>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>>> rdma on a network where you cannot trust every machine....
>>
>> Would it be better to say, "Do not use RDMA on a network where your
>> policy requires a security model stronger than tcp/auth_unix."
>
> I'm not worried about the case where the security provided is roughly
> equivalent to that provided by tcp/auth_unix.
>
> I'm worried about the non-"Fast Reg" case where I thought you were
> saying that the network could access memory other than that meant to
> hold rpc data.
>
Ok, so maybe we could state the security exposure of ALLPHYSICAL instead
of dwelling on the relative differences between the similar exposures of
tcp/auth_unix vs. fastreg?
I'd also like to get to fastreg as a default at some point.
Tom:
What's your perspective on the lifetime of bounce buffers, memory
windows, and the other strategies in client?
Tom
> --b.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 3:07 ` Tom Tucker
@ 2008-09-30 18:44 ` J. Bruce Fields
2008-09-30 18:55 ` Tom Tucker
2008-09-30 19:04 ` Talpey, Thomas
0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2008-09-30 18:44 UTC (permalink / raw)
To: Tom Tucker; +Cc: linux-nfs, Tom Talpey
On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
> J. Bruce Fields wrote:
>> On Thu, Sep 25, 2008 at 08:35:26AM -0500, Tom Tucker wrote:
>>> J. Bruce Fields wrote:
>>>> This explanation is helpful, thanks. It would also be helpful if we
>>>> could boil down the advice to just a sentence or two for the busy admin.
>>>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>>>> rdma on a network where you cannot trust every machine....
>>>
>>> Would it be better to say, "Do not use RDMA on a network where your
>>> policy requires a security model stronger than tcp/auth_unix."
>>
>> I'm not worried about the case where the security provided is roughly
>> equivalent to that provided by tcp/auth_unix.
>>
>> I'm worried about the non-"Fast Reg" case where I thought you were
>> saying that the network could access memory other than that meant to
>> hold rpc data.
>>
>
> Ok, so maybe we could state the security exposure of ALLPHYSICAL instead
> of dwelling on the relative differences between the similar exposures of
> tcp/auth_unix vs. fastreg?
Right. It's all interesting, so I wouldn't cut anything out--but you
could put off the details to the end; e.g., start with "in situation
<...>, nfs/rdma provides roughly the same security as nfs over tcp and
auth_unix (see below for more details)".
> I'd also like to get to fastreg as a default at some point.
OK, good.
> What's your perspective on the lifetime of bounce buffers, memory
> windows, and the other strategies in client?
I'm ignorant. Pointer to something else I should read?
I assume there are similar issues on the client?
--b.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 18:44 ` J. Bruce Fields
@ 2008-09-30 18:55 ` Tom Tucker
2008-09-30 18:57 ` J. Bruce Fields
2008-09-30 19:04 ` Talpey, Thomas
1 sibling, 1 reply; 28+ messages in thread
From: Tom Tucker @ 2008-09-30 18:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, Tom Talpey
J. Bruce Fields wrote:
> On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
>> J. Bruce Fields wrote:
>>> On Thu, Sep 25, 2008 at 08:35:26AM -0500, Tom Tucker wrote:
>>>> J. Bruce Fields wrote:
>>>>> This explanation is helpful, thanks. It would also be helpful if we
>>>>> could boil down the advice to just a sentence or two for the busy admin.
>>>>> Something like: unless you have card XYZ and kernel 2.6.y, do *not* use
>>>>> rdma on a network where you cannot trust every machine....
>>>> Would it be better to say, "Do not use RDMA on a network where your
>>>> policy requires a security model stronger than tcp/auth_unix."
>>> I'm not worried about the case where the security provided is roughly
>>> equivalent to that provided by tcp/auth_unix.
>>>
>>> I'm worried about the non-"Fast Reg" case where I thought you were
>>> saying that the network could access memory other than that meant to
>>> hold rpc data.
>>>
>> Ok, so maybe we could state the security exposure of ALLPHYSICAL instead
>> of dwelling on the relative differences between the similar exposures of
>> tcp/auth_unix vs. fastreg?
>
> Right. It's all interesting, so I wouldn't cut anything out--but you
> could put off the details to the end; e.g., start with "in situation
> <...>, nfs/rdma provides roughly the same security as nfs over tcp and
> auth_unix (see below for more details)".
Sounds reasonable.
>
>> I'd also like to get to fastreg as a default at some point.
>
> OK, good.
>
>> What's your perspective on the lifetime of bounce buffers, memory
>> windows, and the other strategies in client?
>
> I'm ignorant. Pointer to something else I should read?
>
> I assume there are similar issues on the client?
>
Sorry, that was a question for Tom Talpey. It's a client specific issue
since the server only implements ALLPHYSICAL and FAST REG.
> --b.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 18:55 ` Tom Tucker
@ 2008-09-30 18:57 ` J. Bruce Fields
2008-09-30 20:17 ` Tom Tucker
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-09-30 18:57 UTC (permalink / raw)
To: Tom Tucker; +Cc: linux-nfs, Tom Talpey
On Tue, Sep 30, 2008 at 01:55:45PM -0500, Tom Tucker wrote:
> J. Bruce Fields wrote:
>> On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
>>> What's your perspective on the lifetime of bounce buffers, memory
>>> windows, and the other strategies in client?
>>
>> I'm ignorant. Pointer to something else I should read?
>>
>> I assume there are similar issues on the client?
>>
>
> Sorry, that was a question for Tom Talpey.
Phew!
> It's a client specific issue since the server only implements
> ALLPHYSICAL and FAST REG.
OK.--b.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 18:44 ` J. Bruce Fields
2008-09-30 18:55 ` Tom Tucker
@ 2008-09-30 19:04 ` Talpey, Thomas
[not found] ` <RTPCLUEXC2-PRDgFrYI00000094-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
1 sibling, 1 reply; 28+ messages in thread
From: Talpey, Thomas @ 2008-09-30 19:04 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
At 02:44 PM 9/30/2008, J. Bruce Fields wrote:
>On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
>> What's your perspective on the lifetime of bounce buffers, memory
>> windows, and the other strategies in client?
>
>I'm ignorant. Pointer to something else I should read?
>
>I assume there are similar issues on the client?
Tom's asking about the client memory registration options and
whether they should all remain in the client code going forward.
There are different strategies in there, depending on what the
hardware supports, how well it supports it, and how you want
to run it all.
The multiple strategies stem from the early days when no two
adapters did quite the same thing. That said, at least one of
them is no longer useful - no adapters support windows today,
since the demise of Ammasso, although the kernel API to drive
them is still there in the OFA stack below.
I think it's possible that some of the other modes can collapse, but
not just now. There's a lot of older hardware out there, and newer
hardware may appear to use the old stuff too.
Another concern I have, frankly, is interoperability. If we collapse
the modes, then the temptation to assume both ends have such-
and-such a capability increases. If one side tries some RDMA operation
that is only supported properly by a certain adapter, it's hard to detect
that in a monoculture. Having the option to switch modes can help avert
this, by testing for it.
In any case, if the current code doesn't work, it's a bug. Certainly
bouncebuffers (non-RDMA mode) should work perfectly and I plan to
check it asap.
Tom.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 18:57 ` J. Bruce Fields
@ 2008-09-30 20:17 ` Tom Tucker
2008-10-01 16:17 ` J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: Tom Tucker @ 2008-09-30 20:17 UTC (permalink / raw)
To: J. Bruce Fields, Tom Talpey; +Cc: linux-nfs
Bruce/Tom:
Below is an updated Documentation patch. Please take a look and tell me
what you think.
I've made all the changes to the code per Bruce's suggestions plus added
a patch to display the mem. reg. strategy used at mount time.
Please tweak the doc patch as needed and then I'll repost the whole lot.
Thanks,
Tom
From: Tom Tucker <tom@opengridcomputing.com>
Date: Tue, 30 Sep 2008 14:41:30 -0500
Subject: [PATCH 10/11] svcrdma: Documentation update for the FastReg
memory model
This patch adds security related documentation to the nfs-rdma.txt file
that describes the memory registration model, the potential security
exploits, and compares these exploits to a similar threat when using TCP
as the transport.
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
---
Documentation/filesystems/nfs-rdma.txt |   84 ++++++++++++++++++++++++++++++++
1 files changed, 84 insertions(+), 0 deletions(-)
diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
index 44bd766..266a57b 100644
--- a/Documentation/filesystems/nfs-rdma.txt
+++ b/Documentation/filesystems/nfs-rdma.txt
@@ -269,3 +269,87 @@ NFS/RDMA Setup
the "proto" field for the given mount.
Congratulations! You're using NFS/RDMA!
+
+Security
+--------
+
+ NFSRDMA exploits the RDMA capabilities of the IB and iWARP
+ transports to more efficiently exchange RPC data between the client
+ and the server. This section discusses the security implications of
+ the exchange of memory information on the wire when the wire may be
+ monitorable by an untrusted application. The identifier that
+ encapsulates this memory information is called an RKEY.
+
+ A principal exploit is that a node on the local network could snoop
+ RDMA packets containing an RKEY and then forge a packet with this RKEY
+ to write and/or read the memory of the peer to which the RKEY
+ referred.
+
+ If the underlying RDMA device is capable of Fast Memory
+ Registration, then NFSRDMA is no less secure than TCP with
+ auth_unix. However, if the device does not support Fast Memory
+ Registration, then such a node could write anywhere in the server's
+ memory using the method above. At mount time, the server sends a
+ string to the message log to indicate whether or not Fast Memory
+ Registration is being used. If Fast Memory Registration is being
+ used, the string
+
+ "svcrdma: Using Fast Memory Registration"
+
+ is logged, otherwise,
+
+ "svcrdma: Using a Global DMA MR"
+
+ will be logged.
+
+ The sections below provide additional information on this issue.
+
+ The NFSRDMA protocol is defined such that a) only the server
+ initiates RDMA, and b) only the client's memory is exposed via
+ RKEY. This is why the server reads to fetch RPC data from the client
+ even though it would be more efficient for the client to write the
+ data to the server's memory. This design goal is not entirely
+ realized with iWARP, however, because the RKEY (called an STag on
+ iWARP) for the data sink of an RDMA_READ is actually placed on the
+ wire, and this RKEY has Remote Write permission. This means that the
+ server's memory is exposed by virtue of having placed the RKEY for
+ its local memory on the wire in order to receive the result of the
+ RDMA_READ.
+
+ By contrast, IB uses an opaque transaction ID# to associate the
+ READ_RPL with the READ_REQ, and the data sink of a READ_REQ does not
+ require remote access. That said, the byzantine node in question
+ could forge a packet with this transaction ID and corrupt the target
+ memory; however, the scope of the exploit is bounded to the lifetime
+ of this single RDMA_READ request and to the memory mapped by the
+ data sink of the READ_REQ.
+
+ The newer RDMA adapters (both iWARP and IB) support "Fast Memory
+ Registration". This capability allows memory to be quickly
+ registered (i.e. made available for remote access) and de-registered
+ by submitting WR on the SQ. These capabilities provide a mechanism
+ to reduce the exposure discussed above by limiting the scope of the
+ exploit. The idea is to create an RKEY that only maps the single RPC
+ and whose effective lifetime is only the exchange of this single
+ RPC. This is the default memory model that is employed by the server
+ when supported by the adapter and by the client when the
+ rdma_memreg_strategy is set to 6. Note that the client and server
+ may use different memory registration strategies; however,
+ performance is better when both the client and server use the
+ FastReg memory registration strategy.
+
+ This approach has two benefits, a) it restricts the domain of the
+ exploit to the memory of a single RPC, and b) it limits the duration
+ of the exploit to the time it takes to satisfy the RDMA_READ.
+
+ It is arguable that a one-shot STag/RKEY is no less secure than RPC
+ on the TCP transport. Consider that the exact same byzantine
+ application could more easily corrupt TCP RPC payload by simply
+ forging a packet with the correct TCP sequence number -- in fact
+ it's easier than the RDMA exploit because the RDMA exploit requires
+ that you correctly forge both the TCP packet and the RDMA
+ payload. In addition the duration of the TCP exploit is the lifetime
+ of the connection, not the lifetime of a single WR/RPC data transfer.
+
+ RDMA on IB or iWARP using Fast Reg is no less secure than TCP.
+
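To make the "RKEY scoped to a single RPC" idea concrete, here is a rough
sketch of posting a fast registration work request with the 2.6.27-era
memory extensions API. Field usage follows my reading of ib_verbs.h from
that period and is illustrative only; the svcrdma patches in this series
are the authoritative version:

static int post_fast_reg(struct ib_qp *qp,
                         struct svc_rdma_fastreg_mr *frmr)
{
        struct ib_send_wr fastreg_wr, *bad_wr;

        /* Register frmr's page list under a fresh, single-RPC rkey. */
        memset(&fastreg_wr, 0, sizeof(fastreg_wr));
        fastreg_wr.opcode = IB_WR_FAST_REG_MR;
        fastreg_wr.send_flags = IB_SEND_SIGNALED;
        fastreg_wr.wr.fast_reg.iova_start = (unsigned long)frmr->kva;
        fastreg_wr.wr.fast_reg.page_list = frmr->page_list;
        fastreg_wr.wr.fast_reg.page_list_len = frmr->page_list_len;
        fastreg_wr.wr.fast_reg.page_shift = PAGE_SHIFT;
        fastreg_wr.wr.fast_reg.length = frmr->map_len;
        fastreg_wr.wr.fast_reg.access_flags = frmr->access_flags;
        fastreg_wr.wr.fast_reg.rkey = frmr->mr->lkey;

        return ib_post_send(qp, &fastreg_wr, &bad_wr);
}

Once the RPC completes, posting an IB_WR_LOCAL_INV work request naming the
same rkey invalidates it, which is what bounds the exposure to the lifetime
of one RPC.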
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-09-30 20:17 ` Tom Tucker
@ 2008-10-01 16:17 ` J. Bruce Fields
2008-10-02 0:38 ` Tom Tucker
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2008-10-01 16:17 UTC (permalink / raw)
To: Tom Tucker; +Cc: Tom Talpey, linux-nfs
Thanks, I think this is much more helpful.
On Tue, Sep 30, 2008 at 03:17:21PM -0500, Tom Tucker wrote:
> +Security
> +--------
> +
> + NFSRDMA exploits the RDMA capabilities of the IB and iWARP
> + transports to more efficiently exchange RPC data between the client
> + and the server. This section discusses the security implications of
> + the exchange of memory information on the wire when the wire may be
> + monitorable by an untrusted application. The identifier that
> + encapsulates this memory information is called an RKEY.
> +
> + A principal exploit is that a node on the local network could snoop
> + RDMA packets containing RKEY and then forge a packet with this RKEY
> + to write and/or read the memory of the peer to which the RKEY
> + referred.
> +
> + If the underlying RDMA device is capable of Fast Memory
> + Registration, then NFSRDMA is no less secure than TCP with
> + auth_unix. However, if the device does not support Fast Memory
> + Registration, then such a node could write anywhere in the server's
> + memory using the method above. At mount time, the server sends a
The server doesn't really know about mounts, especially not at this
level, so I assume you mean either server start time or client connect
time?
> + string to the message log to indicate whether or not Fast Memory
> + Registration is being used. If Fast Memory Registration is being
> + used, the string
> +
> + "svcrdma: Using Fast Memory Registration"
> +
> + is logged, otherwise,
> +
> + "svcrdma: Using a Global DMA MR"
> +
> + will be logged.
It'd be nicer to have something that can be queried by a program--a file
in proc or nfsd, for example--without having to grep through log files.
(Or is it possible the drivers already export enough information under
sysfs someplace to figure this out with a simple script?)
Or maybe the non-fast registration stuff should be under a separate
configuration option entirely? Distro's could eventually enable only
the safer configurations and people doing testing could build their own
kernels with the rest enabled.
My initial impulse is to be a bit scared of the non-fast-registration
case, but maybe I don't understand how this hardware is deployed.
--b.
> +
> + The sections below provide additional information on this issue.
> +
> + The NFSRDMA protocol is defined such that a) only the server
> + initiates RDMA, and b) only the client's memory is exposed via
> + RKEY. This is why the server reads to fetch RPC data from the client
> + even though it would be more efficient for the client to write the
> + data to the server's memory. This design goal is not entirely
> + realized with iWARP, however, because the RKEY (called an STag on
> + iWARP) for the data sink of an RDMA_READ is actually placed on the
> + wire, and this RKEY has Remote Write permission. This means that the
> + server's memory is exposed by virtue of having placed the RKEY for
> + its local memory on the wire in order to receive the result of the
> + RDMA_READ.
> +
> + By contrast, IB uses an opaque transaction ID# to associate the
> + READ_RPL with the READ_REQ, and the data sink of a READ_REQ does not
> + require remote access. That said, the byzantine node in question
> + could forge a packet with this transaction ID and corrupt the target
> + memory, however, the scope of the exploit is bounded to the lifetime
> + of this single RDMA_READ request and to the memory mapped by the
> + data sink of the READ_REQ.
> +
> + The newer RDMA adapters (both iWARP and IB) support "Fast Memory
> + Registration". This capability allows memory to be quickly
> + registered (i.e. made available for remote access) and de-registered
> + by submitting WR on the SQ. These capabilities provide a mechanism
> + to reduce the exposure discussed above by limiting the scope of the
> + exploit. The idea is to create an RKEY that only maps the single RPC
> + and whose effective lifetime is only the exchange of this single
> + RPC. This is the default memory model that is employed by the server
> + when supported by the adapter and by the client when the
> + rdma_memreg_strategy is set to 6. Note that the client and server
> + may use different memory registration strategies, however,
> + performance is better when both the client and server use the
> + FastReg memory registration strategy.
> +
> + This approach has two benefits, a) it restricts the domain of the
> + exploit to the memory of a single RPC, and b) it limits the duration
> + of the exploit to the time it takes to satisfy the RDMA_READ.
> +
> + It is arguable that a one-shot STag/RKEY is no less secure than RPC
> + on the TCP transport. Consider that the exact same byzantine
> + application could more easily corrupt TCP RPC payload by simply
> + forging a packet with the correct TCP sequence number -- in fact
> + it's easier than the RDMA exploit because the RDMA exploit requires
> + that you correctly forge both the TCP packet and the RDMA
> + payload. In addition the duration of the TCP exploit is the lifetime
> + of the connection, not the lifetime of a single WR/RPC data transfer.
> +
> + RDMA on IB or iWARP using Fast Reg is no less secure than TCP.
> +
>
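For illustration, the kind of program-queryable knob being suggested might
look like the seq_file sketch below. The file name, its location, and the
svcrdma_using_fastreg flag are all hypothetical -- and as Tom notes later in
the thread, a single global flag is too coarse once iWARP and IB adapters
coexist in one server, so a real version would need to be per-transport:

#include <linux/proc_fs.h>
#include <linux/seq_file.h>

/* Hypothetical flag, set while distilling device caps at connect time. */
extern int svcrdma_using_fastreg;

static int svcrdma_memreg_show(struct seq_file *m, void *v)
{
        seq_printf(m, "%s\n", svcrdma_using_fastreg ?
                   "fastreg" : "global_dma_mr");
        return 0;
}

static int svcrdma_memreg_open(struct inode *inode, struct file *file)
{
        return single_open(file, svcrdma_memreg_show, NULL);
}

static const struct file_operations svcrdma_memreg_fops = {
        .owner   = THIS_MODULE,
        .open    = svcrdma_memreg_open,
        .read    = seq_read,
        .llseek  = seq_lseek,
        .release = single_release,
};

/* e.g., from module init:
 *	proc_create("svcrdma_memreg", 0444, NULL, &svcrdma_memreg_fops);
 */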
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
[not found] ` <RTPCLUEXC2-PRDgFrYI00000094-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-01 18:26 ` J. Bruce Fields
2008-10-01 19:18 ` Talpey, Thomas
[not found] ` <RTPCLUEXC2-PRDVjCRG000000bb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2008-10-01 18:26 UTC (permalink / raw)
To: Talpey, Thomas; +Cc: linux-nfs
On Tue, Sep 30, 2008 at 03:04:26PM -0400, Talpey, Thomas wrote:
> At 02:44 PM 9/30/2008, J. Bruce Fields wrote:
> >On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
> >> What's your perspective on the lifetime of bounce buffers, memory
> >> windows, and the other strategies in client?
> >
> >I'm ignorant. Pointer to something else I should read?
> >
> >I assume there are similar issues on the client?
>
> Tom's asking about the client memory registration options and
> whether they should all remain in the client code going forward.
> There are different strategies in there, depending on what the
> hardware supports, how well it supports it, and how you want
> to run it all.
Do any of them allow reading or writing memory other than that holding
rpc data?
--b.
>
> The multiple strategies stem from the early days when no two
> adapters did quite the same thing. That said, at least one of
> them is no longer useful - no adapters support windows today,
> since the demise of Ammasso, although the kernel API to drive
> them is still there in the OFA stack below.
>
> I think it's possible that some of the other modes can collapse, but
> not just now. There's a lot of older hardware out there, and newer
> hardware may appear to use the old stuff too.
>
> Another concern I have, frankly, is interoperability. If we collapse
> the modes, then the temptation to assume both ends have such-
> and-such a capability increases. If one side tries some RDMA operation
> that is only supported properly by a certain adapter, it's hard to detect
> that in a monoculture. Having the option to switch modes can help avert
> this, by testing for it.
>
> In any case, if the current code doesn't work, it's a bug. Certainly
> bouncebuffers (non-RDMA mode) should work perfectly and I plan to
> check it asap.
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-10-01 18:26 ` J. Bruce Fields
@ 2008-10-01 19:18 ` Talpey, Thomas
[not found] ` <RTPCLUEXC2-PRDVjCRG000000bb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
1 sibling, 0 replies; 28+ messages in thread
From: Talpey, Thomas @ 2008-10-01 19:18 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
At 02:26 PM 10/1/2008, J. Bruce Fields wrote:
>On Tue, Sep 30, 2008 at 03:04:26PM -0400, Talpey, Thomas wrote:
>> At 02:44 PM 9/30/2008, J. Bruce Fields wrote:
>> >On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
>> >> What's your perspective on the lifetime of bounce buffers, memory
>> >> windows, and the other strategies in client?
>> >
>> >I'm ignorant. Pointer to something else I should read?
>> >
>> >I assume there are similar issues on the client?
>>
>> Tom's asking about the client memory registration options and
>> whether they should all remain in the client code going forward.
>> There are different strategies in there, depending on what the
>> hardware supports, how well it supports it, and how you want
>> to run it all.
>
>Do any of them allow reading or writing memory other than that holding
>rpc data?
Some of them, and the memory exposure varies. The only modes which
expose additional memory are FMR's, which expose data on the same
page as RPC data, and ALLPHYSICAL, which, well, exposes everything.
The BOUNCEBUFFERS and REGISTER modes protect memory fiercely,
but are relatively expensive. MEMWINDOWS are safe but not very fast,
and only supported by obsolete devices. FRMR's are both safe and fast
but a) still have a cost and b) are only supported by one device at present.
Long term, we all hope FRMRs will be the default, logical, and only choice.
I still owe feedback on Tom's new text... it's coming soon.
Tom.
>
>--b.
>
>>
>> The multiple strategies stem from the early days when no two
>> adapters did quite the same thing. That said, at least one of
>> them is no longer useful - no adapters support windows today,
>> since the demise of Ammasso, although the kernel API to drive
>> them is still there in the OFA stack below.
>>
>> I think it's possible that some of the other modes can collapse, but
>> not just now. There's a lot of older hardware out there, and newer
>> hardware may appear to use the old stuff too.
>>
>> Another concern I have, frankly, is interoperability. If we collapse
>> the modes, then the temptation to assume both ends have such-
>> and-such a capability increases. If one side tries some RDMA operation
>> that is only supported properly by a certain adapter, it's hard to detect
>> that in a monoculture. Having the option to switch modes can help avert
>> this, by testing for it.
>>
>> In any case, if the current code doesn't work, it's a bug. Certainly
>> bouncebuffers (non-RDMA mode) should work perfectly and I plan to
>> check it asap.
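For reference, the client-side strategies being enumerated map to
rdma_memreg_strategy values roughly as follows. These names and values are
from my recollection of the era's xprt_rdma.h, so treat them as illustrative,
except for FRMR (6), which the documentation patch above confirms:

enum rpcrdma_memreg {
        RPCRDMA_BOUNCEBUFFERS    = 0,   /* copy through registered buffers */
        RPCRDMA_REGISTER         = 1,   /* register/deregister per chunk */
        RPCRDMA_MEMWINDOWS       = 2,   /* bind/unbind memory windows */
        RPCRDMA_MEMWINDOWS_ASYNC = 3,   /* as above, asynchronous unbind */
        RPCRDMA_MTHCAFMR         = 4,   /* FMRs: exposes same-page data */
        RPCRDMA_ALLPHYSICAL      = 5,   /* premapped: exposes all memory */
        RPCRDMA_FRMR             = 6,   /* fast registration work requests */
};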
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
[not found] ` <RTPCLUEXC2-PRDVjCRG000000bb-rtwIt2gI0FxT+ZUat5FNkAK/GNPrWCqfQQ4Iyu8u01E@public.gmane.org>
@ 2008-10-01 19:23 ` Talpey, Thomas
0 siblings, 0 replies; 28+ messages in thread
From: Talpey, Thomas @ 2008-10-01 19:23 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
At 03:18 PM 10/1/2008, Talpey, Thomas wrote:
>At 02:26 PM 10/1/2008, J. Bruce Fields wrote:
>>On Tue, Sep 30, 2008 at 03:04:26PM -0400, Talpey, Thomas wrote:
>>> At 02:44 PM 9/30/2008, J. Bruce Fields wrote:
>>> >On Mon, Sep 29, 2008 at 10:07:25PM -0500, Tom Tucker wrote:
>>> >> What's your perspective on the lifetime of bounce buffers, memory
>>> >> windows, and the other strategies in client?
>>> >
>>> >I'm ignorant. Pointer to something else I should read?
>>> >
>>> >I assume there are similar issues on the client?
>>>
>>> Tom's asking about the client memory registration options and
>>> whether they should all remain in the client code going forward.
>>> There are different strategies in there, depending on what the
>>> hardware supports, how well it supports it, and how you want
>>> to run it all.
>>
>>Do any of them allow reading or writing memory other than that holding
>>rpc data?
BTW, there's a hopefully helpful slide on page 20 of the following presentation
which I gave back in April to the RDMA developers:
<http://openfabrics.org/archives/spring2008sonoma/Tuesday/NFS-RDMA-OFA-08.pdf>
FRMR's are, in part, a result of that request. Certainly they fulfill it!
Tom.
>
>Some of them, and the memory exposure varies. The only modes which
>expose additional memory are FMR's, which expose data on the same
>page as RPC data, and ALLPHYSICAL, which, well, exposes everything.
>
>The BOUNCEBUFFERS and REGISTER modes protect memory fiercely,
>but are relatively expensive. MEMWINDOWS are safe but not very fast,
>and only supported by obsolete devices. FRMR's are both safe and fast
> but a) still have a cost and b) are only supported by one device at present.
>
>Long term, we all hope FRMRs will be the default, logical, and only choice.
>
>I still owe feedback on Tom's new text... it's coming soon.
>
>Tom.
>
>>
>>--b.
>>
>>>
>>> The multiple strategies stem from the early days when no two
>>> adapters did quite the same thing. That said, at least one of
>>> them is no longer useful - no adapters support windows today,
>>> since the demise of Ammasso, although the kernel API to drive
>>> them is still there in the OFA stack below.
>>>
>>> I think it's possible that some of the other modes can collapse, but
>>> not just now. There's a lot of older hardware out there, and newer
>>> hardware may appear to use the old stuff too.
>>>
>>> Another concern I have, frankly, is interoperability. If we collapse
>>> the modes, then the temptation to assume both ends have such-
>>> and-such a capability increases. If one side tries some RDMA operation
>>> that is only supported properly by a certain adapter, it's hard to detect
>>> that in a monoculture. Having the option to switch modes can help avert
>>> this, by testing for it.
>>>
>>> In any case, if the current code doesn't work, it's a bug. Certainly
>>> bouncebuffers (non-RDMA mode) should work perfectly and I plan to
>>> check it asap.
>
* Re: [PATCH 10/10] svcrdma: Documentation update for the FastReg memory model
2008-10-01 16:17 ` J. Bruce Fields
@ 2008-10-02 0:38 ` Tom Tucker
0 siblings, 0 replies; 28+ messages in thread
From: Tom Tucker @ 2008-10-02 0:38 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Tom Talpey, linux-nfs
J. Bruce Fields wrote:
> Thanks, I think this is much more helpful.
>
> On Tue, Sep 30, 2008 at 03:17:21PM -0500, Tom Tucker wrote:
>> +Security
>> +--------
>> +
>> + NFSRDMA exploits the RDMA capabilities of the IB and iWARP
>> + transports to more efficiently exchange RPC data between the client
>> + and the server. This section discusses the security implications of
>> + the exchange of memory information on the wire when the wire may be
>> + monitorable by an untrusted application. The identifier that
>> + encapsulates this memory information is called an RKEY.
>> +
>> + A principal exploit is that a node on the local network could snoop
>> + RDMA packets containing RKEY and then forge a packet with this RKEY
>> + to write and/or read the memory of the peer to which the RKEY
>> + referred.
>> +
>> + If the underlying RDMA device is capable of Fast Memory
>> + Registration, then NFSRDMA is no less secure than TCP with
>> + auth_unix. However, if the device does not support Fast Memory
>> + Registration, then such a node could write anywhere in the server's
>> + memory using the method above. At mount time, the server sends a
>
> The server doesn't really know about mounts, especially not at this
> level, so I assume you mean either server start time or client connect
> time?
Right, client connect time, I'll fix. Thanks.
>
>> + string to the message log to indicate whether or not Fast Memory
>> + Registration is being used. If Fast Memory Registration is being
>> + used, the string
>> +
>> + "svcrdma: Using Fast Memory Registration"
>> +
>> + is logged, otherwise,
>> +
>> + "svcrdma: Using a Global DMA MR"
>> +
>> + will be logged.
>
> It'd be nicer to have something that can be queried by a program--a file
> in proc or nfsd, for example--without having to grep through log files.
> (Or is it possible the drivers already export enough information under
> sysfs someplace to figure this out with a simple script?)
Yes, it's gross. But I was trying to keep it simple for the first go-round and
since it is conceivable that you have two adapters, one that supports FRMR and
the other doesn't, you would need a proc file per adapter. All my systems have
both iWARP and IB adapters in them. So half my connections use the DMA MR
and the other half FRMR.
>
> Or maybe the non-fast registration stuff should be under a separate
> configuration option entirely? Distro's could eventually enable only
> the safer configurations and people doing testing could build their own
> kernels with the rest enabled.
Perhaps, or maybe a module option that specifically disables DMA_MR. Also
note that with IB the DMA MR RKEY is not put on the wire, so I think I
need to qualify the warning somewhat.
>
> My initial impulse is to be a bit scared of the non-fast-registration
> case, but maybe I don't understand how this hardware is deployed.
>
In practice, I think the exposure is real, but somewhat academic.
Obviously as this sees wider adoption the likelihood that this could be
deployed on a network with untrusted hosts grows significantly. Today
I don't believe that's the case.
I would lean towards the module option and perhaps a Kconfig option that
allows you to tweak the default. I also think the policy should be transport
dependent. IOW, DMA MR is OK for IB, but verboten for iWARP.
Thanks for the feedback,
Tom
> --b.
>
>> +
>> + The sections below provide additional information on this issue.
>> +
>> + The NFSRDMA protocol is defined such that a) only the server
>> + initiates RDMA, and b) only the client's memory is exposed via
>> + RKEY. This is why the server reads to fetch RPC data from the client
>> + even though it would be more efficient for the client to write the
>> + data to the server's memory. This design goal is not entirely
>> + realized with iWARP, however, because the RKEY (called an STag on
>> + iWARP) for the data sink of an RDMA_READ is actually placed on the
>> + wire, and this RKEY has Remote Write permission. This means that the
>> + server's memory is exposed by virtue of having placed the RKEY for
>> + its local memory on the wire in order to receive the result of the
>> + RDMA_READ.
>> +
>> + By contrast, IB uses an opaque transaction ID# to associate the
>> + READ_RPL with the READ_REQ, and the data sink of a READ_REQ does not
>> + require remote access. That said, the byzantine node in question
>> + could forge a packet with this transaction ID and corrupt the target
>> + memory, however, the scope of the exploit is bounded to the lifetime
>> + of this single RDMA_READ request and to the memory mapped by the
>> + data sink of the READ_REQ.
>> +
>> + The newer RDMA adapters (both iWARP and IB) support "Fast Memory
>> + Registration". This capability allows memory to be quickly
>> + registered (i.e. made available for remote access) and de-registered
>> + by submitting WR on the SQ. These capabilities provide a mechanism
>> + to reduce the exposure discussed above by limiting the scope of the
>> + exploit. The idea is to create an RKEY that only maps the single RPC
>> + and whose effective lifetime is only the exchange of this single
>> + RPC. This is the default memory model that is employed by the server
>> + when supported by the adapter and by the client when the
>> + rdma_memreg_strategy is set to 6. Note that the client and server
>> + may use different memory registration strategies, however,
>> + performance is better when both the client and server use the
>> + FastReg memory registration strategy.
>> +
>> + This approach has two benefits, a) it restricts the domain of the
>> + exploit to the memory of a single RPC, and b) it limits the duration
>> + of the exploit to the time it takes to satisfy the RDMA_READ.
>> +
>> + It is arguable that a one-shot STag/RKEY is no less secure than RPC
>> + on the TCP transport. Consider that the exact same byzantine
>> + application could more easily corrupt TCP RPC payload by simply
>> + forging a packet with the correct TCP sequence number -- in fact
>> + it's easier than the RDMA exploit because the RDMA exploit requires
>> + that you correctly forge both the TCP packet and the RDMA
>> + payload. In addition the duration of the TCP exploit is the lifetime
>> + of the connection, not the lifetime of a single WR/RPC data transfer.
>> +
>> + RDMA on IB or iWARP using Fast Reg is no less secure than TCP.
>> +
>>