From: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v1 05/14] svcrdma: Introduce local rdma_rw API helpers
Date: Wed, 22 Mar 2017 16:17:38 +0200 [thread overview]
Message-ID: <cfa49433-ab26-d2f0-27d4-2a96ff0adaba@grimberg.me> (raw)
In-Reply-To: <20170316155306.4482.68041.stgit-Hs+gFlyCn65vLzlybtyyYzGyq/o6K9yX@public.gmane.org>
> The plan is to replace the local bespoke code that constructs and
> posts RDMA Read and Write Work Requests with calls to the rdma_rw
> API. This shares code with other RDMA-enabled ULPs that manage the
> gory details of buffer registration and posting Work Requests.
>
> Some design notes:
>
> o svc_xprt reference counting is modified, since one rdma_rw_ctx
> generates one completion, no matter how many Write WRs are
> posted. To accommodate the new reference counting scheme, a new
> version of svc_rdma_send() is introduced.
>
> o The structure of RPC-over-RDMA transport headers is flexible,
> allowing multiple segments per Reply with arbitrary alignment.
> Thus I did not take the further step of chaining Write WRs with
> the Send WR containing the RPC Reply message. The Write and Send
> WRs continue to be built by separate pieces of code.
>
> o The current code builds the transport header as it is
> constructing Write WRs. I've replaced that with marshaling of transport
> header data items in a separate step. This is because the exact
> structure of client-provided segments may not align with the
> components of the server's reply xdr_buf, or the pages in the
> page list. Thus parts of each client-provided segment may be
> written at different points in the send path.
>
> o Since the Write list and Reply chunk marshaling code is being
> replaced, I took the opportunity to replace some of the C
> structure-based XDR encoding code with more portable code that
> instead uses pointer arithmetic.
>
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
To be honest, it's difficult to review this patch, but it's probably
difficult to split it too...
> +
> +/* One Write chunk is copied from Call transport header to Reply
> + * transport header. Each segment's length field is updated to
> + * reflect number of bytes consumed in the segment.
> + *
> + * Returns number of segments in this chunk.
> + */
> +static unsigned int xdr_encode_write_chunk(__be32 *dst, __be32 *src,
> + unsigned int remaining)
Is this only for data-in-reply (Send operation)? I don't see why you
would need to modify that for RDMA operations.
Perhaps I'd try to split the data-in-reply code from the actual rdma
conversion. That might make it easier to comprehend.
> +{
> + unsigned int i, nsegs;
> + u32 seg_len;
> +
> + /* Write list discriminator */
> + *dst++ = *src++;
I had to actually run a test program to understand the precedence
here, so parentheses would've helped :)
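For anyone else who pauses on this: postfix ++ binds tighter than
unary *, so the quoted line parses as *(dst++) = *(src++). A tiny
userspace demo of the idiom (not the patch's code, just the parsing):

```c
#include <assert.h>
#include <stddef.h>

/* Copies n words with the same `*dst++ = *src++` idiom as the quoted
 * line. Postfix ++ binds tighter than unary *, so each iteration
 * copies one word and then advances both pointers. */
static size_t copy_words(unsigned int *dst, const unsigned int *src,
			 size_t n)
{
	size_t copied = 0;

	while (copied < n) {
		*dst++ = *src++;	/* parses as *(dst++) = *(src++) */
		copied++;
	}
	return copied;
}
```

So the discriminator copy really is "copy one word, advance both
pointers", which is what the surrounding code seems to expect.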
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> new file mode 100644
> index 0000000..1e76227
> --- /dev/null
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -0,0 +1,785 @@
> +/*
> + * Copyright (c) 2016 Oracle. All rights reserved.
> + *
> + * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
> + */
> +
> +#include <linux/sunrpc/rpc_rdma.h>
> +#include <linux/sunrpc/svc_rdma.h>
> +#include <linux/sunrpc/debug.h>
> +
> +#include <rdma/rw.h>
> +
> +#define RPCDBG_FACILITY RPCDBG_SVCXPRT
> +
> +/* Each R/W context contains state for one chain of RDMA Read or
> + * Write Work Requests (one RDMA segment to be read from or written
> + * back to the client).
> + *
> + * Each WR chain handles a single contiguous server-side buffer,
> + * because some registration modes (eg. FRWR) do not support a
> + * discontiguous scatterlist.
> + *
> + * Each WR chain handles only one R_key. Each RPC-over-RDMA segment
> + * from a client may contain a unique R_key, so each WR chain moves
> + * one segment (or less) at a time.
> + *
> + * The scatterlist makes this data structure just over 8KB in size
> + * on 4KB-page platforms. As the size of this structure increases
> + * past one page, it becomes more likely that allocating one of these
> + * will fail. Therefore, these contexts are created on demand, but
> + * cached and reused until the controlling svcxprt_rdma is destroyed.
> + */
> +struct svc_rdma_rw_ctxt {
> + struct list_head rw_list;
> + struct ib_cqe rw_cqe;
> + struct svcxprt_rdma *rw_rdma;
> + int rw_nents;
> + int rw_wrcount;
> + enum dma_data_direction rw_dir;
> + struct svc_rdma_op_ctxt *rw_readctxt;
> + struct rdma_rw_ctx rw_ctx;
> + struct scatterlist rw_sg[RPCSVC_MAXPAGES];
Have you considered using sg_table with sg_alloc_table_chained?
See lib/sg_pool.c and nvme-rdma as a consumer.
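To illustrate, the consumer pattern looks roughly like this (untested
sketch; the API is from lib/sg_pool.c as nvme-rdma uses it, the field
names here are made up):

```c
/* Untested sketch of the sg_alloc_table_chained pattern. The inline
 * first chunk keeps small transfers allocation-free; larger
 * scatterlists chain extra chunks from the sg pools instead of
 * embedding RPCSVC_MAXPAGES entries in every context. */
struct svc_rdma_rw_ctxt {
	/* ... */
	struct sg_table		rw_sg_table;
	struct scatterlist	rw_first_sgl[SG_CHUNK_SIZE];
};

/* allocation: */
if (sg_alloc_table_chained(&ctxt->rw_sg_table, nents,
			   ctxt->rw_first_sgl))
	return -ENOMEM;

/* teardown: */
sg_free_table_chained(&ctxt->rw_sg_table, true);
```

That would shrink the common-case context well below a page.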
> +};
> +
> +static struct svc_rdma_rw_ctxt *
> +svc_rdma_get_rw_ctxt(struct svcxprt_rdma *rdma)
> +{
> + struct svc_rdma_rw_ctxt *ctxt;
> +
> + svc_xprt_get(&rdma->sc_xprt);
> +
> + spin_lock(&rdma->sc_rw_ctxt_lock);
> + if (list_empty(&rdma->sc_rw_ctxts))
> + goto out_empty;
> +
> + ctxt = list_first_entry(&rdma->sc_rw_ctxts,
> + struct svc_rdma_rw_ctxt, rw_list);
> + list_del_init(&ctxt->rw_list);
> + spin_unlock(&rdma->sc_rw_ctxt_lock);
> +
> +out:
> + ctxt->rw_dir = DMA_NONE;
> + return ctxt;
> +
> +out_empty:
> + spin_unlock(&rdma->sc_rw_ctxt_lock);
> +
> + ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> + if (!ctxt) {
> + svc_xprt_put(&rdma->sc_xprt);
> + return NULL;
> + }
> +
> + ctxt->rw_rdma = rdma;
> + INIT_LIST_HEAD(&ctxt->rw_list);
> + sg_init_table(ctxt->rw_sg, ARRAY_SIZE(ctxt->rw_sg));
> + goto out;
> +}
> +
> +static void svc_rdma_put_rw_ctxt(struct svc_rdma_rw_ctxt *ctxt)
> +{
> + struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> +
> + if (ctxt->rw_dir != DMA_NONE)
> + rdma_rw_ctx_destroy(&ctxt->rw_ctx, rdma->sc_qp,
> + rdma->sc_port_num,
> + ctxt->rw_sg, ctxt->rw_nents,
> + ctxt->rw_dir);
> +
It's a bit odd to see a put_rw_ctxt that also destroys the context,
which doesn't exactly pair with get_rw_ctxt.
Maybe it'd be useful to do the destroy explicitly outside the put.
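Something like this, perhaps (untested sketch, hypothetical names):

```c
/* Untested sketch: make teardown explicit so the put becomes a pure
 * mirror of the get (just recycle onto sc_rw_ctxts). */
static void svc_rdma_destroy_rw_ctx(struct svc_rdma_rw_ctxt *ctxt)
{
	struct svcxprt_rdma *rdma = ctxt->rw_rdma;

	rdma_rw_ctx_destroy(&ctxt->rw_ctx, rdma->sc_qp,
			    rdma->sc_port_num, ctxt->rw_sg,
			    ctxt->rw_nents, ctxt->rw_dir);
}

/* completion/error paths then do: */
svc_rdma_destroy_rw_ctx(ctxt);
svc_rdma_put_rw_ctxt(ctxt);	/* now only returns to the cache */
```

Then rw_dir wouldn't need to double as an "initialized" flag either.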
> +/**
> + * svc_rdma_destroy_rw_ctxts - Free write contexts
> + * @rdma: xprt about to be destroyed
> + *
> + */
> +void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma)
> +{
> + struct svc_rdma_rw_ctxt *ctxt;
> +
> + while (!list_empty(&rdma->sc_rw_ctxts)) {
> + ctxt = list_first_entry(&rdma->sc_rw_ctxts,
> + struct svc_rdma_rw_ctxt, rw_list);
> + list_del(&ctxt->rw_list);
> + kfree(ctxt);
> + }
> +}
> +
> +/**
> + * svc_rdma_wc_write_ctx - Handle completion of an RDMA Write ctx
> + * @cq: controlling Completion Queue
> + * @wc: Work Completion
> + *
> + * Invoked in soft IRQ context.
> + *
> + * Assumptions:
> + * - Write completion is not responsible for freeing pages under
> + * I/O.
> + */
> +static void svc_rdma_wc_write_ctx(struct ib_cq *cq, struct ib_wc *wc)
> +{
> + struct ib_cqe *cqe = wc->wr_cqe;
> + struct svc_rdma_rw_ctxt *ctxt =
> + container_of(cqe, struct svc_rdma_rw_ctxt, rw_cqe);
> + struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> +
> + atomic_add(ctxt->rw_wrcount, &rdma->sc_sq_avail);
> + wake_up(&rdma->sc_send_wait);
> +
> + if (wc->status != IB_WC_SUCCESS)
> + goto flush;
> +
> +out:
> + svc_rdma_put_rw_ctxt(ctxt);
> + return;
> +
> +flush:
> + set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
> + if (wc->status != IB_WC_WR_FLUSH_ERR)
> + pr_err("svcrdma: write ctx: %s (%u/0x%x)\n",
> + ib_wc_status_msg(wc->status),
> + wc->status, wc->vendor_err);
> + goto out;
> +}
> +
> +/**
> + * svc_rdma_wc_read_ctx - Handle completion of an RDMA Read ctx
> + * @cq: controlling Completion Queue
> + * @wc: Work Completion
> + *
> + * Invoked in soft IRQ context.
In soft IRQ? Is that accurate here?
> + */
> +static void svc_rdma_wc_read_ctx(struct ib_cq *cq, struct ib_wc *wc)
> +{
> + struct ib_cqe *cqe = wc->wr_cqe;
> + struct svc_rdma_rw_ctxt *ctxt =
> + container_of(cqe, struct svc_rdma_rw_ctxt, rw_cqe);
> + struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> + struct svc_rdma_op_ctxt *head;
> +
> + atomic_add(ctxt->rw_wrcount, &rdma->sc_sq_avail);
> + wake_up(&rdma->sc_send_wait);
> +
> + if (wc->status != IB_WC_SUCCESS)
> + goto flush;
> +
> + head = ctxt->rw_readctxt;
> + if (!head)
> + goto out;
> +
> + spin_lock(&rdma->sc_rq_dto_lock);
> + list_add_tail(&head->list, &rdma->sc_read_complete_q);
> + spin_unlock(&rdma->sc_rq_dto_lock);
Not sure what sc_read_complete_q does... what post-processing is
needed for completed reads?
> +/* This function sleeps when the transport's Send Queue is congested.
Is this easy to trigger?
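For context, the accounting side this pairs with (the atomic_add +
wake_up in the completion handlers above) would be roughly this
(hypothetical sketch, not the patch's exact code):

```c
/* Hypothetical sketch of the SQ-credit pattern: reserve WQEs from
 * sc_sq_avail before posting; if the SQ is congested, give the
 * credits back and sleep until completions return enough of them. */
while (atomic_sub_return(num_wrs, &rdma->sc_sq_avail) < 0) {
	atomic_add(num_wrs, &rdma->sc_sq_avail);
	wait_event(rdma->sc_send_wait,
		   atomic_read(&rdma->sc_sq_avail) > num_wrs);
}
/* ... ib_post_send(...) ... */
```

So it triggers whenever a chunk list needs more WQEs than are
currently free, which seems plausible with large Write chunks and a
small SQ.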