linux-rdma.vger.kernel.org archive mirror
From: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
To: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v1 05/14] svcrdma: Introduce local rdma_rw API helpers
Date: Wed, 22 Mar 2017 16:17:38 +0200	[thread overview]
Message-ID: <cfa49433-ab26-d2f0-27d4-2a96ff0adaba@grimberg.me> (raw)
In-Reply-To: <20170316155306.4482.68041.stgit-Hs+gFlyCn65vLzlybtyyYzGyq/o6K9yX@public.gmane.org>


> The plan is to replace the local bespoke code that constructs and
> posts RDMA Read and Write Work Requests with calls to the rdma_rw
> API. This shares code with other RDMA-enabled ULPs that manages the
> gory details of buffer registration and posting Work Requests.
>
> Some design notes:
>
>  o svc_xprt reference counting is modified, since one rdma_rw_ctx
>    generates one completion, no matter how many Write WRs are
>    posted. To accommodate the new reference counting scheme, a new
>    version of svc_rdma_send() is introduced.
>
>  o The structure of RPC-over-RDMA transport headers is flexible,
>    allowing multiple segments per Reply with arbitrary alignment.
>    Thus I did not take the further step of chaining Write WRs with
>    the Send WR containing the RPC Reply message. The Write and Send
>    WRs continue to be built by separate pieces of code.
>
>  o The current code builds the transport header as it is construct-
>    ing Write WRs. I've replaced that with marshaling of transport
>    header data items in a separate step. This is because the exact
>    structure of client-provided segments may not align with the
>    components of the server's reply xdr_buf, or the pages in the
>    page list. Thus parts of each client-provided segment may be
>    written at different points in the send path.
>
>  o Since the Write list and Reply chunk marshaling code is being
>    replaced, I took the opportunity to replace some of the C
>    structure-based XDR encoding code with more portable code that
>    instead uses pointer arithmetic.
>
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

To be honest it's difficult to review this patch, but it's probably
just as difficult to split it up...

> +
> +/* One Write chunk is copied from Call transport header to Reply
> + * transport header. Each segment's length field is updated to
> + * reflect number of bytes consumed in the segment.
> + *
> + * Returns number of segments in this chunk.
> + */
> +static unsigned int xdr_encode_write_chunk(__be32 *dst, __be32 *src,
> +					   unsigned int remaining)

Is this only for data-in-reply (send operation)? I don't see why you
would need to modify that for RDMA operations.

Perhaps I'd try to split the data-in-reply code from the actual rdma
conversion. That might make it easier to comprehend.

> +{
> +	unsigned int i, nsegs;
> +	u32 seg_len;
> +
> +	/* Write list discriminator */
> +	*dst++ = *src++;

I had to actually run a test program to understand the precedence
here, so parentheses would've helped :)

> diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> new file mode 100644
> index 0000000..1e76227
> --- /dev/null
> +++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
> @@ -0,0 +1,785 @@
> +/*
> + * Copyright (c) 2016 Oracle.  All rights reserved.
> + *
> + * Use the core R/W API to move RPC-over-RDMA Read and Write chunks.
> + */
> +
> +#include <linux/sunrpc/rpc_rdma.h>
> +#include <linux/sunrpc/svc_rdma.h>
> +#include <linux/sunrpc/debug.h>
> +
> +#include <rdma/rw.h>
> +
> +#define RPCDBG_FACILITY	RPCDBG_SVCXPRT
> +
> +/* Each R/W context contains state for one chain of RDMA Read or
> + * Write Work Requests (one RDMA segment to be read from or written
> + * back to the client).
> + *
> + * Each WR chain handles a single contiguous server-side buffer,
> + * because some registration modes (eg. FRWR) do not support a
> + * discontiguous scatterlist.
> + *
> + * Each WR chain handles only one R_key. Each RPC-over-RDMA segment
> + * from a client may contain a unique R_key, so each WR chain moves
> + * one segment (or less) at a time.
> + *
> + * The scatterlist makes this data structure just over 8KB in size
> + * on 4KB-page platforms. As the size of this structure increases
> + * past one page, it becomes more likely that allocating one of these
> + * will fail. Therefore, these contexts are created on demand, but
> + * cached and reused until the controlling svcxprt_rdma is destroyed.
> + */
> +struct svc_rdma_rw_ctxt {
> +	struct list_head	rw_list;
> +	struct ib_cqe		rw_cqe;
> +	struct svcxprt_rdma	*rw_rdma;
> +	int			rw_nents;
> +	int			rw_wrcount;
> +	enum dma_data_direction	rw_dir;
> +	struct svc_rdma_op_ctxt	*rw_readctxt;
> +	struct rdma_rw_ctx	rw_ctx;
> +	struct scatterlist	rw_sg[RPCSVC_MAXPAGES];

Have you considered using sg_table with sg_alloc_table_chained?

See lib/sg_pool.c and nvme-rdma as a consumer.

> +};
> +
> +static struct svc_rdma_rw_ctxt *
> +svc_rdma_get_rw_ctxt(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_rw_ctxt *ctxt;
> +
> +	svc_xprt_get(&rdma->sc_xprt);
> +
> +	spin_lock(&rdma->sc_rw_ctxt_lock);
> +	if (list_empty(&rdma->sc_rw_ctxts))
> +		goto out_empty;
> +
> +	ctxt = list_first_entry(&rdma->sc_rw_ctxts,
> +				struct svc_rdma_rw_ctxt, rw_list);
> +	list_del_init(&ctxt->rw_list);
> +	spin_unlock(&rdma->sc_rw_ctxt_lock);
> +
> +out:
> +	ctxt->rw_dir = DMA_NONE;
> +	return ctxt;
> +
> +out_empty:
> +	spin_unlock(&rdma->sc_rw_ctxt_lock);
> +
> +	ctxt = kmalloc(sizeof(*ctxt), GFP_KERNEL);
> +	if (!ctxt) {
> +		svc_xprt_put(&rdma->sc_xprt);
> +		return NULL;
> +	}
> +
> +	ctxt->rw_rdma = rdma;
> +	INIT_LIST_HEAD(&ctxt->rw_list);
> +	sg_init_table(ctxt->rw_sg, ARRAY_SIZE(ctxt->rw_sg));
> +	goto out;
> +}
> +
> +static void svc_rdma_put_rw_ctxt(struct svc_rdma_rw_ctxt *ctxt)
> +{
> +	struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> +
> +	if (ctxt->rw_dir != DMA_NONE)
> +		rdma_rw_ctx_destroy(&ctxt->rw_ctx, rdma->sc_qp,
> +				    rdma->sc_port_num,
> +				    ctxt->rw_sg, ctxt->rw_nents,
> +				    ctxt->rw_dir);
> +

It's a bit odd to see a put_rw_ctxt that also destroys the context;
that doesn't exactly pair with get_rw_ctxt.

Maybe it'd be useful to do the destroy explicitly, outside the put.

> +/**
> + * svc_rdma_destroy_rw_ctxts - Free write contexts
> + * @rdma: xprt about to be destroyed
> + *
> + */
> +void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma)
> +{
> +	struct svc_rdma_rw_ctxt *ctxt;
> +
> +	while (!list_empty(&rdma->sc_rw_ctxts)) {
> +		ctxt = list_first_entry(&rdma->sc_rw_ctxts,
> +					struct svc_rdma_rw_ctxt, rw_list);
> +		list_del(&ctxt->rw_list);
> +		kfree(ctxt);
> +	}
> +}
> +
> +/**
> + * svc_rdma_wc_write_ctx - Handle completion of an RDMA Write ctx
> + * @cq: controlling Completion Queue
> + * @wc: Work Completion
> + *
> + * Invoked in soft IRQ context.
> + *
> + * Assumptions:
> + * - Write completion is not responsible for freeing pages under
> + *   I/O.
> + */
> +static void svc_rdma_wc_write_ctx(struct ib_cq *cq, struct ib_wc *wc)
> +{
> +	struct ib_cqe *cqe = wc->wr_cqe;
> +	struct svc_rdma_rw_ctxt *ctxt =
> +			container_of(cqe, struct svc_rdma_rw_ctxt, rw_cqe);
> +	struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> +
> +	atomic_add(ctxt->rw_wrcount, &rdma->sc_sq_avail);
> +	wake_up(&rdma->sc_send_wait);
> +
> +	if (wc->status != IB_WC_SUCCESS)
> +		goto flush;
> +
> +out:
> +	svc_rdma_put_rw_ctxt(ctxt);
> +	return;
> +
> +flush:
> +	set_bit(XPT_CLOSE, &rdma->sc_xprt.xpt_flags);
> +	if (wc->status != IB_WC_WR_FLUSH_ERR)
> +		pr_err("svcrdma: write ctx: %s (%u/0x%x)\n",
> +		       ib_wc_status_msg(wc->status),
> +		       wc->status, wc->vendor_err);
> +	goto out;
> +}
> +
> +/**
> + * svc_rdma_wc_read_ctx - Handle completion of an RDMA Read ctx
> + * @cq: controlling Completion Queue
> + * @wc: Work Completion
> + *
> + * Invoked in soft IRQ context.

Is this really invoked in soft IRQ context?

> + */
> +static void svc_rdma_wc_read_ctx(struct ib_cq *cq, struct ib_wc *wc)
> +{
> +	struct ib_cqe *cqe = wc->wr_cqe;
> +	struct svc_rdma_rw_ctxt *ctxt =
> +			container_of(cqe, struct svc_rdma_rw_ctxt, rw_cqe);
> +	struct svcxprt_rdma *rdma = ctxt->rw_rdma;
> +	struct svc_rdma_op_ctxt *head;
> +
> +	atomic_add(ctxt->rw_wrcount, &rdma->sc_sq_avail);
> +	wake_up(&rdma->sc_send_wait);
> +
> +	if (wc->status != IB_WC_SUCCESS)
> +		goto flush;
> +
> +	head = ctxt->rw_readctxt;
> +	if (!head)
> +		goto out;
> +
> +	spin_lock(&rdma->sc_rq_dto_lock);
> +	list_add_tail(&head->list, &rdma->sc_read_complete_q);
> +	spin_unlock(&rdma->sc_rq_dto_lock);

Not sure what sc_read_complete_q does... what post processing is
needed for completed reads?

> +/* This function sleeps when the transport's Send Queue is congested.

Is this easy to trigger?