Re: [PATCH v1 02/11] svcrdma: Use all allocated Send Queue entries

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Chuck Lever <chuck.lever@oracle.com>
To: Chuck Lever <cel@kernel.org>
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [PATCH v1 02/11] svcrdma: Use all allocated Send Queue entries
Date: Mon, 29 Jan 2024 12:26:14 -0500	[thread overview]
Message-ID: <ZbffthImvL3YspVT@tissot.1015granger.net> (raw)
In-Reply-To: <170653984365.24162.652127313173673494.stgit@manet.1015granger.net>

On Mon, Jan 29, 2024 at 09:50:43AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> For upper layer protocols that request rw_ctxs, ib_create_qp()
> adjusts ib_qp_init_attr::max_send_wr to accommodate the WQEs those
> rw_ctxs will consume. See rdma_rw_init_qp() for details.
> 
> To actually use those additional WQEs, svc_rdma_accept() needs to
> retrieve the corrected SQ depth after calling rdma_create_qp() and
> set newxprt->sc_sq_depth and  newxprt->sc_sq_avail so that
> svc_rdma_send() and svc_rdma_post_chunk_ctxt() can utilize those
> WQEs.
> 
> The NVMe target driver, for example, already does this properly.
> 
> Fixes: 26fb2254dd33 ("svcrdma: Estimate Send Queue depth properly")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 ++++++++++++++++++------------
>  1 file changed, 22 insertions(+), 14 deletions(-)

After applying this one, workload tests trigger this set of errors
on my test NFS server (CX-5 in RoCE mode):

mlx5_core 0000:02:00.0: cq_err_event_notifier:517:(pid 4374): CQ error on CQN 0x407, syndrome 0x1
rocep2s0/1: QP 137 error: unrecognized status (0x41 0x0 0x0)

I think syndrome 0x1 is IB_EVENT_QP_FATAL ?

But the "unrecognized status" comes from mlx5_ib_qp_err_syndrome(),
which does this:

 344         pr_err("%s/%d: QP %d error: %s (0x%x 0x%x 0x%x)\n",
 345                ibqp->device->name, ibqp->port, ibqp->qp_num,
 346                ib_wc_status_msg(
 347                        MLX5_GET(cqe_error_syndrome, err_syn, syndrome)),
 348                MLX5_GET(cqe_error_syndrome, err_syn, vendor_error_syndrome),
 349                MLX5_GET(cqe_error_syndrome, err_syn, hw_syndrome_type),
 350                MLX5_GET(cqe_error_syndrome, err_syn, hw_error_syndrome));

I don't think the "syndrome" field contains a WC status code, so
invoking ib_wc_status_msg() to get a symbolic string seems wrong.

Anyway,

 - Can someone with an mlx5 decoder ring tell me what 0x41 is?
 - If someone sees an obvious error with how this patch has
   set up the SQ and Send CQ, please hit me with a clue bat.


> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 4a038c7e86f9..75f1481fbca0 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -370,12 +370,12 @@ static struct svc_xprt *svc_rdma_create(struct svc_serv *serv,
>   */
>  static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  {
> +	unsigned int ctxts, rq_depth, sq_depth;
>  	struct svcxprt_rdma *listen_rdma;
>  	struct svcxprt_rdma *newxprt = NULL;
>  	struct rdma_conn_param conn_param;
>  	struct rpcrdma_connect_private pmsg;
>  	struct ib_qp_init_attr qp_attr;
> -	unsigned int ctxts, rq_depth;
>  	struct ib_device *dev;
>  	int ret = 0;
>  	RPC_IFDEBUG(struct sockaddr *sap);
> @@ -422,24 +422,29 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  		newxprt->sc_max_requests = rq_depth - 2;
>  		newxprt->sc_max_bc_requests = 2;
>  	}
> +	sq_depth = rq_depth;
> +
>  	ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
>  	ctxts *= newxprt->sc_max_requests;
> -	newxprt->sc_sq_depth = rq_depth + ctxts;
> -	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr)
> -		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
> -	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
>  
>  	newxprt->sc_pd = ib_alloc_pd(dev, 0);
>  	if (IS_ERR(newxprt->sc_pd)) {
>  		trace_svcrdma_pd_err(newxprt, PTR_ERR(newxprt->sc_pd));
>  		goto errout;
>  	}
> -	newxprt->sc_sq_cq = ib_alloc_cq_any(dev, newxprt, newxprt->sc_sq_depth,
> +
> +	/* The Completion Queue depth is the maximum number of signaled
> +	 * WRs expected to be in flight. Every Send WR is signaled, and
> +	 * each rw_ctx has a chain of WRs, but only one WR in each chain
> +	 * is signaled.
> +	 */
> +	newxprt->sc_sq_cq = ib_alloc_cq_any(dev, newxprt, sq_depth + ctxts,
>  					    IB_POLL_WORKQUEUE);
>  	if (IS_ERR(newxprt->sc_sq_cq))
>  		goto errout;
> -	newxprt->sc_rq_cq =
> -		ib_alloc_cq_any(dev, newxprt, rq_depth, IB_POLL_WORKQUEUE);
> +	/* Every Receive WR is signaled. */
> +	newxprt->sc_rq_cq = ib_alloc_cq_any(dev, newxprt, rq_depth,
> +					    IB_POLL_WORKQUEUE);
>  	if (IS_ERR(newxprt->sc_rq_cq))
>  		goto errout;
>  
> @@ -448,7 +453,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	qp_attr.qp_context = &newxprt->sc_xprt;
>  	qp_attr.port_num = newxprt->sc_port_num;
>  	qp_attr.cap.max_rdma_ctxs = ctxts;
> -	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
> +	qp_attr.cap.max_send_wr = sq_depth;
>  	qp_attr.cap.max_recv_wr = rq_depth;
>  	qp_attr.cap.max_send_sge = newxprt->sc_max_send_sges;
>  	qp_attr.cap.max_recv_sge = 1;
> @@ -456,17 +461,20 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>  	qp_attr.qp_type = IB_QPT_RC;
>  	qp_attr.send_cq = newxprt->sc_sq_cq;
>  	qp_attr.recv_cq = newxprt->sc_rq_cq;
> -	dprintk("    cap.max_send_wr = %d, cap.max_recv_wr = %d\n",
> -		qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr);
> -	dprintk("    cap.max_send_sge = %d, cap.max_recv_sge = %d\n",
> -		qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge);
> -
>  	ret = rdma_create_qp(newxprt->sc_cm_id, newxprt->sc_pd, &qp_attr);
>  	if (ret) {
>  		trace_svcrdma_qp_err(newxprt, ret);
>  		goto errout;
>  	}
> +	dprintk("svcrdma: cap.max_send_wr = %d, cap.max_recv_wr = %d\n",
> +		qp_attr.cap.max_send_wr, qp_attr.cap.max_recv_wr);
> +	dprintk("    cap.max_send_sge = %d, cap.max_recv_sge = %d\n",
> +		qp_attr.cap.max_send_sge, qp_attr.cap.max_recv_sge);
> +	dprintk("    send CQ depth = %d, recv CQ depth = %d\n",
> +		sq_depth, rq_depth);
>  	newxprt->sc_qp = newxprt->sc_cm_id->qp;
> +	newxprt->sc_sq_depth = qp_attr.cap.max_send_wr;
> +	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
>  
>  	if (!(dev->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS))
>  		newxprt->sc_snd_w_inv = false;
> 
> 
> 

-- 
Chuck Lever

next prev parent reply	other threads:[~2024-01-29 17:26 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-29 14:50 [PATCH v1 00/11] NFSD RDMA transport improvements Chuck Lever
2024-01-29 14:50 ` [PATCH v1 01/11] svcrdma: Reserve an extra WQE for ib_drain_rq() Chuck Lever
2024-01-29 14:50 ` [PATCH v1 02/11] svcrdma: Use all allocated Send Queue entries Chuck Lever
2024-01-29 17:26   ` Chuck Lever [this message]
2024-01-29 14:50 ` [PATCH v1 03/11] svcrdma: Increase the per-transport rw_ctx count Chuck Lever
2024-01-29 14:50 ` [PATCH v1 04/11] svcrdma: Fix SQ wake-ups Chuck Lever
2024-01-29 14:51 ` [PATCH v1 05/11] svcrdma: Prevent a UAF in svc_rdma_send() Chuck Lever
2024-01-29 14:51 ` [PATCH v1 06/11] svcrdma: Fix retry loop " Chuck Lever
2024-01-29 14:51 ` [PATCH v1 07/11] svcrdma: Post Send WR chain Chuck Lever
2024-01-29 14:51 ` [PATCH v1 08/11] svcrdma: Move write_info for Reply chunks into struct svc_rdma_send_ctxt Chuck Lever
2024-01-29 14:51 ` [PATCH v1 09/11] svcrdma: Post the Reply chunk and Send WR together Chuck Lever
2024-01-29 14:51 ` [PATCH v1 10/11] svcrdma: Post WRs for Write chunks in svc_rdma_sendto() Chuck Lever
2024-01-29 14:51 ` [PATCH v1 11/11] svcrdma: Add Write chunk WRs to the RPC's Send WR chain Chuck Lever

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZbffthImvL3YspVT@tissot.1015granger.net \
    --to=chuck.lever@oracle.com \
    --cc=cel@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).