From: Steve Wise <swise@opengridcomputing.com>
To: Devesh Sharma <Devesh.Sharma@Emulex.Com>,
Chuck Lever <chuck.lever@oracle.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Cc: "Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>
Subject: Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
Date: Fri, 16 May 2014 09:14:00 -0500 [thread overview]
Message-ID: <53761D28.3070704@opengridcomputing.com> (raw)
In-Reply-To: <53761C63.4050908@opengridcomputing.com>
By the way, Devesh: Is the device advertising FRMR support, yet setting
the max page list len to zero? That's a driver bug...
On 5/16/2014 9:10 AM, Steve Wise wrote:
> I guess the client code doesn't verify that the device supports the
> chosen memreg mode. That's not good. Lemme fix this and respin this
> patch.
>
>
> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>> Chuck
>>
>> This patch is causing a CPU soft-lockup if underlying vendor reports
>> devattr.max_fast_reg_pagr_list_len = 0 and ia->ri_memreg_strategy =
>> FRMR (Default option).
>> I think there is need to refer to device capability flags. If
>> strategy = FRMR is forced and devattr.max_fast_reg_pagr_list_len=0
>> then flash an error and fail RPC with -EIO.
>>
>> See inline:
>>
>>> -----Original Message-----
>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>>> Sent: Thursday, May 01, 2014 1:00 AM
>>> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
>>> Cc: Anna.Schumaker@netapp.com
>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>>> page list depth
>>>
>>> From: Steve Wise <swise@opengridcomputing.com>
>>>
>>> Some rdma devices don't support a fast register page list depth of
>>> at least
>>> RPCRDMA_MAX_DATA_SEGS. So xprtrdma needs to chunk its fast register
>>> regions according to the minimum of the device max supported depth or
>>> RPCRDMA_MAX_DATA_SEGS.
>>>
>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>> net/sunrpc/xprtrdma/rpc_rdma.c | 4 ---
>>> net/sunrpc/xprtrdma/verbs.c | 47 +++++++++++++++++++++++++++++-
>>> ---------
>>> net/sunrpc/xprtrdma/xprt_rdma.h | 1 +
>>> 3 files changed, 36 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> b/net/sunrpc/xprtrdma/rpc_rdma.c index 96ead52..400aa1b 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst,
>>> struct
>>> xdr_buf *target,
>>> /* success. all failures return above */
>>> req->rl_nchunks = nchunks;
>>>
>>> - BUG_ON(nchunks == 0);
>>> - BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>> - && (nchunks > 3));
>>> -
>>> /*
>>> * finish off header. If write, marshal discrim and nchunks.
>>> */
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 9372656..55fb09a 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct
>>> sockaddr *addr, int memreg)
>>> __func__);
>>> memreg = RPCRDMA_REGISTER;
>>> #endif
>>> + } else {
>>> + /* Mind the ia limit on FRMR page list depth */
>>> + ia->ri_max_frmr_depth = min_t(unsigned int,
>>> + RPCRDMA_MAX_DATA_SEGS,
>>> + devattr.max_fast_reg_page_list_len);
>>> }
>>> break;
>>> }
>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct
>>> rpcrdma_ia *ia,
>>> ep->rep_attr.srq = NULL;
>>> ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>> switch (ia->ri_memreg_strategy) {
>>> - case RPCRDMA_FRMR:
>>> + case RPCRDMA_FRMR: {
>>> + int depth = 7;
>>> +
>>> /* Add room for frmr register and invalidate WRs.
>>> * 1. FRMR reg WR for head
>>> * 2. FRMR invalidate WR for head
>>> - * 3. FRMR reg WR for pagelist
>>> - * 4. FRMR invalidate WR for pagelist
>>> + * 3. N FRMR reg WRs for pagelist
>>> + * 4. N FRMR invalidate WRs for pagelist
>>> * 5. FRMR reg WR for tail
>>> * 6. FRMR invalidate WR for tail
>>> * 7. The RDMA_SEND WR
>>> */
>>> - ep->rep_attr.cap.max_send_wr *= 7;
>>> +
>>> + /* Calculate N if the device max FRMR depth is smaller than
>>> + * RPCRDMA_MAX_DATA_SEGS.
>>> + */
>>> + if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>> + int delta = RPCRDMA_MAX_DATA_SEGS -
>>> + ia->ri_max_frmr_depth;
>>> +
>>> + do {
>>> + depth += 2; /* FRMR reg + invalidate */
>>> + delta -= ia->ri_max_frmr_depth;
>> If ia->ri_max_frmr_depth is = 0. This loop becomes infinite loop.
>>
>>> + } while (delta > 0);
>>> +
>>> + }
>>> + ep->rep_attr.cap.max_send_wr *= depth;
>>> if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>> - cdata->max_requests = devattr.max_qp_wr / 7;
>>> + cdata->max_requests = devattr.max_qp_wr / depth;
>>> if (!cdata->max_requests)
>>> return -EINVAL;
>>> - ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests * 7;
>>> + ep->rep_attr.cap.max_send_wr = cdata-
>>>> max_requests *
>>> + depth;
>>> }
>>> break;
>>> + }
>>> case RPCRDMA_MEMWINDOWS_ASYNC:
>>> case RPCRDMA_MEMWINDOWS:
>>> /* Add room for mw_binds+unbinds - overkill! */ @@ -
>>> 1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf,
>>> struct rpcrdma_ep *ep,
>>> case RPCRDMA_FRMR:
>>> for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--
>>> ) {
>>> r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> + ia->ri_max_frmr_depth);
>>> if (IS_ERR(r->r.frmr.fr_mr)) {
>>> rc = PTR_ERR(r->r.frmr.fr_mr);
>>> dprintk("RPC: %s: ib_alloc_fast_reg_mr"
>>> " failed %i\n", __func__, rc);
>>> goto out;
>>> }
>>> - r->r.frmr.fr_pgl =
>>> - ib_alloc_fast_reg_page_list(ia->ri_id-
>>>> device,
>>> -
>>> RPCRDMA_MAX_SEGS);
>>> + r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>> + ia->ri_id->device,
>>> + ia->ri_max_frmr_depth);
>>> if (IS_ERR(r->r.frmr.fr_pgl)) {
>>> rc = PTR_ERR(r->r.frmr.fr_pgl);
>>> dprintk("RPC: %s: "
>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct
>>> rpcrdma_mr_seg *seg,
>>> seg1->mr_offset -= pageoff; /* start of page */
>>> seg1->mr_len += pageoff;
>>> len = -pageoff;
>>> - if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>> - *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>> + if (*nsegs > ia->ri_max_frmr_depth)
>>> + *nsegs = ia->ri_max_frmr_depth;
>>> for (page_no = i = 0; i < *nsegs;) {
>>> rpcrdma_map_one(ia, seg, writing);
>>> pa = seg->mr_dma;
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> b/net/sunrpc/xprtrdma/xprt_rdma.h index cc1445d..98340a3 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>> struct completion ri_done;
>>> int ri_async_rc;
>>> enum rpcrdma_memreg ri_memreg_strategy;
>>> + unsigned int ri_max_frmr_depth;
>>> };
>>>
>>> /*
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe
>>> linux-rdma" in the
>>> body of a message to majordomo@vger.kernel.org More majordomo info at
>>> http://vger.kernel.org/majordomo-info.html
>> N�����r��y���b�X��ǧv�^�){.n�+����{���"��^n�r���z�\x1a ��h����&��\x1e
>> �G���h�\x03(�階�ݢj"��\x1a�^[m�����z�ޖ���f���h���~�mml==
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2014-05-16 14:14 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-30 19:29 [PATCH V3 00/17] NFS/RDMA client-side patches Chuck Lever
2014-04-30 19:29 ` [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth Chuck Lever
2014-05-16 7:08 ` Devesh Sharma
2014-05-16 14:10 ` Steve Wise
2014-05-16 14:14 ` Steve Wise [this message]
2014-05-16 14:29 ` Steve Wise
2014-05-17 8:23 ` Devesh Sharma
2014-04-30 19:29 ` [PATCH V3 02/17] nfs-rdma: Fix for FMR leaks Chuck Lever
2014-04-30 19:29 ` [PATCH V3 03/17] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context Chuck Lever
2014-04-30 19:30 ` [PATCH V3 04/17] xprtrdma: Remove BOUNCEBUFFERS memory registration mode Chuck Lever
2014-04-30 19:30 ` [PATCH V3 05/17] xprtrdma: Remove MEMWINDOWS registration modes Chuck Lever
2014-04-30 19:30 ` [PATCH V3 06/17] xprtrdma: Remove REGISTER memory registration mode Chuck Lever
2014-04-30 19:30 ` [PATCH V3 07/17] xprtrdma: Fall back to MTHCAFMR when FRMR is not supported Chuck Lever
2014-04-30 19:30 ` [PATCH V3 08/17] xprtrdma: mount reports "Invalid mount option" if memreg mode " Chuck Lever
2014-04-30 19:30 ` [PATCH V3 09/17] xprtrdma: Simplify rpcrdma_deregister_external() synopsis Chuck Lever
2014-04-30 19:30 ` [PATCH V3 10/17] xprtrdma: Make rpcrdma_ep_destroy() return void Chuck Lever
2014-04-30 19:31 ` [PATCH V3 11/17] xprtrdma: Split the completion queue Chuck Lever
2014-04-30 19:31 ` [PATCH V3 12/17] xprtrmda: Reduce lock contention in completion handlers Chuck Lever
2014-04-30 19:31 ` [PATCH V3 13/17] xprtrmda: Reduce calls to ib_poll_cq() " Chuck Lever
2014-04-30 19:31 ` [PATCH V3 14/17] xprtrdma: Limit work done by completion handler Chuck Lever
2014-04-30 19:31 ` [PATCH V3 15/17] xprtrdma: Reduce the number of hardway buffer allocations Chuck Lever
2014-04-30 19:31 ` [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting Chuck Lever
2014-04-30 19:31 ` [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting Chuck Lever
2014-05-01 7:36 ` Hal Rosenstock
[not found] ` <20140430191433.5663.16217.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-05-02 19:27 ` [PATCH V3 00/17] NFS/RDMA client-side patches Doug Ledford
2014-05-02 19:27 ` Doug Ledford
2014-05-02 19:27 ` Doug Ledford
[not found] ` <5363f223.e39f420a.4af6.6fc9SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-02 20:20 ` Chuck Lever
2014-05-02 22:34 ` Doug Ledford
2014-05-02 22:34 ` Doug Ledford
2014-05-02 22:34 ` Doug Ledford
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53761D28.3070704@opengridcomputing.com \
--to=swise@opengridcomputing.com \
--cc=Anna.Schumaker@netapp.com \
--cc=Devesh.Sharma@Emulex.Com \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).