linux-nfs.vger.kernel.org archive mirror
From: Steve Wise <swise@opengridcomputing.com>
To: Devesh Sharma <Devesh.Sharma@Emulex.Com>,
	Chuck Lever <chuck.lever@oracle.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Cc: "Anna.Schumaker@netapp.com" <Anna.Schumaker@netapp.com>
Subject: Re: [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth
Date: Fri, 16 May 2014 09:14:00 -0500	[thread overview]
Message-ID: <53761D28.3070704@opengridcomputing.com> (raw)
In-Reply-To: <53761C63.4050908@opengridcomputing.com>

By the way, Devesh:  Is the device advertising FRMR support, yet setting 
the max page list len to zero?  That's a driver bug...



On 5/16/2014 9:10 AM, Steve Wise wrote:
> I guess the client code doesn't verify that the device supports the 
> chosen memreg mode.  That's not good.   Lemme fix this and respin this 
> patch.
>
>
> On 5/16/2014 2:08 AM, Devesh Sharma wrote:
>> Chuck
>>
>> This patch causes a CPU soft-lockup if the underlying vendor reports 
>> devattr.max_fast_reg_page_list_len = 0 while ia->ri_memreg_strategy = 
>> FRMR (the default option).
>> I think there is a need to refer to the device capability flags. If 
>> strategy = FRMR is forced and devattr.max_fast_reg_page_list_len = 0, 
>> then flag an error and fail the RPC with -EIO.
>>
>> See inline:
>>
>>> -----Original Message-----
>>> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-
>>> owner@vger.kernel.org] On Behalf Of Chuck Lever
>>> Sent: Thursday, May 01, 2014 1:00 AM
>>> To: linux-nfs@vger.kernel.org; linux-rdma@vger.kernel.org
>>> Cc: Anna.Schumaker@netapp.com
>>> Subject: [PATCH V3 01/17] xprtrdma: mind the device's max fast register
>>> page list depth
>>>
>>> From: Steve Wise <swise@opengridcomputing.com>
>>>
>>> Some rdma devices don't support a fast register page list depth of 
>>> at least
>>> RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast register
>>> regions according to the minimum of the device max supported depth or
>>> RPCRDMA_MAX_DATA_SEGS.
>>>
>>> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>
>>>   net/sunrpc/xprtrdma/rpc_rdma.c  |    4 ---
>>>   net/sunrpc/xprtrdma/verbs.c     |   47 +++++++++++++++++++++++++++++----------
>>>   net/sunrpc/xprtrdma/xprt_rdma.h |    1 +
>>>   3 files changed, 36 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> index 96ead52..400aa1b 100644
>>> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
>>> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
>>> @@ -248,10 +248,6 @@ rpcrdma_create_chunks(struct rpc_rqst *rqst, struct xdr_buf *target,
>>>       /* success. all failures return above */
>>>       req->rl_nchunks = nchunks;
>>>
>>> -    BUG_ON(nchunks == 0);
>>> -    BUG_ON((r_xprt->rx_ia.ri_memreg_strategy == RPCRDMA_FRMR)
>>> -           && (nchunks > 3));
>>> -
>>>       /*
>>>        * finish off header. If write, marshal discrim and nchunks.
>>>        */
>>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>>> index 9372656..55fb09a 100644
>>> --- a/net/sunrpc/xprtrdma/verbs.c
>>> +++ b/net/sunrpc/xprtrdma/verbs.c
>>> @@ -539,6 +539,11 @@ rpcrdma_ia_open(struct rpcrdma_xprt *xprt, struct sockaddr *addr, int memreg)
>>>                   __func__);
>>>               memreg = RPCRDMA_REGISTER;
>>>   #endif
>>> +        } else {
>>> +            /* Mind the ia limit on FRMR page list depth */
>>> +            ia->ri_max_frmr_depth = min_t(unsigned int,
>>> +                RPCRDMA_MAX_DATA_SEGS,
>>> +                devattr.max_fast_reg_page_list_len);
>>>           }
>>>           break;
>>>       }
>>> @@ -659,24 +664,42 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>>>       ep->rep_attr.srq = NULL;
>>>       ep->rep_attr.cap.max_send_wr = cdata->max_requests;
>>>       switch (ia->ri_memreg_strategy) {
>>> -    case RPCRDMA_FRMR:
>>> +    case RPCRDMA_FRMR: {
>>> +        int depth = 7;
>>> +
>>>           /* Add room for frmr register and invalidate WRs.
>>>            * 1. FRMR reg WR for head
>>>            * 2. FRMR invalidate WR for head
>>> -         * 3. FRMR reg WR for pagelist
>>> -         * 4. FRMR invalidate WR for pagelist
>>> +         * 3. N FRMR reg WRs for pagelist
>>> +         * 4. N FRMR invalidate WRs for pagelist
>>>            * 5. FRMR reg WR for tail
>>>            * 6. FRMR invalidate WR for tail
>>>            * 7. The RDMA_SEND WR
>>>            */
>>> -        ep->rep_attr.cap.max_send_wr *= 7;
>>> +
>>> +        /* Calculate N if the device max FRMR depth is smaller than
>>> +         * RPCRDMA_MAX_DATA_SEGS.
>>> +         */
>>> +        if (ia->ri_max_frmr_depth < RPCRDMA_MAX_DATA_SEGS) {
>>> +            int delta = RPCRDMA_MAX_DATA_SEGS -
>>> +                    ia->ri_max_frmr_depth;
>>> +
>>> +            do {
>>> +                depth += 2; /* FRMR reg + invalidate */
>>> +                delta -= ia->ri_max_frmr_depth;
>> If ia->ri_max_frmr_depth == 0, this loop becomes an infinite loop.
>>
>>> +            } while (delta > 0);
>>> +
>>> +        }
>>> +        ep->rep_attr.cap.max_send_wr *= depth;
>>>           if (ep->rep_attr.cap.max_send_wr > devattr.max_qp_wr) {
>>> -            cdata->max_requests = devattr.max_qp_wr / 7;
>>> +            cdata->max_requests = devattr.max_qp_wr / depth;
>>>               if (!cdata->max_requests)
>>>                   return -EINVAL;
>>> -            ep->rep_attr.cap.max_send_wr = cdata->max_requests * 7;
>>> +            ep->rep_attr.cap.max_send_wr = cdata->max_requests *
>>> +                               depth;
>>>           }
>>>           break;
>>> +    }
>>>       case RPCRDMA_MEMWINDOWS_ASYNC:
>>>       case RPCRDMA_MEMWINDOWS:
>>>           /* Add room for mw_binds+unbinds - overkill! */
>>> @@ -1043,16 +1066,16 @@ rpcrdma_buffer_create(struct rpcrdma_buffer *buf, struct rpcrdma_ep *ep,
>>>       case RPCRDMA_FRMR:
>>>           for (i = buf->rb_max_requests * RPCRDMA_MAX_SEGS; i; i--) {
>>>               r->r.frmr.fr_mr = ib_alloc_fast_reg_mr(ia->ri_pd,
>>> -                        RPCRDMA_MAX_SEGS);
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_mr)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_mr);
>>>                   dprintk("RPC:       %s: ib_alloc_fast_reg_mr"
>>>                       " failed %i\n", __func__, rc);
>>>                   goto out;
>>>               }
>>> -            r->r.frmr.fr_pgl =
>>> -                ib_alloc_fast_reg_page_list(ia->ri_id->device,
>>> -                        RPCRDMA_MAX_SEGS);
>>> +            r->r.frmr.fr_pgl = ib_alloc_fast_reg_page_list(
>>> +                        ia->ri_id->device,
>>> +                        ia->ri_max_frmr_depth);
>>>               if (IS_ERR(r->r.frmr.fr_pgl)) {
>>>                   rc = PTR_ERR(r->r.frmr.fr_pgl);
>>>                   dprintk("RPC:       %s: "
>>> @@ -1498,8 +1521,8 @@ rpcrdma_register_frmr_external(struct rpcrdma_mr_seg *seg,
>>>       seg1->mr_offset -= pageoff;    /* start of page */
>>>       seg1->mr_len += pageoff;
>>>       len = -pageoff;
>>> -    if (*nsegs > RPCRDMA_MAX_DATA_SEGS)
>>> -        *nsegs = RPCRDMA_MAX_DATA_SEGS;
>>> +    if (*nsegs > ia->ri_max_frmr_depth)
>>> +        *nsegs = ia->ri_max_frmr_depth;
>>>       for (page_no = i = 0; i < *nsegs;) {
>>>           rpcrdma_map_one(ia, seg, writing);
>>>           pa = seg->mr_dma;
>>> diff --git a/net/sunrpc/xprtrdma/xprt_rdma.h b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> index cc1445d..98340a3 100644
>>> --- a/net/sunrpc/xprtrdma/xprt_rdma.h
>>> +++ b/net/sunrpc/xprtrdma/xprt_rdma.h
>>> @@ -66,6 +66,7 @@ struct rpcrdma_ia {
>>>       struct completion    ri_done;
>>>       int            ri_async_rc;
>>>       enum rpcrdma_memreg    ri_memreg_strategy;
>>> +    unsigned int        ri_max_frmr_depth;
>>>   };
>>>
>>>   /*
>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe 
>>> linux-rdma" in the
>>> body of a message to majordomo@vger.kernel.org More majordomo info at
>>> http://vger.kernel.org/majordomo-info.html
>


Thread overview: 31+ messages
2014-04-30 19:29 [PATCH V3 00/17] NFS/RDMA client-side patches Chuck Lever
2014-04-30 19:29 ` [PATCH V3 01/17] xprtrdma: mind the device's max fast register page list depth Chuck Lever
2014-05-16  7:08   ` Devesh Sharma
2014-05-16 14:10     ` Steve Wise
2014-05-16 14:14       ` Steve Wise [this message]
2014-05-16 14:29         ` Steve Wise
2014-05-17  8:23           ` Devesh Sharma
2014-04-30 19:29 ` [PATCH V3 02/17] nfs-rdma: Fix for FMR leaks Chuck Lever
2014-04-30 19:29 ` [PATCH V3 03/17] xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context Chuck Lever
2014-04-30 19:30 ` [PATCH V3 04/17] xprtrdma: Remove BOUNCEBUFFERS memory registration mode Chuck Lever
2014-04-30 19:30 ` [PATCH V3 05/17] xprtrdma: Remove MEMWINDOWS registration modes Chuck Lever
2014-04-30 19:30 ` [PATCH V3 06/17] xprtrdma: Remove REGISTER memory registration mode Chuck Lever
2014-04-30 19:30 ` [PATCH V3 07/17] xprtrdma: Fall back to MTHCAFMR when FRMR is not supported Chuck Lever
2014-04-30 19:30 ` [PATCH V3 08/17] xprtrdma: mount reports "Invalid mount option" if memreg mode " Chuck Lever
2014-04-30 19:30 ` [PATCH V3 09/17] xprtrdma: Simplify rpcrdma_deregister_external() synopsis Chuck Lever
2014-04-30 19:30 ` [PATCH V3 10/17] xprtrdma: Make rpcrdma_ep_destroy() return void Chuck Lever
2014-04-30 19:31 ` [PATCH V3 11/17] xprtrdma: Split the completion queue Chuck Lever
2014-04-30 19:31 ` [PATCH V3 12/17] xprtrmda: Reduce lock contention in completion handlers Chuck Lever
2014-04-30 19:31 ` [PATCH V3 13/17] xprtrmda: Reduce calls to ib_poll_cq() " Chuck Lever
2014-04-30 19:31 ` [PATCH V3 14/17] xprtrdma: Limit work done by completion handler Chuck Lever
2014-04-30 19:31 ` [PATCH V3 15/17] xprtrdma: Reduce the number of hardway buffer allocations Chuck Lever
2014-04-30 19:31 ` [PATCH V3 16/17] xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting Chuck Lever
2014-04-30 19:31 ` [PATCH V3 17/17] xprtrdma: Remove Tavor MTU setting Chuck Lever
2014-05-01  7:36   ` Hal Rosenstock
     [not found] ` <20140430191433.5663.16217.stgit-FYjufvaPoItvLzlybtyyYzGyq/o6K9yX@public.gmane.org>
2014-05-02 19:27   ` [PATCH V3 00/17] NFS/RDMA client-side patches Doug Ledford
2014-05-02 19:27   ` Doug Ledford
2014-05-02 19:27 ` Doug Ledford
     [not found] ` <5363f223.e39f420a.4af6.6fc9SMTPIN_ADDED_BROKEN@mx.google.com>
2014-05-02 20:20   ` Chuck Lever
2014-05-02 22:34     ` Doug Ledford
2014-05-02 22:34     ` Doug Ledford
2014-05-02 22:34     ` Doug Ledford
