From: "Steve Wise" <swise@opengridcomputing.com>
To: 'Potnuri Bharat Teja' <bharat@chelsio.com>,
'Sagi Grimberg' <sagi@grimberg.me>
Cc: target-devel@vger.kernel.org, nab@linux-iscsi.org,
linux-rdma@vger.kernel.org
Subject: RE: RQ overflow seen running isert traffic
Date: Mon, 17 Oct 2016 13:29:41 -0500
Message-ID: <021001d228a4$6cd6a6c0$4683f440$@opengridcomputing.com>
In-Reply-To: <20161017111655.GA21245@chelsio.com>
> On Wednesday, October 10/05/16, 2016 at 11:44:12 +0530, Sagi Grimberg wrote:
> >
> > > Hi Sagi,
> >
> > Hey Bharat,
> >
> > Sorry for the late response, it's the holiday
> > season in Israel...
> >
> > > I've been trying to understand the isert functionality with respect to
> > > RDMA Receive Queue sizing and Queue full handling. Here is the problem
> > > I see with iw_cxgb4:
> > >
> > > After running a few minutes of iSER traffic with iw_cxgb4, I am seeing
> > > post receive failures due to receive queue full, returning -ENOMEM.
> > > In the case of iw_cxgb4 the RQ size is 130 with qp attribute
> > > max_recv_wr = 129, passed down by isert to iw_cxgb4. isert decides on
> > > max_recv_wr as 129 based on
> > > (ISERT_QP_MAX_RECV_DTOS = ISCSI_DEF_XMIT_CMDS_MAX = 128) + 1.
> >
> > That's correct.
>
> Hi Sagi,
> Sorry for the late reply, I had to recheck my findings before replying
> to you.
>
> My interpretation of the queue full issue was not complete. I got carried
> away by the receive queue and missed the SQ-full failure among
> the debug logs.
>
> Here is what was happening:
> The SQ fills up first and the post fails with ENOMEM. Because of this,
> the command is queued to the queue-full list, which schedules it to be
> posted again later, and these repeated post attempts cause the RQ to
> fill up. On every further try the SQ post fails anyway, and an extra RQ
> WR is posted as part of datain(), so the RQ ends up full too.
>
> This happened a bit earlier in my case, since I corrected
> iser_put_datain() to return the error to LIO.
>
> Here is the failure log on a fresh 4.8 kernel:
> isert: isert_rdma_rw_ctx_post: Cmd: ffff882ec8c96e60 failed to post RDMA
> res <===here is the post send failure due to ENOMEM.
> ABORT_TASK: Found referenced iSCSI task_tag: 33
> ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 33
> isert: isert_post_recv: ib_post_recv() failed with ret: -22
> isert: isert_post_response: ib_post_recv failed with -22
> isert: isert_post_recv: ib_post_recv() failed with ret: -22
> isert: isert_post_response: ib_post_recv failed with -22
> isert: isert_post_recv: ib_post_recv() failed with ret: -22
> isert: isert_post_response: ib_post_recv failed with -22
> isert: isert_post_recv: ib_post_recv() failed with ret: -22
> isert: isert_post_response: ib_post_recv failed with -22
> isert: isert_post_recv: ib_post_recv() failed with ret: -22
> isert: isert_post_response: ib_post_recv failed with -22
> cxgb4 0000:84:00.4: AE qpid 1026 opcode 10 status 0x1 type 1 len 0x0 wrid.hi 0x0 wrid.lo 0x4c01
> isert: isert_qp_event_callback: QP access error (3): conn ffff8817ddc46000
> iSCSI Login timeout on Network Portal 10.40.40.198:3260
> INFO: task iscsi_np:14744 blocked for more than 120 seconds.
>
> Here are the QP and CQ stats before my assert for queue full:
> qhp->attr.sq_num_entries 523
> qhp->attr.rq_num_entries 129
> qhp->wq.sq.qid 1026
> qhp->wq.rq.qid 1027
> qhp->wq.sq.in_use 523 <=== SQ to the brim
> qhp->wq.sq.size 524 <=== SQ size
> qhp->wq.sq.cidx 391
> qhp->wq.sq.pidx 390
> qhp->wq.sq.wq_pidx 202
> qhp->wq.sq.wq_pidx_inc 0
> qhp->wq.sq.flush_cidx 391
> qhp->wq.rq.in_use 128
> qhp->wq.rq.size 130
> qhp->wq.rq.cidx 112
> qhp->wq.rq.pidx 110
> qhp->wq.rq.wq_pidx 240
> qhp->wq.rq.wq_pidx_inc 0
> qhp->wq.flushed 0
> chp->cq.cqid 1024
> chp->cq.size 6335
> chp->cq.cidx 4978
> chp->cq.sw_cidx 4126
> chp->cq.sw_pidx 4126
> chp->cq.sw_in_use 0
> chp->cq.cidx_inc 0
>
> As an experiment I tried increasing ISCSI_DEF_XMIT_CMDS_MAX to 256
> instead of 128, which in the case of iWARP creates an SQ of size 1548,
> and the issue is not seen.
> I suspect the SQ is not sized properly in the case of iWARP, or is
> factored incorrectly in the rdma api for iser IOP.
>
> I am digging for root cause.
>
> Thanks for your time,
> Bharat.
>
Hey Sagi, I'm looking at isert_create_qp() and it appears to not be correctly
sizing the SQ:
...
#define ISERT_QP_MAX_REQ_DTOS (ISCSI_DEF_XMIT_CMDS_MAX + \
ISERT_MAX_TX_MISC_PDUS + \
ISERT_MAX_RX_MISC_PDUS)
...
attr.cap.max_send_wr = ISERT_QP_MAX_REQ_DTOS + 1;
attr.cap.max_recv_wr = ISERT_QP_MAX_RECV_DTOS + 1;
...
I think the above snippet assumes a DTO consumes exactly one WR/WQE in the SQ.
But the DTO can be broken into multiple WRs to handle REG_MRs, or multiple
WRITE or READ WRs due to limits on local sge depths, target sge depths, etc.
Yes? Or am I all wet? Or perhaps isert doesn't require the SQ to be the max
possible because it flow controls the DTO submissions?
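
To put rough numbers on the concern, here is a minimal sketch of the
arithmetic, not the actual isert code. The two MISC_PDUS values are
assumed placeholders (check ib_isert.h for the real ones), and
sq_depth_worst_case() with its wrs_per_cmd factor is a hypothetical helper
that only illustrates how the requested depth would grow if each data
command could need several WRs.

```c
/* Value quoted earlier in this thread. */
#define ISCSI_DEF_XMIT_CMDS_MAX 128

/* ASSUMED placeholder values; the real ones live in ib_isert.h. */
#define ISERT_MAX_TX_MISC_PDUS 4
#define ISERT_MAX_RX_MISC_PDUS 6

#define ISERT_QP_MAX_REQ_DTOS (ISCSI_DEF_XMIT_CMDS_MAX + \
                               ISERT_MAX_TX_MISC_PDUS + \
                               ISERT_MAX_RX_MISC_PDUS)

/* Depth as requested in the snippet above: one WR per DTO, plus one. */
unsigned int sq_depth_one_wr_per_dto(void)
{
    return ISERT_QP_MAX_REQ_DTOS + 1;
}

/*
 * HYPOTHETICAL worst case: every data command may need up to
 * wrs_per_cmd send WRs (for example REG_MR plus a chain of READ or
 * WRITE WRs), while each misc PDU still takes a single WR.
 */
unsigned int sq_depth_worst_case(unsigned int wrs_per_cmd)
{
    return ISCSI_DEF_XMIT_CMDS_MAX * wrs_per_cmd +
           ISERT_MAX_TX_MISC_PDUS + ISERT_MAX_RX_MISC_PDUS + 1;
}
```

With wrs_per_cmd = 1 the two functions agree; any larger factor grows the
required depth linearly with the number of outstanding commands, which is
the gap I'm asking about.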
Stevo