From: "Liu, Changcheng" <changcheng.liu@intel.com>
To: Tom Talpey <tom@talpey.com>
Cc: linux-rdma@vger.kernel.org
Subject: Re: CX314A WCE error: WR_FLUSH_ERR
Date: Wed, 21 Aug 2019 23:38:44 +0800
Message-ID: <20190821153844.GA4545@jerryopenix>
In-Reply-To: <6aed3f75-2445-eb6f-0bd8-7c79ea4a0967@talpey.com>
On 09:36 Wed 21 Aug, Tom Talpey wrote:
> On 8/21/2019 8:09 AM, Liu, Changcheng wrote:
> > Hi all,
> > On one system, I frequently hit "IBV_WC_WR_FLUSH_ERR" in the WCE (work completion element) polled from the completion queue bound to the RQ (Receive Queue).
> > Does anyone have an idea how to debug the "IBV_WC_WR_FLUSH_ERR" problem?
> >
> > With a CX314A/40Gb NIC, I hit this error when using the RC transport type with only Send operation (IBV_WR_SEND) WRs (work requests) on the SQ (Send Queue).
> > Every WR has only one SGE (scatter/gather element), and all the SGEs on the RQ have the same size. The SGE size in an SQ WR is not greater than the SGE size in an RQ WR.
> >
> > There’s an explanation of IBV_WC_WR_FLUSH_ERR on page 114 of the "RDMA Aware Networks Programming User Manual" http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
> > But I still don't understand it well. How can this error be triggered with a short demo program?
> > "
> > IBV_WC_WR_FLUSH_ERR
> > This event is generated when an invalid remote error is thrown when the responder detects an
> > invalid request. It may be that the operation is not supported by the request queue or there is
> > insufficient buffer space to receive the request.
> > "
>
> The most common reason for a flushed work request is loss of
> the connection to the remote peer. This can be caused by any
> number of conditions.
Good direction. I'll debug along these lines first.
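
As a first step, the check I have in mind is roughly the sketch below. My
understanding is that once a QP enters the error state, its outstanding WRs
complete with IBV_WC_WR_FLUSH_ERR, so the flushed completions are a symptom
and the first completion with a different error status (if there is one on
this side) is the more interesting one. The struct and enum here are
hypothetical stand-ins so that the snippet compiles without libibverbs; in
the real program the entries would be the struct ibv_wc array filled in by
ibv_poll_cq(), and the statuses would be the IBV_WC_* values.

```c
#include <stddef.h>

/* Hypothetical stand-in for the IBV_WC_* status codes. The names are
 * modeled on libibverbs, but this is NOT the library enum. */
enum demo_wc_status {
    DEMO_WC_SUCCESS,
    DEMO_WC_LOC_PROT_ERR,
    DEMO_WC_RETRY_EXC_ERR,
    DEMO_WC_WR_FLUSH_ERR
};

/* Hypothetical stand-in for the two struct ibv_wc fields used here. */
struct demo_wc {
    unsigned long long wr_id;
    enum demo_wc_status status;
};

/* Return the index of the first completion that failed for a reason
 * other than a flush, or -1 if the batch holds only successes and
 * flushed WRs. */
int first_root_cause(const struct demo_wc *wc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wc[i].status != DEMO_WC_SUCCESS &&
            wc[i].status != DEMO_WC_WR_FLUSH_ERR)
            return (int)i;
    }
    return -1;
}
```

If this returns -1 on the receiving side, I assume the original error was
reported elsewhere (on the SQ completion queue of this host or on the remote
peer), so I will log both ends.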
>
> The second-most common is a programming error in the upper
> layer protocol. A shortage of posted receives on either peer,
> a protection error on some buffer, etc.
Do you mean that a protection key such as the l_key/r_key isn't set correctly?
What kind of protection error could trigger IBV_WC_WR_FLUSH_ERR?
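
On the "shortage of posted receives" point: as far as I understand, on RC a
Send that arrives when no receive is posted is answered with an RNR NAK, and
if the sender's RNR retries run out, its QP goes to the error state and the
remaining WRs are flushed. To rule this out I could add sender-side credit
accounting along the lines of the sketch below. All names are hypothetical,
and how the peer announces newly posted receives (some upper-layer message)
is an assumption, not something my current program does.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one RC connection: the number of receive
 * buffers the peer is known to have posted and not yet consumed. */
struct demo_credits {
    unsigned int available;
};

/* The peer announced (via an assumed upper-layer message) that it has
 * posted n more receives. */
void credits_grant(struct demo_credits *c, unsigned int n)
{
    c->available += n;
}

/* Call before posting a Send WR. Returns false when sending now could
 * reach the peer with no receive posted; the caller should then wait
 * for a grant instead of posting. */
bool credits_try_consume(struct demo_credits *c)
{
    if (c->available == 0)
        return false;
    c->available--;
    return true;
}
```

If the flush errors disappear once Sends are gated this way, that would point
to the receive queue running dry rather than to a protection problem.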
>
> If you're looking to actually trigger this error for testing,
> well, try one of the above. If you're trying to figure out
> why it's happening, that can take some digging, but not in
> the RDMA stack, typically.
Many thanks.
--Changcheng
>
> Tom.
>