From: "Liu, Changcheng" <changcheng.liu@intel.com>
To: Tom Talpey <tom@talpey.com>
Cc: linux-rdma@vger.kernel.org
Subject: Re: CX314A WCE error: WR_FLUSH_ERR
Date: Wed, 21 Aug 2019 23:38:44 +0800
Message-ID: <20190821153844.GA4545@jerryopenix>
In-Reply-To: <6aed3f75-2445-eb6f-0bd8-7c79ea4a0967@talpey.com>
On 09:36 Wed 21 Aug, Tom Talpey wrote:
> On 8/21/2019 8:09 AM, Liu, Changcheng wrote:
> > Hi all,
> > In one system, I frequently hit "IBV_WC_WR_FLUSH_ERR" in the WCE (work completion element) polled from the completion queue bound to the RQ (Receive Queue).
> > Does anyone have an idea how to debug the "IBV_WC_WR_FLUSH_ERR" problem?
> >
> > With a CX314A/40Gb NIC, I hit this error when using the RC transport type with only Send operation (IBV_WR_SEND) WRs (work requests) on the SQ (Send Queue).
> > Every WR has only one SGE (scatter/gather element), and all the SGEs on the RQ have the same size. The SGE size in an SQ WR is not greater than the SGE size in an RQ WR.
> >
> > There’s an explanation of IBV_WC_WR_FLUSH_ERR on page 114 of the "RDMA Aware Networks Programming User Manual" http://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
> > But I still don't understand it well. How can I trigger this error with a short demo program?
> > "
> > IBV_WC_WR_FLUSH_ERR
> > This event is generated when an invalid remote error is thrown when the responder detects an
> > invalid request. It may be that the operation is not supported by the request queue or there is
> > insufficient buffer space to receive the request.
> > "
>
> The most common reason for a flushed work request is loss of
> the connection to the remote peer. This can be caused by any
> number of conditions.
Good direction. I'll debug along these lines first.
>
> The second-most common is a programming error in the upper
> layer protocol. A shortage of posted receives on either peer,
> a protection error on some buffer, etc.
Do you mean that a protection key such as the l_key/r_key isn't set correctly?
What kind of protection error could trigger IBV_WC_WR_FLUSH_ERR?
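
As a debugging aid for the causes listed above, here is a minimal sketch of
how the polled completions could be triaged. It relies on the fact that a
flush completion is a follow-on symptom: once a QP has moved to the error
state, its outstanding WRs complete with a flush status, so the interesting
entry is the first completion that is neither a success nor a flush. The
enum and both helper functions are hypothetical stand-ins so the sketch
compiles without libibverbs; in a real program the status would come from
the status field of the struct ibv_wc filled in by ibv_poll_cq().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for work completion statuses (not the verbs enum). */
enum wc_status {
    WC_SUCCESS,
    WC_WR_FLUSH_ERR,      /* WR flushed: the QP was already in the error state */
    WC_LOC_PROT_ERR,      /* local protection error on a posted buffer */
    WC_RNR_RETRY_EXC_ERR, /* peer had no receive posted and retries ran out */
    WC_RETRY_EXC_ERR      /* peer stopped responding and retries ran out */
};

/*
 * Return the index of the first completion that is neither a success nor a
 * flush, or -1 if there is none. A result of -1 with flushes present means
 * the root cause is not in this batch: check the other CQ or the remote peer.
 */
int first_root_cause(const enum wc_status *st, size_t n)
{
    assert(st != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (st[i] != WC_SUCCESS && st[i] != WC_WR_FLUSH_ERR)
            return (int)i;
    }
    return -1;
}

/* Count how many completions in the batch were flushed. */
size_t count_flushed(const enum wc_status *st, size_t n)
{
    size_t flushed = 0;

    assert(st != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (st[i] == WC_WR_FLUSH_ERR)
            flushed++;
    }
    return flushed;
}
```

If only flush statuses show up on the CQ bound to the RQ, as in my case,
first_root_cause() returns -1 there, which suggests looking at the
completions on the send side or at the remote peer for the first real error.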
>
> If you're looking to actually trigger this error for testing,
> well, try one of the above. If you're trying to figure out
> why it's happening, that can take some digging, but not in
> the RDMA stack, typically.
Many thanks.
--Changcheng
>
> Tom.
>