From: Mark Zhang <markzhang@nvidia.com>
To: "Haeuptle, Michael" <michael.haeuptle@hpe.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: rdma_create_qp_ex fails with EINVAL
Date: Mon, 9 Jan 2023 15:29:46 +0800
Message-ID: <719a7efe-e94e-b910-1935-ed3a3e42390c@nvidia.com>
In-Reply-To: <DS7PR84MB3110FCA7FD0A05FE103DD85495FB9@DS7PR84MB3110.NAMPRD84.PROD.OUTLOOK.COM>

On 1/7/2023 6:13 AM, Haeuptle, Michael wrote:
> Hello,
>
> I'm running into an issue where rdma_create_qp_ex returns EINVAL and I was hoping that someone could help me understand what is going on here.
>
> The function that is actually throwing the EINVAL error is the write() call in rdma_init_qp_attr (which is being called by rdma_create_qp_ex):
> ...
> ret = write(id->channel->fd, &cmd, sizeof cmd);
> ...
>
> It returns -1 and sets errno to 22.
>
> Note, this is an intermittent error and not always reproducible.
>
> The setup and scenario is as follows:
> - SPDK NVMF target on Debian 11.3 with top of tree rdma-core libs
> - NVMe-oF kernel initiator, Debian 11.5 (no change in rdma-core libs)
> - There is a switch between initiator and SPDK NVMF targets
> - The kernel initiator is talking to 2 SPDK NVMF targets via DM and round-robin (I don't think this matters)
> - On the initiator system there is a 512k block size fio load against 48 NVMF subsystems (2 target apps with 24 subsystems)
> - When I kill the SPDK target and restart it, then I occasionally get this EINVAL on one of the queue pairs
>
> It's unclear to me why the write call is returning EINVAL. The file descriptor should be valid since I see the same fd in later qpair creation requests.
>
> Any insights are appreciated.
>
> -- Michael

Maybe the CM is in a state in which init_qp_attr is not allowed? Do we
know the QP state and the CM state? (You would need a sniffer to check
which CM packet was last sent or received.) The file descriptor should
be irrelevant.

If you are able to debug the kernel, try debugging this function:
drivers/infiniband/core/cma.c::rdma_init_qp_attr()
to see where this EINVAL is returned and why.
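
On the file descriptor point: a closed or otherwise invalid fd makes
write() fail with EBADF, not EINVAL, so errno 22 suggests the descriptor
was open and the request itself was rejected. A minimal userspace sketch
of that distinction follows. write_errno() and errno_hint() are
hypothetical helper names for illustration only and are not part of
rdma-core or librdmacm.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: perform a write() and return 0 on success or the
 * errno value on failure, so the caller can tell EBADF from EINVAL. */
static int write_errno(int fd, const void *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);

    return ret < 0 ? errno : 0;
}

/* Hypothetical helper: short hint for the errno values discussed in this
 * thread; anything else falls back to strerror(). */
static const char *errno_hint(int err)
{
    switch (err) {
    case 0:
        return "ok";
    case EBADF:
        return "bad file descriptor: the fd itself is invalid";
    case EINVAL:
        return "invalid argument: the fd is open, the request was rejected";
    default:
        return strerror(err);
    }
}
```

For example, write_errno(-1, "x", 1) returns EBADF because -1 is never a
valid descriptor. Getting EINVAL back from the write() on
id->channel->fd therefore points at the command or the CM state rather
than at the fd.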