public inbox for linux-rdma@vger.kernel.org
From: Mark Zhang <markzhang@nvidia.com>
To: "Haeuptle, Michael" <michael.haeuptle@hpe.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: rdma_create_qp_ex fails with EINVAL
Date: Mon, 9 Jan 2023 15:29:46 +0800
Message-ID: <719a7efe-e94e-b910-1935-ed3a3e42390c@nvidia.com>
In-Reply-To: <DS7PR84MB3110FCA7FD0A05FE103DD85495FB9@DS7PR84MB3110.NAMPRD84.PROD.OUTLOOK.COM>

On 1/7/2023 6:13 AM, Haeuptle, Michael wrote:
> Hello,
> 
> I'm running into an issue where rdma_create_qp_ex returns EINVAL and I was hoping that someone could help me understand what is going on here.
> 
> The function that is actually throwing the EINVAL error is the write() call in rdma_init_qp_attr (which is being called by rdma_create_qp_ex):
> ...
>      ret = write(id->channel->fd, &cmd, sizeof cmd);
> ...
> 
> It returns -1 and sets errno to 22.
> 
> Note, this is an intermittent error and not always reproducible.
> 
> The setup and scenario are as follows:
> - SPDK NVMF target on Debian 11.3 with top of tree rdma-core libs
> - NVMe-oF kernel initiator, Debian 11.5 (no change in rdma-core libs)
> - There is a switch between initiator and SPDK NVMF targets
> - The kernel initiator is talking to 2 SPDK NVMF targets via DM and round-robin (I don't think this matters)
> - On the initiator system there is a 512k block size fio load against 48 NVMF subsystems (2 target apps with 24 subsystems each)
> - When I kill the SPDK target and restart it, then I occasionally get this EINVAL on one of the queue pairs
> 
> It's unclear to me why the write call is returning EINVAL. The file descriptor should be valid since I see the same fd in later qpair creation requests.
> 
> Any insights are appreciated.
> 
> -- Michael

Maybe the CM is in a state in which init_qp_attr cannot be done? Do we
know the QP state and the CM state? (A sniffer capture is needed to check
the last CM packet received/sent.) The file descriptor should be irrelevant.
If you are able to debug the kernel, maybe debug this function:
   drivers/infiniband/core/cma.c::rdma_init_qp_attr()
to see where this EINVAL is returned and why.
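
To illustrate why the file descriptor is probably not the problem: a
write() on an invalid descriptor fails with EBADF, not EINVAL, so errno 22
suggests the descriptor was accepted and the request itself was rejected.
Below is a minimal sketch in plain C, assuming only libc. The helpers
write_and_report() and classify_write_error() are hypothetical names made up
for this example; they are not part of rdma-core or librdmacm.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an rdma-core API): call write() and, on failure,
 * save errno immediately so that later library calls cannot overwrite it.
 * Returns 0 if write() did not fail, otherwise the positive errno value.
 */
int write_and_report(int fd, const void *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);
    int saved;

    if (ret >= 0)
        return 0;

    saved = errno;
    fprintf(stderr, "write(fd=%d) failed: errno=%d (%s)\n",
            fd, saved, strerror(saved));
    return saved;
}

/*
 * Hypothetical helper: map the saved errno to a coarse diagnosis.
 * EBADF points at the descriptor; EINVAL points at the request.
 */
const char *classify_write_error(int err)
{
    switch (err) {
    case 0:
        return "ok";
    case EBADF:
        return "bad file descriptor";
    case EINVAL:
        return "descriptor accepted, request rejected";
    default:
        return "other error";
    }
}
```

Calling write_and_report(-1, buf, len) yields EBADF, which
classify_write_error() reports as a descriptor problem. The EINVAL seen in
this thread falls into the other branch, which is consistent with looking at
the CM and QP state rather than at the fd.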