From: sagi@grimberg.me (Sagi Grimberg)
Subject: Crash in nvmet_req_init() - null req->rsp pointer
Date: Thu, 30 Aug 2018 17:31:52 -0700
Message-ID: <0ce90ceb-ebe7-4162-ce07-4ddd27207ecd@grimberg.me>
In-Reply-To: <0bcbcb2e-0b01-57ef-0b9c-9ac3e56322cb@opengridcomputing.com>
>> Hey Sagi, it hits the empty rsp list path often with your debug patch.
>> I added code to BUG_ON() after 10 times and I have a crash dump I'm
>> looking at.
>>
>> Isn't the rsp list supposed to be sized such that it will never be empty
>> when a new rsp is needed? I wonder if there is a leak.
From my scan of the code, it doesn't look like there is a leak.
> I do see that during this heavy load, the rdma send queue "full"
> condition gets hit often:
>
> static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> {
> 	struct nvmet_rdma_queue *queue = rsp->queue;
>
> 	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
> 			&queue->sq_wr_avail) < 0)) {
> 		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
> 				1 + rsp->n_rdma, queue->idx,
> 				queue->nvme_sq.ctrl->cntlid);
> 		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
> 		return false;
> 	}
>
> ...
>
> So commands are getting added to the wr_wait list:
>
> static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
> 		struct nvmet_rdma_rsp *cmd)
> {
> 	...
> 	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
> 		spin_lock(&queue->rsp_wr_wait_lock);
> 		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
> 		spin_unlock(&queue->rsp_wr_wait_lock);
> 	}
> ...
>
>
> Perhaps there's some bug in the wr_wait_list processing of deferred
> commands? I don't see anything though.
I assume this could happen if, under heavy load, the device's send
completions arrive more slowly than new commands come in (the delay
could be in the device and/or in software).
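
To make the send queue accounting concrete, here is a minimal userspace
sketch of the same reserve-or-back-off pattern using C11 atomics. It is
only a model: sq_reserve() and sq_release() are hypothetical names, not
nvmet-rdma functions, and the kernel's atomic_sub_return() is emulated
with atomic_fetch_sub(), which returns the old value rather than the new
one.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of send-queue credit accounting.
 * 'avail' plays the role of queue->sq_wr_avail.
 */
static bool sq_reserve(atomic_int *avail, int needed)
{
	assert(needed > 0);

	/* atomic_fetch_sub() returns the OLD value, so subtract again
	 * to get the value that atomic_sub_return() would report. */
	if (atomic_fetch_sub(avail, needed) - needed < 0) {
		/* Not enough credits: undo the reservation and let the
		 * caller defer the command. */
		atomic_fetch_add(avail, needed);
		return false;
	}
	return true;
}

/* Called when send completions return work-request slots. */
static void sq_release(atomic_int *avail, int n)
{
	atomic_fetch_add(avail, n);
}
```

If completions (sq_release() in this model) lag behind new commands,
sq_reserve() keeps failing and commands pile up on the deferred list.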
Because we post the recv before sending the response back, there is
a window in which the host can send us a new command before the send
completion has arrived; this is why we allocate more rsps.
However, I think nothing prevents the gap from growing under heavy
load until we exhaust the 2x rsps.
So perhaps this is something we actually need to account for...
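
One possible way to account for it, sketched here in userspace C purely
as an illustration: take an rsp from the preallocated free list when one
is available, and fall back to a dynamic allocation when the list is
empty, freeing such entries on release instead of returning them to the
list. All names below (struct rsp, rsp_pool, rsp_get(), rsp_put()) are
hypothetical and this is not the nvmet-rdma code; the model is also
single-threaded, whereas the real free list would need its lock held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of a response pool with an overflow path. */
struct rsp {
	struct rsp *next;	/* free-list link */
	bool allocated;		/* true if it came from the fallback path */
};

struct rsp_pool {
	struct rsp *free;	/* preallocated entries not in use */
	struct rsp *storage;	/* backing array for the preallocated entries */
};

/* Preallocate nr entries (nr > 0); returns 0 on success, -1 on failure. */
static int rsp_pool_init(struct rsp_pool *pool, size_t nr)
{
	pool->free = NULL;
	pool->storage = calloc(nr, sizeof(*pool->storage));
	if (!pool->storage)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		pool->storage[i].next = pool->free;
		pool->free = &pool->storage[i];
	}
	return 0;
}

static struct rsp *rsp_get(struct rsp_pool *pool)
{
	struct rsp *rsp = pool->free;

	if (rsp) {
		pool->free = rsp->next;
		rsp->next = NULL;
		return rsp;
	}

	/* Free list exhausted: allocate instead of handing back NULL. */
	rsp = calloc(1, sizeof(*rsp));
	if (rsp)
		rsp->allocated = true;
	return rsp;
}

static void rsp_put(struct rsp_pool *pool, struct rsp *rsp)
{
	assert(rsp);
	if (rsp->allocated) {
		free(rsp);	/* overflow entries never join the free list */
		return;
	}
	rsp->next = pool->free;
	pool->free = rsp;
}

static void rsp_pool_destroy(struct rsp_pool *pool)
{
	free(pool->storage);
	pool->storage = NULL;
	pool->free = NULL;
}
```

The caller still has to handle a NULL return from rsp_get(), since the
fallback allocation itself can fail.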