From: swise@opengridcomputing.com (Steve Wise)
Subject: Crash in nvmet_req_init() - null req->rsp pointer
Date: Fri, 31 Aug 2018 08:01:08 -0500
Message-ID: <02b101d4412a$afc71a00$0f554e00$@opengridcomputing.com>
In-Reply-To: <0ce90ceb-ebe7-4162-ce07-4ddd27207ecd@grimberg.me>
>
>
> >> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> >> I added code to BUG_ON() after 10 hits, and I have a crash dump I'm
> >> looking at.
> >>
> >> Isn't the rsp list supposed to be sized such that it will never be empty
> >> when a new rsp is needed? I wonder if there is a leak.
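For readers following along: the "rsp list" here is a free list of preallocated response contexts, and the debug patch flags the path where a context is requested while that list is empty. Below is a minimal userspace sketch of such a pool, assuming a simple LIFO intrusive list; struct rsp, struct rsp_pool, rsp_get(), rsp_put() and empty_hits are hypothetical names, not the nvmet-rdma code. It only illustrates that the empty case has to be detected explicitly: rsp_get() returns NULL and counts the event, loosely analogous to counting hits before a BUG_ON().

```c
#include <stddef.h>

/* Hypothetical stand-in for a preallocated response context. */
struct rsp {
	struct rsp *next_free;	/* link while the context is on the free list */
	int id;
};

struct rsp_pool {
	struct rsp *free_head;		/* LIFO free list */
	unsigned long empty_hits;	/* times a caller found the list empty */
};

/* Put all n preallocated slots on the free list. */
void rsp_pool_init(struct rsp_pool *p, struct rsp *slots, size_t n)
{
	p->free_head = NULL;
	p->empty_hits = 0;
	for (size_t i = 0; i < n; i++) {
		slots[i].id = (int)i;
		slots[i].next_free = p->free_head;
		p->free_head = &slots[i];
	}
}

/* Take a context; return NULL (and count it) when every context is in use. */
struct rsp *rsp_get(struct rsp_pool *p)
{
	struct rsp *r = p->free_head;

	if (r == NULL) {
		p->empty_hits++;
		return NULL;
	}
	p->free_head = r->next_free;
	return r;
}

/* Return a context to the pool (e.g. once its send has completed). */
void rsp_put(struct rsp_pool *p, struct rsp *r)
{
	r->next_free = p->free_head;
	p->free_head = r;
}
```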
>
> Doesn't look like it from my scan...
>
> > I do see that during this heavy load, the rdma send queue "full"
> > condition gets hit often:
> >
> > static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> > {
> > 	struct nvmet_rdma_queue *queue = rsp->queue;
> >
> > 	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
> > 			&queue->sq_wr_avail) < 0)) {
> > 		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
> > 				1 + rsp->n_rdma, queue->idx,
> > 				queue->nvme_sq.ctrl->cntlid);
> > 		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
> > 		return false;
> > 	}
> >
> > ...
> >
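The check above reserves send-queue slots optimistically: it subtracts 1 + n_rdma (presumably one work request for the response SEND plus n_rdma for the data transfer) and, if the counter goes negative, adds the amount back and reports failure so the command can be deferred. A userspace sketch of the same reserve-or-roll-back pattern using C11 <stdatomic.h> follows; struct sq_credits, sq_init(), sq_reserve() and sq_release() are hypothetical names. Note that atomic_fetch_sub() returns the value before the subtraction, so old - needed corresponds to what the kernel's atomic_sub_return() returns.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for queue->sq_wr_avail. */
struct sq_credits {
	atomic_int wr_avail;	/* free send-queue work-request slots */
};

void sq_init(struct sq_credits *sq, int depth)
{
	atomic_init(&sq->wr_avail, depth);
}

/*
 * Try to reserve `needed` WR slots. Subtract first, then check the
 * resulting value; if it went negative, give the slots back and fail
 * so the caller can defer the command.
 */
bool sq_reserve(struct sq_credits *sq, int needed)
{
	/* atomic_fetch_sub() returns the value before the subtraction. */
	int remaining = atomic_fetch_sub(&sq->wr_avail, needed) - needed;

	if (remaining < 0) {
		atomic_fetch_add(&sq->wr_avail, needed);	/* roll back */
		return false;
	}
	return true;
}

/* Send-completion path: return slots to the counter. */
void sq_release(struct sq_credits *sq, int n)
{
	atomic_fetch_add(&sq->wr_avail, n);
}
```

Because the subtraction happens before the check, a concurrent caller can briefly observe the counter below zero and fail as well; in this pattern that is harmless, since failure only means the command is deferred and retried later.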
> > So commands are getting added to the wr_wait list:
> >
> > static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
> > 		struct nvmet_rdma_rsp *cmd)
> > {
> > 	...
> > 	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
> > 		spin_lock(&queue->rsp_wr_wait_lock);
> > 		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
> > 		spin_unlock(&queue->rsp_wr_wait_lock);
> > 	}
> > 	...
> >
> >
> > Perhaps there's some bug in the wr_wait_list processing of deferred
> > commands? I don't see anything though.
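Deferred commands have to be retried once send-queue slots are returned by send completions. One plausible shape for that wait-list processing is sketched below as a single-threaded model (no locking, a fixed-size ring buffer, a plain int instead of an atomic counter). queue_model, handle_command(), send_completed(), try_execute() and WAIT_MAX are hypothetical names; this is not the nvmet-rdma implementation. Draining stops at the first waiter that still does not fit, so waiters keep their FIFO order among themselves.

```c
#include <stdbool.h>
#include <stddef.h>

#define WAIT_MAX 16	/* assumed upper bound on deferred commands */

struct cmd {
	int id;
	int n_rdma;	/* WRs needed in addition to the response SEND */
};

struct queue_model {
	int sq_wr_avail;		/* free send-queue WR slots */
	struct cmd *wait[WAIT_MAX];	/* deferred commands, ring buffer */
	size_t wait_head, wait_len;
	int executed[2 * WAIT_MAX];	/* ids in execution order; assumes a short run */
	size_t n_executed;
};

static bool try_execute(struct queue_model *q, struct cmd *c)
{
	int needed = 1 + c->n_rdma;

	if (q->sq_wr_avail < needed)
		return false;
	q->sq_wr_avail -= needed;
	q->executed[q->n_executed++] = c->id;
	return true;
}

/* Arrival path: execute now, or append to the tail of the wait list.
 * Assumes no more than WAIT_MAX commands are ever deferred at once. */
void handle_command(struct queue_model *q, struct cmd *c)
{
	if (!try_execute(q, c)) {
		q->wait[(q->wait_head + q->wait_len) % WAIT_MAX] = c;
		q->wait_len++;
	}
}

/* Completion path: return slots, then drain waiters in FIFO order,
 * stopping at the first one that still does not fit. */
void send_completed(struct queue_model *q, int n_wrs)
{
	q->sq_wr_avail += n_wrs;
	while (q->wait_len > 0) {
		if (!try_execute(q, q->wait[q->wait_head]))
			break;	/* retry on the next completion */
		q->wait_head = (q->wait_head + 1) % WAIT_MAX;
		q->wait_len--;
	}
}
```

As far as the quoted excerpt shows, a new command tries nvmet_rdma_execute_command() before it ever looks at the wait list, so a small command can overtake larger deferred ones; the model reproduces that (with 3 free slots, commands needing 2, 2 and 1 slots execute in the order 1st, 3rd, then 2nd after a completion).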
>
> I assume this could happen if, under heavy load, the device's send
> completions arrive more slowly than incoming commands do (perhaps
> due to the device and/or software).
>
> Because we post the recv before sending the response back, there is
> a window where the host can send us a new command before the send
> completion arrives; this is why we allocate more (2x) rsps.
>
> However, I think nothing prevents the gap from growing under heavy
> load until we exhaust the 2x rsps.
>
> So perhaps this is something we actually need to account for...
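The exhaustion scenario described above can be checked with a small model. All of its assumptions are hypothetical and not taken from the driver: the pool holds 2 * queue_depth response contexts; each tick the host issues queue_depth new commands, because it has already received every previous response; and the target reaps at most reaped_per_tick SEND completions per tick, with a context returning to the pool only when its completion is reaped. The function name ticks_until_pool_empty() is likewise made up for illustration.

```c
/*
 * Returns the tick at which an arriving command finds the pool empty,
 * or -1 if the pool never runs dry within max_ticks.
 * Invariant: pool_free + awaiting_completion == 2 * queue_depth.
 */
int ticks_until_pool_empty(int queue_depth, int reaped_per_tick, int max_ticks)
{
	int pool_free = 2 * queue_depth;	/* free response contexts */
	int awaiting_completion = 0;		/* response sent, SEND not yet reaped */

	for (int tick = 0; tick < max_ticks; tick++) {
		/* New commands arrive; each one needs a free context. */
		for (int i = 0; i < queue_depth; i++) {
			if (pool_free == 0)
				return tick;	/* the "empty rsp list" path */
			pool_free--;
			awaiting_completion++;	/* response is sent right away */
		}
		/* Reap a bounded number of SEND completions. */
		int reaped = awaiting_completion < reaped_per_tick ?
			     awaiting_completion : reaped_per_tick;
		awaiting_completion -= reaped;
		pool_free += reaped;
	}
	return -1;
}
```

With queue_depth = 4 and reaped_per_tick = 3 the model runs out on tick 5; with reaped_per_tick = 4 it never does. In other words, under these assumptions a fixed 2x sizing only holds while completions keep pace with arrivals; any sustained shortfall eventually consumes the extra queue_depth contexts.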
Thanks for the explanation. Yes, I believe we do. Will you post the formal patch? If it is the same as the one I already confirmed, you can add my Tested-by tag.
Thanks,
Steve.
Thread overview: 6+ messages
[not found] <1eb5a94a-07ed-1c2a-47fe-cc9a4bf32466@opengridcomputing.com>
[not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
2018-08-20 20:47 ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
2018-08-22 15:08 ` Steve Wise
2018-08-27 18:24 ` Steve Wise
2018-08-27 20:29 ` Steve Wise
2018-08-31 0:31 ` Sagi Grimberg
2018-08-31 13:01 ` Steve Wise [this message]