From: swise@opengridcomputing.com (Steve Wise)
Subject: Crash in nvmet_req_init() - null req->rsp pointer
Date: Fri, 31 Aug 2018 08:01:08 -0500
Message-ID: <02b101d4412a$afc71a00$0f554e00$@opengridcomputing.com>
In-Reply-To: <0ce90ceb-ebe7-4162-ce07-4ddd27207ecd@grimberg.me>

> 
> 
> >> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> >> I added code to BUG_ON() after 10 times and I have a crash dump I'm
> >> looking at.
> >>
> >> Isn't the rsp list supposed to be sized such that it will never be empty
> >> when a new rsp is needed?  I wonder if there is a leak.
> 
> Doesn't look like it from my scan.
> 
> > I do see that during this heavy load, the rdma send queue "full"
> > condition gets hit often:
> >
> > static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
> > {
> >          struct nvmet_rdma_queue *queue = rsp->queue;
> >
> >          if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
> >                          &queue->sq_wr_avail) < 0)) {
> >                  pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
> >                                  1 + rsp->n_rdma, queue->idx,
> >                                  queue->nvme_sq.ctrl->cntlid);
> >                  atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
> >                  return false;
> >          }
> >
> > ...
> >
> > So commands are getting added to the wr_wait list:
> >
> > static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
> >                  struct nvmet_rdma_rsp *cmd)
> > {
> > ...
> >          if (unlikely(!nvmet_rdma_execute_command(cmd))) {
> >                  spin_lock(&queue->rsp_wr_wait_lock);
> >                  list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
> >                  spin_unlock(&queue->rsp_wr_wait_lock);
> >          }
> > ...
> >
> >
> > Perhaps there's some bug in the wr_wait_list processing of deferred
> > commands?  I don't see anything though.
> 
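
For readers following along, the reserve-or-back-out accounting in the quoted nvmet_rdma_execute_command() can be modeled in user space roughly as below. This is only a sketch using C11 atomics: sq_wr_avail here is a plain global standing in for queue->sq_wr_avail, and reserve_wrs()/release_wrs() are hypothetical helper names, not functions from the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for queue->sq_wr_avail. */
static atomic_int sq_wr_avail = 4;

/*
 * Try to reserve `needed` send work requests. atomic_fetch_sub() returns
 * the old value, so (old - needed) is the new value, which is what the
 * kernel's atomic_sub_return() reports. On underflow the subtraction is
 * undone and the caller is expected to defer the command.
 */
static bool reserve_wrs(int needed)
{
	assert(needed > 0);
	if (atomic_fetch_sub(&sq_wr_avail, needed) - needed < 0) {
		atomic_fetch_add(&sq_wr_avail, needed);
		return false;
	}
	return true;
}

/* Give the work requests back, as would happen on send completion. */
static void release_wrs(int needed)
{
	assert(needed > 0);
	atomic_fetch_add(&sq_wr_avail, needed);
}
```

With 4 credits available, a first reserve_wrs(3) succeeds, a second one fails and leaves the counter at 1, and it succeeds again once release_wrs(3) has run. A command whose reservation fails is what ends up on the wr_wait list in the code quoted next to this.
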
> I assume this could happen if, under heavy load, the device send
> completions arrive more slowly than incoming commands do
> (perhaps due to the device and/or sw).
> 
> Because we post the recv before sending the response back, there is
> a window where the host can send us a new command before the send
> completion arrives; this is why we allocate more.
> 
> However, I think nothing prevents the gap from growing under heavy
> load until we exhaust the 2x rsps.
> 
> So perhaps this is something we actually need to account for...

Thanks for the explanation.  Yes, I believe we do.  Will you post the formal patch?  If it is the same as the one I already confirmed, you can add my Tested-by tag.
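
To make the failure mode concrete, here is a minimal user-space sketch of one way an empty rsp free list could be tolerated: fall back to a heap allocation instead of handing out a NULL entry. It is an illustration under stated assumptions, not the patch discussed in this thread. struct rsp, struct rsp_pool, and the pool_init()/rsp_get()/rsp_put() helpers are hypothetical names, and the sketch is single-threaded where the kernel would hold a lock around the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, stripped-down response entry. */
struct rsp {
	struct rsp *next;
	bool allocated;		/* true if it came from the heap, not the pool */
};

struct rsp_pool {
	struct rsp *free_list;	/* preallocated entries not currently in use */
};

/* Put n preallocated entries on the free list. */
static void pool_init(struct rsp_pool *p, struct rsp *slots, size_t n)
{
	p->free_list = NULL;
	for (size_t i = 0; i < n; i++) {
		slots[i].allocated = false;
		slots[i].next = p->free_list;
		p->free_list = &slots[i];
	}
}

/* Take a free entry; if the list is empty, allocate one instead. */
static struct rsp *rsp_get(struct rsp_pool *p)
{
	struct rsp *r = p->free_list;

	if (r) {
		p->free_list = r->next;
		r->next = NULL;
		return r;
	}
	r = calloc(1, sizeof(*r));
	if (r)
		r->allocated = true;
	return r;		/* may still be NULL if calloc() failed */
}

/* Return an entry: free heap entries, relist pool entries. */
static void rsp_put(struct rsp_pool *p, struct rsp *r)
{
	if (r->allocated) {
		free(r);
		return;
	}
	r->next = p->free_list;
	p->free_list = r;
}
```

With a pool of two slots, the third rsp_get() returns a heap entry rather than NULL, and rsp_put() frees it instead of putting it on the list. The caller still has to handle a NULL return when the allocation itself fails.
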

Thanks,

Steve.


