From: swise@opengridcomputing.com (Steve Wise)
Subject: Crash in nvmet_req_init() - null req->rsp pointer
Date: Mon, 27 Aug 2018 15:29:48 -0500
Message-ID: <0bcbcb2e-0b01-57ef-0b9c-9ac3e56322cb@opengridcomputing.com>
In-Reply-To: <a165480b-7fcd-d208-8ce0-a1d7cdab24e9@opengridcomputing.com>
On 8/27/2018 1:24 PM, Steve Wise wrote:
>
>
> On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
>>
>>> Resending in plain text...
>>>
>>> ----
>>>
>>> Hey guys,
>>>
>>> I'm debugging an nvmet_rdma crash on the linux-4.14.52 stable kernel
>>> code. Under heavy load, including 80 nvmf devices, after 13 hours of
>>> running, I see an Oops [1] when the target is processing a new ingress
>>> nvme command. It crashes in nvmet_req_init() because req->rsp is NULL:
>>>
>>>   bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>>>                   struct nvmet_sq *sq, struct nvmet_fabrics_ops *ops)
>>>   {
>>>           u8 flags = req->cmd->common.flags;
>>>           u16 status;
>>>
>>>           req->cq = cq;
>>>           req->sq = sq;
>>>           req->ops = ops;
>>>           req->sg = NULL;
>>>           req->sg_cnt = 0;
>>>           req->rsp->status = 0; <-- HERE (line 504)
>>>
>>> The nvme command opcode is nvme_cmd_write. The nvmet_rdma_queue state
>>> is NVMET_RDMA_Q_LIVE. The nvmet_req looks valid [2], i.e., not garbage.
>>> But it seems very bad that req->rsp is NULL! :)
>>>
>>> Any thoughts? I didn't see anything like this in recent nvmf fixes...
>>
>> Is it possible that you ran out of rsps and got a corrupted rsp?
>>
>> How about trying out this patch to add more information:
>> --
>
> Hey Sagi, it hits the empty rsp list path often with your debug patch.
> I added code to BUG_ON() once that path has been hit 10 times, and I
> have a crash dump I'm looking at.
>
> Isn't the rsp list supposed to be sized such that it will never be empty
> when a new rsp is needed? I wonder if there is a leak.
>
> Steve.
>
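To reason about the empty-list case, here is a minimal userspace model (not the nvmet-rdma code) of a fixed-size response pool. All names here (struct rsp_pool, pool_get(), pool_put(), POOL_SIZE) are hypothetical. One relevant detail of the kernel list API: list_first_entry() on an empty list does not return NULL; it returns a pointer computed from the list head itself. So a get path that took the first entry without an emptiness check would hand back a bogus rsp rather than failing cleanly. The sketch checks for exhaustion explicitly:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a fixed pool of response slots, sized when the
 * queue is created and handed out one per incoming command. */
#define POOL_SIZE 4

struct rsp {
	int id;
	struct rsp *next;	/* singly linked free list */
};

struct rsp_pool {
	struct rsp slots[POOL_SIZE];
	struct rsp *free_head;
	int nr_free;
};

static void pool_init(struct rsp_pool *p)
{
	p->free_head = NULL;
	p->nr_free = 0;
	/* Push slots in reverse so slots[0] ends up at the head. */
	for (int i = POOL_SIZE - 1; i >= 0; i--) {
		p->slots[i].id = i;
		p->slots[i].next = p->free_head;
		p->free_head = &p->slots[i];
		p->nr_free++;
	}
}

/* Report exhaustion to the caller with NULL instead of returning a
 * pointer derived from an empty list head. */
static struct rsp *pool_get(struct rsp_pool *p)
{
	struct rsp *r = p->free_head;

	if (r == NULL)
		return NULL;
	p->free_head = r->next;
	r->next = NULL;
	p->nr_free--;
	return r;
}

static void pool_put(struct rsp_pool *p, struct rsp *r)
{
	/* More puts than slots would mean a double free somewhere. */
	assert(p->nr_free < POOL_SIZE);
	r->next = p->free_head;
	p->free_head = r;
	p->nr_free++;
}
```

In this model a leak shows up as nr_free staying below POOL_SIZE after all I/O has drained, which is one way to tell a leak apart from a pool that is simply too small for the load.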
I do see that during this heavy load, the rdma send queue "full"
condition gets hit often:
static bool nvmet_rdma_execute_command(struct nvmet_rdma_rsp *rsp)
{
	struct nvmet_rdma_queue *queue = rsp->queue;

	if (unlikely(atomic_sub_return(1 + rsp->n_rdma,
			&queue->sq_wr_avail) < 0)) {
		pr_debug("IB send queue full (needed %d): queue %u cntlid %u\n",
				1 + rsp->n_rdma, queue->idx,
				queue->nvme_sq.ctrl->cntlid);
		atomic_add(1 + rsp->n_rdma, &queue->sq_wr_avail);
		return false;
	}
	...
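The reservation pattern above (subtract first, check for a negative result, add back on failure) can be modeled in userspace with C11 atomics. This is a sketch under assumptions: struct sq_credits, reserve_wrs() and release_wrs() are hypothetical stand-ins, and since atomic_fetch_sub() returns the old value, the new value is computed by hand to match the kernel's atomic_sub_return():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the send-queue credit check: each command
 * needs 1 SEND plus n_rdma RDMA work requests. */
struct sq_credits {
	atomic_int wr_avail;	/* stands in for queue->sq_wr_avail */
};

static bool reserve_wrs(struct sq_credits *q, int n_rdma)
{
	int needed = 1 + n_rdma;

	assert(n_rdma >= 0);
	/* old - needed is the new value, as atomic_sub_return() gives. */
	if (atomic_fetch_sub(&q->wr_avail, needed) - needed < 0) {
		/* Not enough room: return the credits; caller must defer. */
		atomic_fetch_add(&q->wr_avail, needed);
		return false;
	}
	return true;
}

/* Called from the send completion path in this model. */
static void release_wrs(struct sq_credits *q, int n_rdma)
{
	atomic_fetch_add(&q->wr_avail, 1 + n_rdma);
}
```

One consequence of this pattern is that the counter can go briefly negative, so a concurrent caller may fail its check and defer even though enough credits free up a moment later. That is harmless only if deferred commands are reliably retried.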
So commands are getting added to the wr_wait list:
static void nvmet_rdma_handle_command(struct nvmet_rdma_queue *queue,
		struct nvmet_rdma_rsp *cmd)
{
	...
	if (unlikely(!nvmet_rdma_execute_command(cmd))) {
		spin_lock(&queue->rsp_wr_wait_lock);
		list_add_tail(&cmd->wait_list, &queue->rsp_wr_wait_list);
		spin_unlock(&queue->rsp_wr_wait_lock);
	}
	...
Perhaps there's some bug in the wr_wait_list processing of deferred
commands? I don't see anything though.
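To check my reading of the deferral path, here is a single-threaded model of it. handle_command() mirrors the quoted code (try to execute, queue on failure); process_wait_list() is my assumption of how deferred commands get retried when a send completes (FIFO, stopping at the first command that still does not fit). All names are hypothetical, and no locking is modeled; the kernel serializes its list with queue->rsp_wr_wait_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; not the nvmet-rdma code. */
struct cmd {
	int n_rdma;
	bool executed;
	struct cmd *next;
};

struct queue {
	int wr_avail;		/* free send-queue slots */
	struct cmd *wait_head;	/* FIFO of deferred commands */
	struct cmd *wait_tail;
};

static bool try_execute(struct queue *q, struct cmd *c)
{
	int needed = 1 + c->n_rdma;

	if (q->wr_avail < needed)
		return false;
	q->wr_avail -= needed;
	c->executed = true;
	return true;
}

/* Mirrors the quoted code: execute, or append to the wait list. */
static void handle_command(struct queue *q, struct cmd *c)
{
	assert(c->n_rdma >= 0);
	c->next = NULL;
	if (try_execute(q, c))
		return;
	if (q->wait_tail)
		q->wait_tail->next = c;
	else
		q->wait_head = c;
	q->wait_tail = c;
}

/* Retry deferred commands in order; stop at the first that still
 * does not fit, leaving it and everything behind it queued. */
static void process_wait_list(struct queue *q)
{
	while (q->wait_head) {
		struct cmd *c = q->wait_head;

		if (!try_execute(q, c))
			break;
		q->wait_head = c->next;
		if (q->wait_head == NULL)
			q->wait_tail = NULL;
		c->next = NULL;
	}
}

/* Send completion: return credits, then drain the wait list. */
static void send_done(struct queue *q, int n_rdma)
{
	q->wr_avail += 1 + n_rdma;
	process_wait_list(q);
}
```

If I read the driver right, each command sitting on the wait list still holds its rsp, so a drain path that stalls would look like rsp exhaustion from the get side even without a true leak.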