linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: swise@opengridcomputing.com (Steve Wise)
Subject: [resend] Crash in nvmet_req_init() - null req->rsp pointer
Date: Wed, 22 Aug 2018 10:08:25 -0500	[thread overview]
Message-ID: <4fb0053a-1493-369e-cac5-a91133ad9499@opengridcomputing.com> (raw)
In-Reply-To: <31a08853-e2a7-709a-9251-e6c64fda22dd@grimberg.me>



On 8/20/2018 3:47 PM, Sagi Grimberg wrote:
> 
>> Resending in plain text...
>>
>> ----
>>
>> Hey guys,
>>
>> I'm debugging a nvmet_rdma crash on the linux-4.14.52 stable kernel
>> code.? Under heavy load, including 80 nvmf devices, after 13 hours of
>> running, I see an Oops [1] when the target is processing a new ingress
>> nvme command.? It crashes in nvmet_req_init() because req->rsp is NULL:
>>
>> ?? 493?? bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
>> ?? 494?????????????????? struct nvmet_sq *sq, struct nvmet_fabrics_ops
>> *ops)
>> ?? 495?? {
>> ?? 496?????????? u8 flags = req->cmd->common.flags;
>> ?? 497?????????? u16 status;
>> ?? 498
>> ?? 499?????????? req->cq = cq;
>> ?? 500?????????? req->sq = sq;
>> ?? 501?????????? req->ops = ops;
>> ?? 502?????????? req->sg = NULL;
>> ?? 503?????????? req->sg_cnt = 0;
>> ?? 504?????????? req->rsp->status = 0; <-- HERE
>>
>> The? nvme command opcode is nvme_cmd_write.? The nvmet_rdma_queue state
>> is NVMET_RDMA_Q_LIVE.? The nvmet_req looks valid [2].? IE not garbage.
>> But it seems very bad that req->rsp is NULL! :)
>>
>> Any thoughts?? I didn't see anything like this in recent nvmf fixes...
> 
> Is it possible that you ran out of rsps and got a corrupted rsp?

That is what it looks like.  Because the nvmet_rdma_rsp pointer returned
from nvmet_rdma_get_rsp() was not within the bounds of the response
struct array allocated for that queue.  I wasn't sure how that could
happen?  I would think first_entry() would return NULL if the list was
empty, but I guess not.


> 
> How about trying out this patch to add more information:

I'll try this out.  Thanks Sagi!

> -- 
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index e7f43d1e1779..890d9c45ca33 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -66,6 +66,7 @@ struct nvmet_rdma_rsp {
> 
> ??????? struct nvmet_req??????? req;
> 
> +?????? bool??????????????????? allocated;
> ??????? u8????????????????????? n_rdma;
> ??????? u32???????????????????? flags;
> ??????? u32???????????????????? invalidate_rkey;
> @@ -174,11 +175,20 @@ nvmet_rdma_get_rsp(struct nvmet_rdma_queue *queue)
> ??????? unsigned long flags;
> 
> ??????? spin_lock_irqsave(&queue->rsps_lock, flags);
> -?????? rsp = list_first_entry(&queue->free_rsps,
> +?????? rsp = list_first_entry_or_null(&queue->free_rsps,
> ??????????????????????????????? struct nvmet_rdma_rsp, free_list);
> -?????? list_del(&rsp->free_list);
> +?????? if (rsp) {
> +?????????????? list_del(&rsp->free_list);
> +?????????????? rsp->allocated = false;
> +?????? }
> ??????? spin_unlock_irqrestore(&queue->rsps_lock, flags);
> 
> +?????? if (!rsp) {
> +?????????????? pr_debug("dynamically allocated rsp\n");
> +?????????????? rsp = kmalloc(sizeof(*rsp), GFP_KERNEL);
> +?????????????? rsp->allocated = true;
> +?????? }
> +
> ??????? return rsp;
> ?}
> 
> @@ -187,6 +197,11 @@ nvmet_rdma_put_rsp(struct nvmet_rdma_rsp *rsp)
> ?{
> ??????? unsigned long flags;
> 
> +?????? if (rsp->allocated) {
> +?????????????? kfree(rsp);
> +?????????????? return;
> +?????? }
> +
> ??????? spin_lock_irqsave(&rsp->queue->rsps_lock, flags);
> ??????? list_add_tail(&rsp->free_list, &rsp->queue->free_rsps);
> ??????? spin_unlock_irqrestore(&rsp->queue->rsps_lock, flags);
> -- 
> 

  reply	other threads:[~2018-08-22 15:08 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1eb5a94a-07ed-1c2a-47fe-cc9a4bf32466@opengridcomputing.com>
     [not found] ` <d35d93e6-2e7b-a681-039b-75edaea7a9a2@opengridcomputing.com>
2018-08-20 20:47   ` [resend] Crash in nvmet_req_init() - null req->rsp pointer Sagi Grimberg
2018-08-22 15:08     ` Steve Wise [this message]
2018-08-27 18:24     ` Steve Wise
2018-08-27 20:29       ` Steve Wise
2018-08-31  0:31         ` Sagi Grimberg
2018-08-31 13:01           ` Steve Wise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4fb0053a-1493-369e-cac5-a91133ad9499@opengridcomputing.com \
    --to=swise@opengridcomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).