From: mlin@kernel.org (Ming Lin)
Subject: crash on device removal
Date: Tue, 12 Jul 2016 13:40:54 -0700 [thread overview]
Message-ID: <1468356054.5426.1.camel@ssi> (raw)
In-Reply-To: <00cb01d1dc5b$51c05970$f5410c50$@opengridcomputing.com>
On Tue, 2016-07-12@11:34 -0500, Steve Wise wrote:
> Hey Christoph,
>
> I see a crash when shutting down a nvme host node via 'reboot' that has 1 target
> device attached. The shutdown causes iw_cxgb4 to be removed which triggers the
> device removal logic in the nvmf rdma transport. The crash is here:
>
> (gdb) list *nvme_rdma_free_qe+0x18
> 0x1e8 is in nvme_rdma_free_qe (drivers/nvme/host/rdma.c:196).
> 191 }
> 192
> 193 static void nvme_rdma_free_qe(struct ib_device *ibdev, struct
> nvme_rdma_qe *qe,
> 194 size_t capsule_size, enum dma_data_direction dir)
> 195 {
> 196 ib_dma_unmap_single(ibdev, qe->dma, capsule_size, dir);
> 197 kfree(qe->data);
> 198 }
> 199
> 200 static int nvme_rdma_alloc_qe(struct ib_device *ibdev, struct
> nvme_rdma_qe *qe,
>
> Apparently qe is NULL.
>
> Looking at the device removal path, the logic appears correct (see
> nvme_rdma_device_unplug() and the nice function comment :) ). I'm wondering if
> concurrently to the host device removal path cleaning up queues, the target is
> disconnecting all of its queues due to the first disconnect event from the host
> causing some cleanup race on the host side? Although since the removal path
> executing in the cma event handler upcall, I don't think another thread would be
> handling a disconnect event. Maybe the qp async event handler flow?
>
> Thoughts?
We actually missed a kref_get in nvme_get_ns_from_disk().
This should fix it. Could you help to verify?
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4babdf0..b146f52 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -183,6 +183,8 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
}
spin_unlock(&dev_list_lock);
+ kref_get(&ns->ctrl->kref);
+
return ns;
fail_put_ns:
WARNING: multiple messages have this Message-ID (diff)
From: Ming Lin <mlin-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Steve Wise <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Cc: Christoph Hellwig <hch-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
sagig <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>,
linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: crash on device removal
Date: Tue, 12 Jul 2016 13:40:54 -0700 [thread overview]
Message-ID: <1468356054.5426.1.camel@ssi> (raw)
In-Reply-To: <00cb01d1dc5b$51c05970$f5410c50$@opengridcomputing.com>
On Tue, 2016-07-12 at 11:34 -0500, Steve Wise wrote:
> Hey Christoph,
>
> I see a crash when shutting down a nvme host node via 'reboot' that has 1 target
> device attached. The shutdown causes iw_cxgb4 to be removed which triggers the
> device removal logic in the nvmf rdma transport. The crash is here:
>
> (gdb) list *nvme_rdma_free_qe+0x18
> 0x1e8 is in nvme_rdma_free_qe (drivers/nvme/host/rdma.c:196).
> 191 }
> 192
> 193 static void nvme_rdma_free_qe(struct ib_device *ibdev, struct
> nvme_rdma_qe *qe,
> 194 size_t capsule_size, enum dma_data_direction dir)
> 195 {
> 196 ib_dma_unmap_single(ibdev, qe->dma, capsule_size, dir);
> 197 kfree(qe->data);
> 198 }
> 199
> 200 static int nvme_rdma_alloc_qe(struct ib_device *ibdev, struct
> nvme_rdma_qe *qe,
>
> Apparently qe is NULL.
>
> Looking at the device removal path, the logic appears correct (see
> nvme_rdma_device_unplug() and the nice function comment :) ). I'm wondering if
> concurrently to the host device removal path cleaning up queues, the target is
> disconnecting all of its queues due to the first disconnect event from the host
> causing some cleanup race on the host side? Although since the removal path
> executing in the cma event handler upcall, I don't think another thread would be
> handling a disconnect event. Maybe the qp async event handler flow?
>
> Thoughts?
We actually missed a kref_get in nvme_get_ns_from_disk().
This should fix it. Could you help to verify?
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4babdf0..b146f52 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -183,6 +183,8 @@ static struct nvme_ns *nvme_get_ns_from_disk(struct gendisk *disk)
}
spin_unlock(&dev_list_lock);
+ kref_get(&ns->ctrl->kref);
+
return ns;
fail_put_ns:
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2016-07-12 20:40 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-07-12 16:34 crash on device removal Steve Wise
2016-07-12 16:34 ` Steve Wise
2016-07-12 20:40 ` Ming Lin [this message]
2016-07-12 20:40 ` Ming Lin
2016-07-12 21:09 ` Steve Wise
2016-07-12 21:09 ` Steve Wise
2016-07-12 21:47 ` Ming Lin
2016-07-12 21:47 ` Ming Lin
2016-07-12 22:17 ` Steve Wise
2016-07-12 22:17 ` Steve Wise
2016-07-13 10:06 ` Sagi Grimberg
2016-07-13 10:06 ` Sagi Grimberg
2016-07-13 10:02 ` Sagi Grimberg
2016-07-13 10:02 ` Sagi Grimberg
2016-07-13 15:05 ` Steve Wise
2016-07-13 15:05 ` Steve Wise
2016-07-13 16:17 ` Steve Wise
2016-07-13 16:17 ` Steve Wise
[not found] <00cc01d1dc5b$51c7fa90$f557efb0$@opengridcomputing.com>
2016-07-12 16:38 ` Steve Wise
2016-07-12 16:38 ` Steve Wise
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1468356054.5426.1.camel@ssi \
--to=mlin@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.