From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH] nvme-fabrics: get ctrl reference in nvmf_dev_write
Date: Wed, 13 Jul 2016 09:45:48 -0500
Message-ID: <005301d1dd15$3e7769c0$bb663d40$@opengridcomputing.com>
In-Reply-To: <1468392841.23662.5.camel@kernel.org>
> On Wed, 2016-07-13 at 04:18 +0200, Christoph Hellwig wrote:
> > On Tue, Jul 12, 2016 at 03:38:42PM -0700, Ming Lin wrote:
> > > From: Ming Lin <ming.l at samsung.com>
> > >
> > > Below crash was triggered when shutting down a nvme host node
> > > via 'reboot' that has 1 target device attached.
> > >
> > > That's because nvmf_dev_release() put the ctrl reference, but
> > > we didn't get the reference in nvmf_dev_write().
> > >
> > > So the ctrl was freed in nvme_rdma_free_ctrl() before
> > > nvme_rdma_free_ring() was called.
> >
> > The ->create_ctrl methods do a kref_init for the main reference,
> > and a kref_get for the reference that nvmf_dev_release drops,
> > so I'm a bit confused how this case could happen. I think we'll need
> > to dig a bit deeper on what's actually happening here.
>
> You are right.
>
> I added some debug info.
>
> [31948.771952] MYDEBUG: init kref: nvme_init_ctrl
> [31948.798589] MYDEBUG: get: nvme_rdma_create_ctrl
> [31948.803765] MYDEBUG: put: nvmf_dev_release
> [31948.808734] MYDEBUG: get: nvme_alloc_ns
> [31948.884775] MYDEBUG: put: nvme_free_ns
> [31948.890155] MYDEBUG in nvme_rdma_destroy_queue_ib: queue ffff8800cdc81470: io queue
> [31948.900539] MYDEBUG: put: nvme_rdma_del_ctrl_work
> [31948.909469] MYDEBUG: nvme_rdma_free_ctrl called
> [31948.915379] MYDEBUG in nvme_rdma_destroy_queue_ib: queue ffff8800cdc81400: admin queue
>
> So nvme_rdma_destroy_queue_ib() was called for admin queue after ctrl was
> already freed.
>
> With below patch, the debug info shows:
>
> [32139.379831] MYDEBUG: get/init: nvme_init_ctrl
> [32139.407166] MYDEBUG: get: nvme_rdma_create_ctrl
> [32139.412463] MYDEBUG: put: nvmf_dev_release
> [32139.417697] MYDEBUG: get: nvme_alloc_ns
> [32139.418422] MYDEBUG: get: nvme_rdma_device_unplug
> [32139.474154] MYDEBUG: put: nvme_free_ns
> [32139.479406] MYDEBUG in nvme_rdma_destroy_queue_ib: queue ffff8800347c6470: io queue
> [32139.489532] MYDEBUG: put: nvme_rdma_del_ctrl_work
> [32139.496048] MYDEBUG in nvme_rdma_destroy_queue_ib: queue ffff8800347c6400: admin queue
> [32139.739089] MYDEBUG: put: nvme_rdma_device_unplug
> [32139.748175] MYDEBUG: nvme_rdma_free_ctrl called
>
> and the crash was fixed.
>
> What do you think?
>
> diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> index e1205c0..284d980 100644
> --- a/drivers/nvme/host/rdma.c
> +++ b/drivers/nvme/host/rdma.c
> @@ -1323,6 +1323,12 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
>  	if (!test_and_clear_bit(NVME_RDMA_Q_CONNECTED, &queue->flags))
>  		goto out;
>
> +	/*
> +	 * Grab a reference so the ctrl won't be freed before we free
> +	 * the last queue
> +	 */
> +	kref_get(&ctrl->ctrl.kref);
> +
>  	/* delete the controller */
>  	ret = __nvme_rdma_del_ctrl(ctrl);
>  	if (!ret) {
> @@ -1339,6 +1345,8 @@ static int nvme_rdma_device_unplug(struct nvme_rdma_queue *queue)
>  		nvme_rdma_destroy_queue_ib(queue);
>  	}
>
> +	nvme_put_ctrl(&ctrl->ctrl);
> +
>  out:
>  	return ctrl_deleted;
>  }
>
This change again avoids the first crash, but I still see the __ib_process_cq() crash.
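
For readers following the reference counting, here is a minimal user-space
sketch that replays the get/put order from the two MYDEBUG logs quoted above.
It is not kernel code: struct toy_ctrl, toy_init(), toy_get(), toy_put() and
admin_queue_freed_safely() are hypothetical stand-ins for struct kref and the
real call sites, and the ordering is taken only from the log lines. It models
only the first crash (ctrl freed before the admin queue is destroyed), not the
__ib_process_cq() crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a kref-protected controller; not the kernel's struct kref. */
struct toy_ctrl {
	int refs;	/* current reference count */
	bool freed;	/* set once the last reference is dropped */
};

static void toy_init(struct toy_ctrl *c)
{
	c->refs = 1;
	c->freed = false;
}

static void toy_get(struct toy_ctrl *c)
{
	c->refs++;
}

static void toy_put(struct toy_ctrl *c)
{
	assert(c->refs > 0);
	if (--c->refs == 0)
		c->freed = true;	/* models nvme_rdma_free_ctrl being called */
}

/*
 * Replay the get/put order seen in the debug logs.
 * Returns true if the ctrl is still live when the admin queue is destroyed.
 */
static bool admin_queue_freed_safely(bool extra_ref_in_unplug)
{
	struct toy_ctrl c;
	bool safe;

	toy_init(&c);			/* init kref: nvme_init_ctrl */
	toy_get(&c);			/* get: nvme_rdma_create_ctrl */
	toy_put(&c);			/* put: nvmf_dev_release */
	toy_get(&c);			/* get: nvme_alloc_ns */
	if (extra_ref_in_unplug)
		toy_get(&c);		/* get: nvme_rdma_device_unplug (proposed patch) */
	toy_put(&c);			/* put: nvme_free_ns */
	toy_put(&c);			/* put: nvme_rdma_del_ctrl_work */

	safe = !c.freed;		/* admin queue is destroyed at this point */

	if (extra_ref_in_unplug)
		toy_put(&c);		/* put: nvme_rdma_device_unplug */
	return safe;
}
```

Without the extra reference the count reaches zero at the
nvme_rdma_del_ctrl_work put, so the admin queue teardown runs against a freed
ctrl. With the extra get/put pair in nvme_rdma_device_unplug, the final put
happens only after the admin queue is gone.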