From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmet_rdma crash - DISCONNECT event with NULL queue
Date: Tue, 1 Nov 2016 11:37:06 -0500
Message-ID: <01d101d2345e$2f054390$8d0fcab0$@opengridcomputing.com>
In-Reply-To: <4cc25277-429a-4ab9-470c-b3af1428ce93@grimberg.me>
>
> >>> I just hit an nvmf target NULL pointer deref BUG after a few hours of
> >>> keep-alive timeout testing. It appears that nvmet_rdma_cm_handler() was
> >>> called with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable
> >>> queue is left as NULL. But then nvmet_rdma_queue_disconnect() is called
> >>> with queue == NULL which causes the crash.
> >>
> >> AFAICT, the only way cm_id->qp is NULL is for a scenario we didn't even
> >> get to allocate a queue-pair (e.g. calling rdma_create_qp). The teardown
> >> paths does not nullify cm_id->qp...
> >
> > rdma_destroy_qp() nulls out cm_id->qp.
>
> pphh, somehow managed to miss it...
>
> So we have a case where we can call rdma_destroy_qp and
> then rdma_destroy_id but still get events on the cm_id...
> Not very nice...
>
> So I think that the patch from Bart a few weeks ago was correct:
>
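Not quite. It only guards against a NULL queue for TIMEWAIT_EXIT, which is
generated only by the IB_CM. The crash I hit was on a DISCONNECTED event, and
that case still calls nvmet_rdma_queue_disconnect() unguarded.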
> ---
> drivers/nvme/target/rdma.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
> index d1aea17..a61e47f 100644
> --- a/drivers/nvme/target/rdma.c
> +++ b/drivers/nvme/target/rdma.c
> @@ -1354,9 +1354,12 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id,
>  		break;
>  	case RDMA_CM_EVENT_ADDR_CHANGE:
>  	case RDMA_CM_EVENT_DISCONNECTED:
> -	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
>  		nvmet_rdma_queue_disconnect(queue);
>  		break;
> +	case RDMA_CM_EVENT_TIMEWAIT_EXIT:
> +		if (queue)
> +			nvmet_rdma_queue_disconnect(queue);
> +		break;
>  	case RDMA_CM_EVENT_DEVICE_REMOVAL:
>  		ret = nvmet_rdma_device_removal(cm_id, queue);
>  		break;
> ---
>
> In case this fixes the issue (as expected) I'll queue it up
> with a change log and a code comment on why we need to do
> this (and include all the relevant cases around it)...
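
To illustrate the point about the guard needing to cover more than
TIMEWAIT_EXIT, here is a minimal userspace sketch, not the kernel code and not
necessarily the fix that was eventually merged. All type and function names
below (cm_event, queue, qp, cm_id, queue_disconnect, cm_handler) are
hypothetical stand-ins, and the qp_context lookup is an assumption about how
the queue is derived from the cm_id. It only models the idea that every
disconnect-type event is dropped when no queue can be found.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum cm_event {
	CM_EVENT_ADDR_CHANGE,
	CM_EVENT_DISCONNECTED,
	CM_EVENT_TIMEWAIT_EXIT,
};

struct queue {
	bool disconnected;
};

struct qp {
	struct queue *qp_context;	/* assumed back-pointer to the queue */
};

struct cm_id {
	struct qp *qp;			/* NULL once the QP has been destroyed */
};

static void queue_disconnect(struct queue *queue)
{
	/* Dereferences queue, so a NULL argument would crash here. */
	queue->disconnected = true;
}

/* Returns true if the event was acted on, false if it was dropped. */
static bool cm_handler(struct cm_id *cm_id, enum cm_event event)
{
	struct queue *queue = NULL;

	/* queue stays NULL when the QP is already gone. */
	if (cm_id->qp)
		queue = cm_id->qp->qp_context;

	switch (event) {
	case CM_EVENT_ADDR_CHANGE:
	case CM_EVENT_DISCONNECTED:
	case CM_EVENT_TIMEWAIT_EXIT:
		/* Guard all three cases, not only TIMEWAIT_EXIT. */
		if (!queue)
			return false;
		queue_disconnect(queue);
		return true;
	}
	return false;
}
```

In this model a late DISCONNECTED event on a cm_id whose qp is NULL is simply
ignored, whereas guarding only the TIMEWAIT_EXIT case would still pass a NULL
queue to the disconnect path.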