From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmet_rdma crash - DISCONNECT event with NULL queue
Date: Tue, 1 Nov 2016 10:57:44 -0500
Message-ID: <01b401d23458$af277210$0d765630$@opengridcomputing.com>
Hey guys,
I just hit an nvmf target NULL pointer dereference BUG after a few hours of
keep-alive timeout testing. It appears that nvmet_rdma_cm_handler() was called
with cm_id->qp == NULL, so the local nvmet_rdma_queue * variable queue is left
as NULL. nvmet_rdma_queue_disconnect() is then called with queue == NULL, which
causes the crash.
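To make the suspected path concrete, here is a minimal user-space sketch of it.
It is not the nvmet_rdma source: every model_* name and struct layout below is
a hypothetical stand-in, and the NULL check is only one possible defensive
measure, not necessarily the right fix for the underlying problem.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real kernel structures carry many more fields. */
struct model_queue {
	int idx;
	int disconnects;	/* counts disconnect calls, for illustration only */
};

struct model_qp {
	void *qp_context;	/* points back at the owning queue */
};

struct model_cm_id {
	struct model_qp *qp;	/* NULL once the QP has been torn down */
};

enum model_event { MODEL_EVENT_DISCONNECTED };

/* Dereferences queue unconditionally, so a NULL queue would fault here. */
static void model_queue_disconnect(struct model_queue *queue)
{
	queue->disconnects++;
}

/* Returns 0 if the event reached a queue, -1 if it was dropped. */
static int model_cm_handler(struct model_cm_id *cm_id, enum model_event event)
{
	struct model_queue *queue = NULL;

	/* queue is only resolved when the cm_id still has a QP attached */
	if (cm_id->qp)
		queue = cm_id->qp->qp_context;

	switch (event) {
	case MODEL_EVENT_DISCONNECTED:
		if (!queue) {
			/* Assumed guard: without it, the call below gets NULL. */
			printf("disconnect event with no queue, ignoring\n");
			return -1;
		}
		model_queue_disconnect(queue);
		return 0;
	default:
		return -1;
	}
}
```

Calling model_cm_handler() on a model_cm_id whose qp is NULL takes the guarded
branch and returns -1. Dropping the guard reproduces the shape of the crash,
since model_queue_disconnect() would then be handed a NULL queue.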
In the log, I see that the target side keep-alive fired:
[20676.867545] eth2: link up, 40Gbps, full-duplex, Tx/Rx PAUSE
[20677.079669] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
[20677.079684] nvmet: ctrl 1 keep-alive timer (15 seconds) expired!
Then all the queues are freed, followed by the crash:
[20677.080066] nvmet_rdma: freeing queue 222
[20677.080074] nvmet_rdma: sending cmd response failed
[20677.080351] nvmet_rdma: freeing queue 227
[20677.080775] nvmet_rdma: freeing queue 230
[20677.081137] nvmet_rdma: freeing queue 232
[20677.081371] nvmet_rdma: freeing queue 234
[20677.081604] nvmet_rdma: freeing queue 236
[20677.081835] nvmet_rdma: freeing queue 237
[20677.082062] nvmet_rdma: freeing queue 238
[20677.082106] nvmet_rdma: freeing queue 239
[20677.082366] nvmet_rdma: freeing queue 240
[20677.082570] nvmet_rdma: freeing queue 241
[20677.082995] nvmet_rdma: freeing queue 242
[20677.083222] nvmet_rdma: freeing queue 243
[20677.083475] nvmet_rdma: freeing queue 244
[20677.083522] nvmet_rdma: freeing queue 245
[20677.083801] nvmet_rdma: freeing queue 246
[20677.084264] nvmet_rdma: freeing queue 247
[20677.084307] nvmet_rdma: freeing queue 248
[20677.084501] nvmet_rdma: freeing queue 249
[20677.084846] nvmet_rdma: freeing queue 250
[20677.085184] nvmet_rdma: freeing queue 252
[20677.085500] nvmet_rdma: freeing queue 254
[20677.085733] nvmet_rdma: freeing queue 256
[20677.085997] nvmet_rdma: freeing queue 258
[20677.086224] nvmet_rdma: freeing queue 260
[20677.086517] nvmet_rdma: freeing queue 262
[20677.086768] nvmet_rdma: freeing queue 264
[20677.087031] nvmet_rdma: freeing queue 266
[20677.087359] nvmet_rdma: freeing queue 268
[20677.087567] nvmet_rdma: freeing queue 270
[20677.087821] nvmet_rdma: freeing queue 272
[20677.088162] nvmet_rdma: freeing queue 274
[20677.088402] nvmet_rdma: freeing queue 276
[20677.090981] BUG: unable to handle kernel NULL pointer dereference at 0000000000000120
[20677.090988] IP: [<ffffffffa084b6b4>] nvmet_rdma_queue_disconnect+0x24/0x90 [nvmet_rdma]
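A note on the fault address: a small non-zero address such as 0x120 is what one
would expect when a struct member is read through a NULL base pointer, because
the address is then just the member's offset. The sketch below shows that
arithmetic only. The layout is invented so that the offset comes out to 0x120;
which field of the real queue structure sits at that offset is not known from
this log.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: padding chosen so that 'state' lands at offset 0x120. */
struct model_padded_queue {
	char earlier_fields[0x120];	/* placeholder for whatever precedes it */
	int state;			/* the member assumed to be accessed */
};

/* Address the CPU would touch when reading base->state. */
static size_t model_fault_address(const void *base)
{
	return (size_t)((uintptr_t)base +
			offsetof(struct model_padded_queue, state));
}
```

With a NULL base, model_fault_address() yields the member offset itself, which
fits a NULL queue pointer being dereferenced inside
nvmet_rdma_queue_disconnect().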
So maybe there is just a race: keep-alive can free the queue, and yet a
DISCONNECTED event is still received on the cm_id after the queue is freed?
Steve.