From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmf/rdma host crash during heavy load and keep alive recovery
Date: Thu, 8 Sep 2016 15:47:02 -0500
Message-ID: <020f01d20a12$26f846a0$74e8d3e0$@opengridcomputing.com>
In-Reply-To: <7f09e373-6316-26a3-ae81-dab1205d88ab@grimberg.me>
> >> Does this happen if you change the reconnect delay to be something
> >> different than 10 seconds? (say 30?)
> >>
> >
> > Yes. But I noticed something when performing this experiment that is an
> > important point, I think: if I just bring the network interface down and
> > leave it down, we don't crash. During this state, I see the host
> > continually reconnecting after the reconnect delay time, timing out
> > trying to reconnect, and retrying after another reconnect_delay period.
> > I see this for all 10 targets, of course. The crash only happens when I
> > bring the interface back up and the targets begin to reconnect. So the
> > process of successfully reconnecting the RDMA QPs and restarting the
> > nvme queues somehow triggers running an nvme request too soon (or
> > perhaps on the wrong queue).
>
> Interesting. Given this is easy to reproduce, can you record the:
> (request_tag, *queue, *qp) for each request submitted?
>
> I'd like to see that the *queue stays the same for each tag
> but the *qp indeed changes.
>
I tried this and didn't hit either BUG_ON(), yet still hit the crash. I believe
this verifies that *queue never changed...
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index c075ea5..a77729e 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -76,6 +76,7 @@ struct nvme_rdma_request {
 	struct ib_reg_wr	reg_wr;
 	struct ib_cqe		reg_cqe;
 	struct nvme_rdma_queue	*queue;
+	struct nvme_rdma_queue	*save_queue;
 	struct sg_table		sg_table;
 	struct scatterlist	first_sgl[];
 };
@@ -354,6 +355,8 @@ static int __nvme_rdma_init_request(struct nvme_rdma_ctrl *ctrl,
 	}
 
 	req->queue = queue;
+	if (!req->save_queue)
+		req->save_queue = queue;
 
 	return 0;
@@ -1434,6 +1436,9 @@ static int nvme_rdma_queue_rq(struct blk_mq_hw_ctx *hctx,
 	WARN_ON_ONCE(rq->tag < 0);
 
+	BUG_ON(queue != req->queue);
+	BUG_ON(queue != req->save_queue);
+
 	dev = queue->device->dev;
 	ib_dma_sync_single_for_cpu(dev, sqe->dma,
 			sizeof(struct nvme_command), DMA_TO_DEVICE);
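
For reference, the per-request (request_tag, *queue, *qp) trace Sagi asked for
could be collected with something along these lines in nvme_rdma_queue_rq()
(untested sketch, not what I ran; it assumes the queue's QP is reachable as
queue->qp, the pointer nvme_rdma_post_send() posts to):

	WARN_ON_ONCE(rq->tag < 0);

	/*
	 * Log the submission tuple; comparing lines across a reconnect
	 * shows whether the qp changes underneath a stable queue pointer.
	 */
	pr_debug("nvme_rdma: tag %d queue %p qp %p\n",
		 rq->tag, queue, queue->qp);

pr_debug() is compiled out unless DEBUG or CONFIG_DYNAMIC_DEBUG is set, so the
line costs nothing in a production build and, with dynamic debug, can be
enabled at runtime via /sys/kernel/debug/dynamic_debug/control.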