From: swise@opengridcomputing.com (Steve Wise)
Subject: nvmf/rdma host crash during heavy load and keep alive recovery
Date: Mon, 19 Sep 2016 10:38:46 -0500
Message-ID: <02db01d2128b$e9244c70$bb6ce550$@opengridcomputing.com>
In-Reply-To: <8fc2cefe-76b6-b0a3-12af-701833c286f7@grimberg.me>
> >> This stack is creating hctx queues for the namespace created for this
> >> target device.
> >>
> >> Sagi,
> >>
> >> Should nvme_rdma_error_recovery_work() be stopping the hctx queues for
> >> ctrl->ctrl.connect_q too?
> >
> > Oh. Actually we'll probably need to take care of the connect_q just
> > about anywhere we do anything to the other queues..
>
> Why should we?
>
> We control the IOs on the connect_q (we only submit connect to it) and
> we only submit to it if our queue is established.
>
> I still don't see how this explains why Steve is seeing bogus
> queue/hctx mappings...

I don't think I'm necessarily seeing bogus mappings. My debug code showed
(to me at least) that the connect_q hctxs use the same nvme_rdma_queues as
the ioq hctxs. I thought that was not a valid configuration, but apparently
it's normal. So I still don't know how or why a pending request gets run on
an nvme_rdma_queue that has already blown away its rdma qp and cm_id. It
_could_ be due to bogus queue/hctx mappings, but I haven't proven it, and
I'm not sure how to prove it (or how to debug this issue further)...
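
For anyone following along, here is a small userspace model of the sharing
described above. It is only a sketch, not driver code: every model_* name is
made up for illustration, and it assumes (my understanding of the driver, not
something verified in this thread) that connect_q and the namespace request
queues are built from the same tag set, so the hctx init callback runs for
both with the same hctx index and points each hctx at queues[hctx_idx + 1].

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_IO_QUEUES 4	/* arbitrary count for the model */

/* Hypothetical stand-in for struct nvme_rdma_queue; only an id is kept. */
struct model_rdma_queue {
	int id;
};

/* Hypothetical stand-in for a blk-mq hardware context. */
struct model_hctx {
	struct model_rdma_queue *driver_data;
};

/* Hypothetical stand-in for a request_queue: one hctx per I/O queue. */
struct model_request_queue {
	struct model_hctx hctx[MODEL_NR_IO_QUEUES];
};

struct model_ctrl {
	/* queues[0] models the admin queue, queues[1..] the I/O queues. */
	struct model_rdma_queue queues[MODEL_NR_IO_QUEUES + 1];
	struct model_request_queue ns_q;	/* namespace request queue */
	struct model_request_queue connect_q;	/* fabrics connect queue */
};

/*
 * Assumed mapping rule: hctx index N is backed by I/O queue N + 1,
 * regardless of which request queue the hctx belongs to.
 */
static void model_init_hctx(struct model_ctrl *ctrl, struct model_hctx *hctx,
			    unsigned int hctx_idx)
{
	assert(hctx_idx < MODEL_NR_IO_QUEUES);
	hctx->driver_data = &ctrl->queues[hctx_idx + 1];
}

/* Set up both request queues through the same init callback. */
static void model_init_ctrl(struct model_ctrl *ctrl)
{
	unsigned int i;

	for (i = 0; i <= MODEL_NR_IO_QUEUES; i++)
		ctrl->queues[i].id = (int)i;

	for (i = 0; i < MODEL_NR_IO_QUEUES; i++) {
		model_init_hctx(ctrl, &ctrl->ns_q.hctx[i], i);
		model_init_hctx(ctrl, &ctrl->connect_q.hctx[i], i);
	}
}

/* Returns 1 if every connect_q hctx shares its queue with the ns_q hctx. */
static int model_queues_shared(const struct model_ctrl *ctrl)
{
	unsigned int i;

	for (i = 0; i < MODEL_NR_IO_QUEUES; i++) {
		if (ctrl->ns_q.hctx[i].driver_data !=
		    ctrl->connect_q.hctx[i].driver_data)
			return 0;
	}
	return 1;
}
```

Calling model_init_ctrl() on a struct model_ctrl and then
model_queues_shared() returns 1 in this model, which matches what my debug
code showed: two hctxs per nvme_rdma_queue is expected under that assumption
and is not by itself evidence of a bogus mapping.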