From: sagi@grimberg.me (Sagi Grimberg)
Subject: nvmf/rdma host crash during heavy load and keep alive recovery
Date: Wed, 17 Aug 2016 21:57:34 +0300
Message-ID: <69c0e819-76d9-286b-c4fb-22f087f36ff1@grimberg.me>
In-Reply-To: <045601d1f803$a9d73a20$fd85ae60$@opengridcomputing.com>
>> If that is the case, I think we need to have a closer look at
>> nvme_stop_queues...
>>
>
> request_queue->queue_flags does have QUEUE_FLAG_STOPPED set:
>
> #define QUEUE_FLAG_STOPPED 2 /* queue is stopped */
>
> crash> request_queue.queue_flags -x 0xffff880397a13d28
> queue_flags = 0x1f07a04
> crash> request_queue.mq_ops 0xffff880397a13d28
> mq_ops = 0xffffffffa084b140 <nvme_rdma_mq_ops>
>
> So it appears the queue is stopped, yet a request is being processed for that
> queue. Perhaps there is a race where QUEUE_FLAG_STOPPED is set after a request
> is scheduled?
Umm. When the keep-alive timeout triggers, we stop the queues. Only 10
seconds (or reconnect_delay) later do we free the queues and reestablish
them, so I find it hard to believe that a request was queued and then spent
that long in queue_rq, until we freed the queue pair.
From your description of the sequence, it seems that after 10 seconds we
attempt a reconnect, and during that time an I/O request crashes the
party.
I assume this means you ran traffic during the sequence, yes?