From: Ming Lei <ming.lei@redhat.com>
To: Keith Busch <keith.busch@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>, Keith Busch <keith.busch@intel.com>,
Laurence Oberman <loberman@redhat.com>,
Sagi Grimberg <sagi@grimberg.me>,
James Smart <james.smart@broadcom.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
Jianchao Wang <jianchao.w.wang@oracle.com>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH V6 11/11] nvme: pci: support nested EH
Date: Sat, 19 May 2018 06:26:28 +0800
Message-ID: <20180518222622.GA18334@ming.t460p>
In-Reply-To: <20180518135751.GI23555@localhost.localdomain>
On Fri, May 18, 2018 at 07:57:51AM -0600, Keith Busch wrote:
> On Fri, May 18, 2018 at 08:20:05AM +0800, Ming Lei wrote:
> > What I think block/011 is helpful is that it can trigger IO timeout
> > during reset, which can be triggered in reality too.
>
> As I mentioned earlier, there is nothing wrong with the spirit of
> the test. What's wrong with it is the misguided implementation.
>
> Do you understand why it ever passes? The success happens when the
> enabling part of the loop happens to coincide with the driver's enabling,
> creating the pci_dev->enable_cnt > 1, making subsequent disable parts
> of the loop do absolutely nothing; the exact same as the one-liner
> (non-serious) patch I sent to defeat the test.
>
> A better way to induce the timeout is:
>
> # setpci -s <B:D.f> 4.w=0:6
>
> This will halt the device without messing with the kernel structures,
> just like how a real device failure would occur.

Frankly speaking, I don't care how the test case is implemented.

The real problem is that the NVMe driver can't handle an IO timeout that
happens in reset context: either the controller ends up DEAD, or the reset
context hangs forever and nothing can make progress. The issue can be
reproduced more easily via io-timeout-fail fault injection.

So could we please face the real issue instead of working around the test
case?

Thanks,
Ming

Thread overview: 60+ messages
2018-05-16 4:03 [PATCH V6 00/11] nvme: pci: fix & improve timeout handling Ming Lei
2018-05-16 4:03 ` [PATCH V6 01/11] block: introduce blk_quiesce_timeout() and blk_unquiesce_timeout() Ming Lei
2018-05-16 4:03 ` [PATCH V6 02/11] nvme: pci: cover timeout for admin commands running in EH Ming Lei
2018-05-24 15:39 ` Keith Busch
2018-05-16 4:03 ` [PATCH V6 03/11] nvme: pci: unquiesce admin queue after controller is shutdown Ming Lei
2018-05-16 4:03 ` [PATCH V6 04/11] nvme: pci: set nvmeq->cq_vector after alloc cq/sq Ming Lei
2018-05-16 4:03 ` [PATCH V6 05/11] nvme: pci: only wait freezing if queue is frozen Ming Lei
2018-05-16 4:03 ` [PATCH V6 06/11] nvme: pci: freeze queue in nvme_dev_disable() in case of error recovery Ming Lei
2018-05-16 4:03 ` [PATCH V6 07/11] nvme: pci: prepare for supporting error recovery from resetting context Ming Lei
2018-05-16 4:03 ` [PATCH V6 08/11] nvme: pci: move error handling out of nvme_reset_dev() Ming Lei
2018-05-16 4:03 ` [PATCH V6 09/11] nvme: pci: don't unfreeze queue until controller state updating succeeds Ming Lei
2018-05-16 4:03 ` [PATCH V6 10/11] nvme: core: introduce nvme_force_change_ctrl_state() Ming Lei
2018-05-16 4:03 ` [PATCH V6 11/11] nvme: pci: support nested EH Ming Lei
2018-05-16 14:12 ` Keith Busch
2018-05-16 23:10 ` Ming Lei
2018-05-17 2:20 ` Keith Busch
2018-05-17 8:41 ` Christoph Hellwig
2018-05-17 14:20 ` Keith Busch
2018-05-17 14:23 ` Johannes Thumshirn
2018-05-18 16:28 ` Keith Busch
2018-05-22 7:35 ` Johannes Thumshirn
2018-05-18 0:20 ` Ming Lei
2018-05-18 1:01 ` Ming Lei
2018-05-18 13:57 ` Keith Busch
2018-05-18 16:58 ` Jens Axboe
2018-05-18 22:26 ` Ming Lei [this message]
2018-05-18 23:45 ` Keith Busch
2018-05-18 23:51 ` Ming Lei