From: ming.lei@redhat.com (Ming Lei)
Date: Thu, 10 May 2018 10:09:27 +0800
Subject: [PATCH V4 0/7] nvme: pci: fix & improve timeout handling
In-Reply-To:
References: <20180505135905.18815-1-ming.lei@redhat.com>
Message-ID: <20180510020926.GC9327@ming.t460p>

On Wed, May 09, 2018 at 01:46:09PM +0800, jianchao.wang wrote:
> Hi Ming,
>
> I did some tests locally.
>
> [ 598.828578] nvme nvme0: I/O 51 QID 4 timeout, disable controller
>
> This should be a timeout in nvme_reset_dev->nvme_wait_freeze.
>
> [ 598.828743] nvme nvme0: EH 1: before shutdown
> [ 599.013586] nvme nvme0: EH 1: after shutdown
> [ 599.137197] nvme nvme0: EH 1: after recovery
>
> EH 1 has marked the state as LIVE:
>
> [ 599.137241] nvme nvme0: failed to mark controller state 1
>
> So EH 0 failed to mark the state as LIVE, and the controller was
> removed. This should not be expected by the nested EH.

Right.

> [ 599.137322] nvme nvme0: Removing after probe failure status: 0
> [ 599.326539] nvme nvme0: EH 0: after recovery
> [ 599.326760] nvme0n1: detected capacity change from 128035676160 to 0
> [ 599.457208] nvme nvme0: failed to set APST feature (-19)
>
> nvme_reset_dev should identify whether it is nested.

The above should be caused by a race between updates of the controller
state; I hope to find some time this week to investigate it further.

Also, maybe we can change the code so that the controller is not removed
until the nested EH has been retried enough times.

Thanks,
Ming