* Re: [PATCH]nvme-pci: Fixes EEH failure on ppc
From: Ming Lei @ 2018-02-07 1:24 UTC
To: wenxiong
Cc: Keith Busch, wenxiong, linux-nvme, axboe, linux-kernel, wenxiong
In-Reply-To: <787e4960b62a03b3888c67e73d7e1ee2@linux.vnet.ibm.com>

On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
> On 2018-02-06 10:33, Keith Busch wrote:
> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@vmlinux.vnet.ibm.com wrote:
> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
> > >  	struct nvme_command cmd;
> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
> > >
> > > +	/* If PCI error recovery process is happening, we cannot reset or
> > > +	 * the recovery mechanism will surely fail.
> > > +	 */
> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
> > > +		return BLK_EH_HANDLED;
> > > +
> >
> > This patch will tell the block layer to complete the request and consider
> > it a success, but it doesn't look like the command actually completed at
> > all. You're going to get data corruption this way, right? Is returning
> > BLK_EH_HANDLED immediately really the right thing to do here?
>
> Hi Ming,
>
> Can you help check whether it is OK to return BLK_EH_HANDLED in this
> case?

Hi Wenxiong,

It looks like Keith is correct: if BLK_EH_HANDLED is returned, this
timed-out request will be completed by the block layer and the NVMe
driver even though the I/O has not actually completed, so the result is
either data loss (write) or a read failure.

Maybe BLK_EH_RESET_TIMER is fine in this situation.

Thanks,
Ming

* Re: [PATCH]nvme-pci: Fixes EEH failure on ppc
From: wenxiong @ 2018-02-07 20:19 UTC
To: Ming Lei
Cc: axboe, linux-kernel, linux-nvme, Keith Busch, wenxiong, wenxiong

On 2018-02-06 19:24, Ming Lei wrote:
> On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
>> On 2018-02-06 10:33, Keith Busch wrote:
>> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@vmlinux.vnet.ibm.com wrote:
>> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>> > >  	struct nvme_command cmd;
>> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>> > >
>> > > +	/* If PCI error recovery process is happening, we cannot reset or
>> > > +	 * the recovery mechanism will surely fail.
>> > > +	 */
>> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
>> > > +		return BLK_EH_HANDLED;
>> > > +
>> >
>> > This patch will tell the block layer to complete the request and consider
>> > it a success, but it doesn't look like the command actually completed at
>> > all. You're going to get data corruption this way, right? Is returning
>> > BLK_EH_HANDLED immediately really the right thing to do here?
>>
>> Hi Ming,
>>
>> Can you help check whether it is OK to return BLK_EH_HANDLED in this
>> case?
>
> Hi Wenxiong,
>
> It looks like Keith is correct: if BLK_EH_HANDLED is returned, this
> timed-out request will be completed by the block layer and the NVMe
> driver even though the I/O has not actually completed, so the result is
> either data loss (write) or a read failure.
>
> Maybe BLK_EH_RESET_TIMER is fine in this situation.
>
> Thanks,
> Ming

Hi Ming,

Thanks! I have tried BLK_EH_RESET_TIMER, and EEH recovery works fine.
I am going to resubmit the patch.

Thanks,
Wendy
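
For readers following the thread, the difference between the two return values can be illustrated with a small userspace model. This is only a sketch, not the resubmitted patch and not kernel code: `mock_dev`, `mock_request`, `mock_nvme_timeout()` and `mock_blk_handle_timeout()` are hypothetical names invented here, the enum lists only the two values discussed above, and the online path simply returns BLK_EH_HANDLED as a placeholder for the driver's normal timeout handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only the two verdicts discussed in the thread are modelled. */
enum blk_eh_timer_return {
	BLK_EH_HANDLED,     /* block layer completes the request now */
	BLK_EH_RESET_TIMER  /* request stays in flight, timer is re-armed */
};

/* Hypothetical stand-in for the device; not a kernel structure. */
struct mock_dev {
	bool channel_offline; /* models pci_channel_offline() being true */
};

/* Hypothetical stand-in for a timed-out request. */
struct mock_request {
	bool completed;            /* request was ended by the block layer */
	unsigned int timer_rearms; /* how often the timeout was re-armed */
};

/*
 * Timeout handler as suggested by Ming: while PCI error recovery is in
 * progress, do not complete the request, just ask for a new timeout.
 */
static enum blk_eh_timer_return mock_nvme_timeout(const struct mock_dev *dev)
{
	assert(dev != NULL);

	if (dev->channel_offline)
		return BLK_EH_RESET_TIMER;

	/* Placeholder (assumption) for the normal timeout handling. */
	return BLK_EH_HANDLED;
}

/* Simplified model of how the block layer acts on the verdict. */
static void mock_blk_handle_timeout(struct mock_request *rq,
				    enum blk_eh_timer_return ret)
{
	switch (ret) {
	case BLK_EH_HANDLED:
		rq->completed = true; /* caller sees the I/O as finished */
		break;
	case BLK_EH_RESET_TIMER:
		rq->timer_rearms++;   /* I/O is still pending */
		break;
	}
}
```

In this model, a request that times out while the channel is offline is left pending with its timer re-armed instead of being reported as finished, which is the behaviour Keith's objection asks for.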