From: ming.lei@redhat.com (Ming Lei)
Subject: [PATCH]nvme-pci: Fixes EEH failure on ppc
Date: Wed, 7 Feb 2018 09:24:37 +0800
Message-ID: <20180207012353.GD13470@ming.t460p>
In-Reply-To: <787e4960b62a03b3888c67e73d7e1ee2@linux.vnet.ibm.com>

On Tue, Feb 06, 2018 at 02:01:05PM -0600, wenxiong wrote:
> On 2018-02-06 10:33, Keith Busch wrote:
> > On Mon, Feb 05, 2018 at 03:49:40PM -0600, wenxiong@vmlinux.vnet.ibm.com
> > wrote:
> > > @@ -1189,6 +1183,12 @@ static enum blk_eh_timer_return
> > > nvme_timeout(struct request *req, bool reserved)
> > >  	struct nvme_command cmd;
> > >  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
> > > 
> > > +	/* If PCI error recovery process is happening, we cannot reset or
> > > +	 * the recovery mechanism will surely fail.
> > > +	 */
> > > +	if (pci_channel_offline(to_pci_dev(dev->dev)))
> > > +		return BLK_EH_HANDLED;
> > > +
> > 
> > This patch will tell the block layer to complete the request and
> > consider
> > it a success, but it doesn't look like the command actually completed at
> > all. You're going to get data corruption this way, right? Is returning
> > BLK_EH_HANDLED immediately really the right thing to do here?
> 
> Hi Ming,
> 
> Can you help check whether it is OK to return BLK_EH_HANDLED in this
> case?

Hi Wenxiong,

It looks like Keith is correct: if BLK_EH_HANDLED is returned, the block
layer and the NVMe driver will complete this timed-out request even
though the I/O never actually completed, which causes either data loss
(for a write) or a read failure.

Maybe BLK_EH_RESET_TIMER is fine in this situation.
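
A rough sketch of that idea follows. It is a userspace model, not the
driver code: the enum only mirrors the names of the kernel's
blk_eh_timer_return values, and struct model_dev, model_channel_offline()
and model_timeout() are hypothetical stand-ins for the nvme device,
pci_channel_offline(to_pci_dev(dev->dev)) and nvme_timeout(). The normal
timeout handling is elided and represented by BLK_EH_NOT_HANDLED.

```c
#include <stdbool.h>

/* Userspace stand-in; only the names mirror the kernel enum. */
enum blk_eh_timer_return {
	BLK_EH_NOT_HANDLED,
	BLK_EH_HANDLED,
	BLK_EH_RESET_TIMER,
};

/* Hypothetical device model: tracks only the PCI channel state. */
struct model_dev {
	bool channel_offline;
};

/* Stand-in for pci_channel_offline(to_pci_dev(dev->dev)). */
static bool model_channel_offline(const struct model_dev *dev)
{
	return dev->channel_offline;
}

/*
 * Model of the timeout decision discussed above. While PCI error
 * recovery is in progress, re-arm the timer instead of reporting the
 * request as handled, so the request is not completed as if the I/O
 * had succeeded.
 */
static enum blk_eh_timer_return model_timeout(const struct model_dev *dev)
{
	if (model_channel_offline(dev))
		return BLK_EH_RESET_TIMER;

	/* Placeholder: the driver's normal timeout handling is elided. */
	return BLK_EH_NOT_HANDLED;
}
```

With this shape the request stays outstanding while the channel is
offline, and the timeout handler runs again after the timer expires.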

Thanks,
Ming

WARNING: multiple messages have this Message-ID (diff)
From: Ming Lei <ming.lei@redhat.com>
To: wenxiong <wenxiong@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>,
	wenxiong@vmlinux.vnet.ibm.com, linux-nvme@lists.infradead.org,
	axboe@fb.com, linux-kernel@vger.kernel.org, wenxiong@us.ibm.com

Thread overview: 13+ messages
2018-02-05 21:49 [PATCH]nvme-pci: Fixes EEH failure on ppc wenxiong
2018-02-06  9:54 ` Sagi Grimberg
2018-02-06 16:33 ` Keith Busch
2018-02-06 16:55   ` wenxiong
2018-02-06 17:02     ` Keith Busch
2018-02-06 17:08       ` wenxiong
2018-02-06 17:15         ` Keith Busch
2018-02-06 18:00           ` wenxiong
2018-02-06 20:01   ` wenxiong
2018-02-07  1:24     ` Ming Lei [this message]
2018-02-07  1:24       ` Ming Lei
2018-02-07 20:19       ` wenxiong
2018-02-07 20:19         ` wenxiong
