* Buffer I/O Errors from Zoned NVME devices
From: Jeffrey Lien @ 2021-02-01 14:36 UTC
To: hch@lst.de, kbusch@kernel.org, linux-nvme@lists.infradead.org

Christoph, Keith
We're seeing a lot of these Buffer I/O errors with our zoned nvme devices. One of the FW developers looked into it and had the following explanation:
All these Reads are from the kernel during enumeration and for LBAs that are in the last zone's hole, hence expected to return a boundary error, which is getting logged by the kernel.

[65281.936988] Buffer I/O error on dev nvme1n2, logical block 3800039296, async page read
[65281.937165] blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[65281.937166] Buffer I/O error on dev nvme1n2, logical block 3800039297, async page read
[65281.937335] blk_update_request: I/O error, dev nvme1n2, sector 3800039298 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[65281.937336] Buffer I/O error on dev nvme1n2, logical block 3800039298, async page read
[65281.937498] blk_update_request: I/O error, dev nvme1n2, sector 3800039299 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

Are you aware of this issue and if so, do you have any recommendations on how to avoid or resolve?

Thanks
Jeff Lien
eSSD Core SW Tools & Drivers
Western Digital
2900 37th St NW
Building 108-1
Rochester, MN 55901
Email: Jeff.Lien@wdc.com
Office: +1-507-322-2416

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
* Re: Buffer I/O Errors from Zoned NVME devices
From: Keith Busch @ 2021-02-01 17:53 UTC
To: Jeffrey Lien
Cc: hch@lst.de, linux-nvme@lists.infradead.org

On Mon, Feb 01, 2021 at 02:36:12PM +0000, Jeffrey Lien wrote:
> Christoph, Keith
> We're seeing a lot of these Buffer I/O errors with our zoned nvme devices. One of the FW developers looked into it and had the following explanation:
> All these Reads are from the kernel during enumeration and for LBAs that are in the last zone's hole, hence expected to return a boundary error, which is getting logged by the kernel.
>
> [65281.936988] Buffer I/O error on dev nvme1n2, logical block 3800039296, async page read
> [65281.937165] blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [65281.937166] Buffer I/O error on dev nvme1n2, logical block 3800039297, async page read
> [65281.937335] blk_update_request: I/O error, dev nvme1n2, sector 3800039298 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [65281.937336] Buffer I/O error on dev nvme1n2, logical block 3800039298, async page read
> [65281.937498] blk_update_request: I/O error, dev nvme1n2, sector 3800039299 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>
> Are you aware of this issue and if so, do you have any recommendations on how to avoid or resolve?

Is this from the partition scanning? We don't partition zoned devices,
so I think we can skip it. Does the following resolve the issue?
---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 1a3cdc6b1036..fafd02ab3a46 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3925,6 +3925,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, unsigned nsid,
 		}
 	}
 
+	if (blk_queue_is_zoned(ns->queue))
+		disk->flags |= GENHD_FL_NO_PART_SCAN;
+
 	down_write(&ctrl->namespaces_rwsem);
 	list_add_tail(&ns->list, &ctrl->namespaces);
 	up_write(&ctrl->namespaces_rwsem);
-- 
* Re: Buffer I/O Errors from Zoned NVME devices
From: hch @ 2021-02-01 18:02 UTC
To: Keith Busch
Cc: Damien Le Moal, hch@lst.de, linux-nvme@lists.infradead.org, Jeffrey Lien

On Mon, Feb 01, 2021 at 09:53:06AM -0800, Keith Busch wrote:
> On Mon, Feb 01, 2021 at 02:36:12PM +0000, Jeffrey Lien wrote:
> > Christoph, Keith
> > We're seeing a lot of these Buffer I/O errors with our zoned nvme devices. One of the FW developers looked into it and had the following explanation:
> > All these Reads are from the kernel during enumeration and for LBAs that are in the last zone's hole, hence expected to return a boundary error, which is getting logged by the kernel.
> >
> > [65281.936988] Buffer I/O error on dev nvme1n2, logical block 3800039296, async page read
> > [65281.937165] blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [65281.937166] Buffer I/O error on dev nvme1n2, logical block 3800039297, async page read
> > [65281.937335] blk_update_request: I/O error, dev nvme1n2, sector 3800039298 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> > [65281.937336] Buffer I/O error on dev nvme1n2, logical block 3800039298, async page read
> > [65281.937498] blk_update_request: I/O error, dev nvme1n2, sector 3800039299 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> >
> > Are you aware of this issue and if so, do you have any recommendations on how to avoid or resolve?
>
> Is this from the partition scanning? We don't partition zoned devices,
> so I think we can skip it. Does the following resolve the issue?

We already have special zoned device handling in the partitioning code.

But NVMe should make sure to never span a zone boundary as we set the
chunk size to avoid that.

What kernel version is this? Is CONFIG_BLK_DEV_ZONED enabled?
* Re: Buffer I/O Errors from Zoned NVME devices
From: Damien Le Moal @ 2021-02-01 21:04 UTC
To: hch@lst.de, Keith Busch
Cc: linux-nvme@lists.infradead.org, Jeffrey Lien

On 2021/02/02 3:03, hch@lst.de wrote:
> On Mon, Feb 01, 2021 at 09:53:06AM -0800, Keith Busch wrote:
>> On Mon, Feb 01, 2021 at 02:36:12PM +0000, Jeffrey Lien wrote:
>>> Christoph, Keith
>>> We're seeing a lot of these Buffer I/O errors with our zoned nvme devices. One of the FW developers looked into it and had the following explanation:
>>> All these Reads are from the kernel during enumeration and for LBAs that are in the last zone's hole, hence expected to return a boundary error, which is getting logged by the kernel.
>>>
>>> [65281.936988] Buffer I/O error on dev nvme1n2, logical block 3800039296, async page read
>>> [65281.937165] blk_update_request: I/O error, dev nvme1n2, sector 3800039297 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>> [65281.937166] Buffer I/O error on dev nvme1n2, logical block 3800039297, async page read
>>> [65281.937335] blk_update_request: I/O error, dev nvme1n2, sector 3800039298 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>> [65281.937336] Buffer I/O error on dev nvme1n2, logical block 3800039298, async page read
>>> [65281.937498] blk_update_request: I/O error, dev nvme1n2, sector 3800039299 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
>>>
>>> Are you aware of this issue and if so, do you have any recommendations on how to avoid or resolve?
>>
>> Is this from the partition scanning? We don't partition zoned devices,
>> so I think we can skip it. Does the following resolve the issue?
>
> We already have special zoned device handling in the partitioning code.

Partitions are ignored and a warning is printed, but the partition table is still being read...

>
> But NVMe should make sure to never span a zone boundary as we set the
> chunk size to avoid that.
>
> What kernel version is this? Is CONFIG_BLK_DEV_ZONED enabled?

I had a very similar problem doing zonefs tests on Matias' machine on a ZNS drive last week. The problem was the firmware... An upgrade to the latest version fixed the issue. Not sure what FW rev you are running here, but upgrading might solve this.

>

-- 
Damien Le Moal
Western Digital Research
* RE: Buffer I/O Errors from Zoned NVME devices
From: Jeffrey Lien @ 2021-02-02 15:06 UTC
To: Damien Le Moal, hch@lst.de, Keith Busch
Cc: linux-nvme@lists.infradead.org

Keith, Christoph, Damien,
These errors are happening on both the 5.9 and 5.10.7 kernels. CONFIG_BLK_DEV_ZONED is set to y in the .config file.

I will try the patch to disable partition scanning that Keith suggested. I'll also get the latest FW loaded and see if that resolves the issue.
* Re: Buffer I/O Errors from Zoned NVME devices
From: Keith Busch @ 2021-02-02 15:22 UTC
To: Jeffrey Lien
Cc: Damien Le Moal, hch@lst.de, linux-nvme@lists.infradead.org

On Tue, Feb 02, 2021 at 03:06:22PM +0000, Jeffrey Lien wrote:
> Keith, Christoph, Damien,
> These errors are happening on both the 5.9 and 5.10.7 kernels. CONFIG_BLK_DEV_ZONED is set to y in the .config file.
>
> I will try the patch to disable partition scanning that Keith suggested. I'll also get the latest FW loaded and see if that resolves the issue.

After re-reading the FW dev's explanation, it sounds like something is off with the implementation. The spec only allows a "boundary error" if you're crossing zones, but you said the reads are in the last zone, so there's no opportunity to cross to the next zone.

What did you mean by the "zone's hole"? Does this drive have ZCAP less than ZSZE and we're reading from unmapped LBAs? If so, I think we are supposed to be allowed to read these, but we just can't write them.