* [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Niklas Cassel @ 2021-11-26 10:42 UTC
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: damien.lemoal@opensource.wdc.com, Niklas Cassel,
linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

From: Niklas Cassel <niklas.cassel@wdc.com>

The write pointer in NVMe ZNS is invalid for a zone in zone state full.
The same also holds true for ZAC/ZBC.

The current behavior for NVMe is to simply propagate the wp reported by
the drive, even for full zones. Since the wp is invalid for a full zone,
the wp reported by the drive may be any value.

The way that the sd_zbc driver handles a full zone is to always report
the wp as zone start + zone len, regardless of what the drive reported.
null_blk also follows this convention.

Do the same for NVMe, so that a BLKREPORTZONE ioctl reports the write
pointer for a full zone in a consistent way, regardless of the interface
of the underlying zoned block device.

blkzone report before patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0xfffffffffffbfff8
reset:0 non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]

blkzone report after patch:
start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0
non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
---
Changes since v1:
- Minor commit message rewording.
- Use if/else instead of setting wp unconditionally and then
  conditionally updating it.

 drivers/nvme/host/zns.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index bfc259e0d7b8..9f81beb4df4e 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -166,7 +166,10 @@ static int nvme_zone_parse_entry(struct nvme_ns *ns,
 	zone.len = ns->zsze;
 	zone.capacity = nvme_lba_to_sect(ns, le64_to_cpu(entry->zcap));
 	zone.start = nvme_lba_to_sect(ns, le64_to_cpu(entry->zslba));
-	zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
+	if (zone.cond == BLK_ZONE_COND_FULL)
+		zone.wp = zone.start + zone.len;
+	else
+		zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
 
 	return cb(&zone, idx, data);
 }
--
2.33.1
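
For readers who want to check the numbers in the two blkzone reports, here is
a minimal user-space sketch of the logic the patch adds. It is not the driver
code: the struct and every sketch_* name are hypothetical, and only the value
14 (0xE) for the full condition is taken from the report output above. The
sketch assumes that LBAs are converted to 512-byte sectors by a left shift of
(lba_shift - 9), and that blkzone prints wptr relative to the zone start.

```c
#include <stdint.h>

/* Hypothetical stand-in for BLK_ZONE_COND_FULL; 14 matches "zcond:14(fu)". */
enum { SKETCH_ZONE_COND_FULL = 0xE };

/* Hypothetical, reduced zone descriptor; all fields in 512-byte sectors. */
struct sketch_zone {
	uint64_t start;
	uint64_t len;
	uint64_t wp;
	uint8_t cond;
};

/* Assumed conversion: lba_shift is log2 of the LBA size in bytes. */
static uint64_t sketch_lba_to_sect(uint64_t lba, unsigned int lba_shift)
{
	return lba << (lba_shift - 9);
}

/* Same if/else shape as the patch: ignore the device wp for a full zone. */
static void sketch_set_wp(struct sketch_zone *z, uint64_t raw_wp_lba,
			  unsigned int lba_shift)
{
	if (z->cond == SKETCH_ZONE_COND_FULL)
		z->wp = z->start + z->len;
	else
		z->wp = sketch_lba_to_sect(raw_wp_lba, lba_shift);
}

/* Assumption: the wptr column of blkzone is wp minus zone start. */
static uint64_t sketch_relative_wp(const struct sketch_zone *z)
{
	return z->wp - z->start;
}
```

With start = 0x40000 and len = 0x40000, a full zone gives a relative wp of
0x040000, which matches the "after" report. If one further assumes a
4096-byte LBA (lba_shift = 12) and a device that returns an all-ones wp,
passing that raw value through the non-full branch gives 0xfffffffffffbfff8,
which matches the "before" report. Both of those device details are guesses
that happen to fit the output; the thread does not state them.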

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Damien Le Moal @ 2021-11-27 1:08 UTC
To: Niklas Cassel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

On 2021/11/26 19:42, Niklas Cassel wrote:
> -	zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));
> +	if (zone.cond == BLK_ZONE_COND_FULL)
> +		zone.wp = zone.start + zone.len;
> +	else
> +		zone.wp = nvme_lba_to_sect(ns, le64_to_cpu(entry->wp));

Looks good.

Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>

Note: read-only zones also have an undefined wp. So I wonder if we should not
set the wp similarly to full zones, to match the fact that we cannot write to
these zones. Same for offline zones, but these are tricky since they cannot be
read either, meaning that wp should be set to the zone start for that case...

-- 
Damien Le Moal
Western Digital Research

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Damien Le Moal @ 2021-11-29 11:18 UTC
To: Niklas Cassel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: linux-nvme@lists.infradead.org

On 2021/11/27 10:14, Damien Le Moal wrote:
> Note: read-only zones also have an undefined wp. So I wonder if we should not
> set the wp similarly to full zones, to match the fact that we cannot write to
> these zones. Same for offline zones, but these are tricky since they cannot be
> read either, meaning that wp should be set to the zone start for that case...

Thinking about this some more, I think we should do nothing. Reaction to RO or
offline zones will always come from an IO error path, in which case, it should
be clear to the user that the zone wp is invalid/undefined. E.g. zonefs has such
IO error path.

-- 
Damien Le Moal
Western Digital Research
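
To illustrate the outcome of this sub-thread from a consumer's point of view,
here is a small sketch of how code reading a zone report might decide whether
the wp field can be used. The sketch_* names are hypothetical. The numeric
values are meant to mirror the zone conditions in the kernel's
<linux/blkzoned.h> UAPI header, but they are redefined locally so that the
example stands alone. Treating a full zone's wp as usable assumes a driver
that normalizes it to zone start + zone len, as this patch does for NVMe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local, hypothetical copies of the zone condition values. */
enum sketch_zone_cond {
	SKETCH_COND_NOT_WP   = 0x0,
	SKETCH_COND_EMPTY    = 0x1,
	SKETCH_COND_IMP_OPEN = 0x2,
	SKETCH_COND_EXP_OPEN = 0x3,
	SKETCH_COND_CLOSED   = 0x4,
	SKETCH_COND_READONLY = 0xD,
	SKETCH_COND_FULL     = 0xE,
	SKETCH_COND_OFFLINE  = 0xF,
};

/*
 * Returns true when the reported wp can be interpreted.
 * Read-only and offline zones are left alone by the driver, so their wp
 * stays undefined and the caller should rely on its IO error path instead.
 */
static bool sketch_wp_is_meaningful(enum sketch_zone_cond cond)
{
	switch (cond) {
	case SKETCH_COND_EMPTY:
	case SKETCH_COND_IMP_OPEN:
	case SKETCH_COND_EXP_OPEN:
	case SKETCH_COND_CLOSED:
		return true;
	case SKETCH_COND_FULL:
		/* Assumed to be normalized to start + len by the driver. */
		return true;
	default:
		/* NOT_WP, READONLY, OFFLINE: no usable write pointer. */
		return false;
	}
}
```

This is only one possible policy. The thread itself settles on leaving the
read-only and offline cases untouched in the driver.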

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Niklas Cassel @ 2021-11-29 12:39 UTC
To: Damien Le Moal
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
    linux-nvme@lists.infradead.org

On Mon, Nov 29, 2021 at 08:18:24PM +0900, Damien Le Moal wrote:
> Thinking about this some more, I think we should do nothing. Reaction to RO or
> offline zones will always come from an IO error path, in which case, it should
> be clear to the user that the zone wp is invalid/undefined. E.g. zonefs has such
> IO error path.

Christoph, Keith,

Since there are no longer any outstanding questions on this patch,
please (re)consider this patch for inclusion.

Kind regards,
Niklas

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Johannes Thumshirn @ 2021-12-02 13:35 UTC
To: Niklas Cassel, Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg
Cc: damien.lemoal@opensource.wdc.com, linux-nvme@lists.infradead.org,
    linux-kernel@vger.kernel.org

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Keith Busch @ 2021-12-03 16:00 UTC
To: Niklas Cassel
Cc: Jens Axboe, Christoph Hellwig, Sagi Grimberg,
    damien.lemoal@opensource.wdc.com, linux-nvme@lists.infradead.org,
    linux-kernel@vger.kernel.org

On Fri, Nov 26, 2021 at 10:42:44AM +0000, Niklas Cassel wrote:
> blkzone report after patch:
> start: 0x000040000, len 0x040000, cap 0x03e000, wptr 0x040000 reset:0
> non-seq:0, zcond:14(fu) [type: 2(SEQ_WRITE_REQUIRED)]

Looks good.

Reviewed-by: Keith Busch <kbusch@kernel.org>

* Re: [PATCH v2] nvme: report write pointer for a full zone as zone start + zone len
From: Christoph Hellwig @ 2021-12-06 7:54 UTC
To: Niklas Cassel
Cc: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
    damien.lemoal@opensource.wdc.com, linux-nvme@lists.infradead.org,
    linux-kernel@vger.kernel.org

Thanks, applied to nvme-5.16.