* io_opt fixups
@ 2024-07-01 5:17 Christoph Hellwig
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-07-01 5:17 UTC (permalink / raw)
To: Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
Hi all,
I recently noticed that on my test VMs with the Qemu NVMe emulations I
see a zero max_setors_kb limit in sysfs. It turns out Qemu advertises
a one-LBA optimal write size, which is a bit silly.
This series handles this odd case properly both in the block layer and
the nvme driver as a sort of defense in depth.
Diffstat:
block/blk-settings.c | 5 ++---
drivers/nvme/host/core.c | 3 ++-
2 files changed, 4 insertions(+), 4 deletions(-)
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits
2024-07-01 5:17 io_opt fixups Christoph Hellwig
@ 2024-07-01 5:17 ` Christoph Hellwig
2024-07-01 6:37 ` Chaitanya Kulkarni
2024-07-01 7:06 ` Sagi Grimberg
2024-07-01 5:17 ` [PATCH 2/3] block: don't reduce max_sectors based on io_opt Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-07-01 5:17 UTC (permalink / raw)
To: Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
If io_min is larger than the cap, it must by definition be non-zero.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 2e559cf97cc834..ff8bbc101fedaa 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -279,8 +279,7 @@ static int blk_validate_limits(struct queue_limits *lim)
} else if (lim->io_opt) {
lim->max_sectors =
min(max_hw_sectors, lim->io_opt >> SECTOR_SHIFT);
- } else if (lim->io_min &&
- lim->io_min > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) {
+ } else if (lim->io_min > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) {
lim->max_sectors =
min(max_hw_sectors, lim->io_min >> SECTOR_SHIFT);
} else {
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] block: don't reduce max_sectors based on io_opt
2024-07-01 5:17 io_opt fixups Christoph Hellwig
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
@ 2024-07-01 5:17 ` Christoph Hellwig
2024-07-01 6:42 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
2024-07-01 5:17 ` [PATCH 3/3] nvme: don't set io_opt if NOWS is zero Christoph Hellwig
` (3 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-07-01 5:17 UTC (permalink / raw)
To: Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
Don't reduce the max_sectors value below the normal cap when the driver
advertsizes a very low io_opt. This restores the behavior we had before
the recent changes to the max_sectors calculation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index ff8bbc101fedaa..9fa4eed4df06b0 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -276,7 +276,7 @@ static int blk_validate_limits(struct queue_limits *lim)
if (lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
return -EINVAL;
lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
- } else if (lim->io_opt) {
+ } else if (lim->io_opt > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) {
lim->max_sectors =
min(max_hw_sectors, lim->io_opt >> SECTOR_SHIFT);
} else if (lim->io_min > (BLK_DEF_MAX_SECTORS_CAP << SECTOR_SHIFT)) {
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] nvme: don't set io_opt if NOWS is zero
2024-07-01 5:17 io_opt fixups Christoph Hellwig
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
2024-07-01 5:17 ` [PATCH 2/3] block: don't reduce max_sectors based on io_opt Christoph Hellwig
@ 2024-07-01 5:17 ` Christoph Hellwig
2024-07-01 7:02 ` Chaitanya Kulkarni
` (2 more replies)
[not found] ` <CGME20240701073115epcas5p3e2758375e272cba80281fe0ac37394d0@epcas5p3.samsung.com>
` (2 subsequent siblings)
5 siblings, 3 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-07-01 5:17 UTC (permalink / raw)
To: Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
and we thus can't detect if it isn't set. Thus a NOWS value of 0 means
that the Namespace Optimal Write Size is a single LBA, which is clearly
bogus. Ignore the value in that case and don't propagate an io_opt
value to the block layer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/nvme/host/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ab0429644fe378..6784fc6e95625e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2025,7 +2025,8 @@ static bool nvme_update_disk_info(struct nvme_ns *ns, struct nvme_id_ns *id,
/* NPWG = Namespace Preferred Write Granularity */
phys_bs = bs * (1 + le16_to_cpu(id->npwg));
/* NOWS = Namespace Optimal Write Size */
- io_opt = bs * (1 + le16_to_cpu(id->nows));
+ if (id->nows)
+ io_opt = bs * (1 + le16_to_cpu(id->nows));
}
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
@ 2024-07-01 6:37 ` Chaitanya Kulkarni
2024-07-01 7:06 ` Sagi Grimberg
1 sibling, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2024-07-01 6:37 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
On 6/30/24 22:17, Christoph Hellwig wrote:
> If io_min is larger than the cap, it must by definition be non-zero.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
given that BLK_DEF_MAX_SECTORS_CAP = 2560u.
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] block: don't reduce max_sectors based on io_opt
2024-07-01 5:17 ` [PATCH 2/3] block: don't reduce max_sectors based on io_opt Christoph Hellwig
@ 2024-07-01 6:42 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
1 sibling, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2024-07-01 6:42 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
On 6/30/24 22:17, Christoph Hellwig wrote:
> Don't reduce the max_sectors value below the normal cap when the driver
> advertsizes a very low io_opt. This restores the behavior we had before
> the recent changes to the max_sectors calculation.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
> ---
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] nvme: don't set io_opt if NOWS is zero
2024-07-01 5:17 ` [PATCH 3/3] nvme: don't set io_opt if NOWS is zero Christoph Hellwig
@ 2024-07-01 7:02 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
2024-07-01 13:55 ` Keith Busch
2 siblings, 0 replies; 15+ messages in thread
From: Chaitanya Kulkarni @ 2024-07-01 7:02 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
On 6/30/24 22:17, Christoph Hellwig wrote:
> NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
> and we thus can't detect if it isn't set. Thus a NOWS value of 0 means
> that the Namespace Optimal Write Size is a single LBA, which is clearly
> bogus. Ignore the value in that case and don't propagate an io_opt
> value to the block layer.
>
> Signed-off-by: Christoph Hellwig<hch@lst.de>
indeed (Figure 97: Identify -> Namespace Optimal Write Size (NOWS)).
wonder why can't spec use 0xFFFF to indicate not supported for all
zero based values to avoid this confusion ?
Looks good.
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
-ck
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
2024-07-01 6:37 ` Chaitanya Kulkarni
@ 2024-07-01 7:06 ` Sagi Grimberg
1 sibling, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2024-07-01 7:06 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Martin K . Petersen, linux-block, linux-nvme
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] block: don't reduce max_sectors based on io_opt
2024-07-01 5:17 ` [PATCH 2/3] block: don't reduce max_sectors based on io_opt Christoph Hellwig
2024-07-01 6:42 ` Chaitanya Kulkarni
@ 2024-07-01 7:08 ` Sagi Grimberg
1 sibling, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2024-07-01 7:08 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Martin K . Petersen, linux-block, linux-nvme
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] nvme: don't set io_opt if NOWS is zero
2024-07-01 5:17 ` [PATCH 3/3] nvme: don't set io_opt if NOWS is zero Christoph Hellwig
2024-07-01 7:02 ` Chaitanya Kulkarni
@ 2024-07-01 7:08 ` Sagi Grimberg
2024-07-01 13:55 ` Keith Busch
2 siblings, 0 replies; 15+ messages in thread
From: Sagi Grimberg @ 2024-07-01 7:08 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Keith Busch, Martin K . Petersen, linux-block, linux-nvme
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: io_opt fixups
[not found] ` <CGME20240701073115epcas5p3e2758375e272cba80281fe0ac37394d0@epcas5p3.samsung.com>
@ 2024-07-01 7:24 ` Nitesh Shetty
0 siblings, 0 replies; 15+ messages in thread
From: Nitesh Shetty @ 2024-07-01 7:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Keith Busch, Sagi Grimberg, Martin K . Petersen,
linux-block, linux-nvme
[-- Attachment #1: Type: text/plain, Size: 570 bytes --]
On 01/07/24 07:17AM, Christoph Hellwig wrote:
>Hi all,
>
>I recently noticed that on my test VMs with the Qemu NVMe emulations I
>see a zero max_setors_kb limit in sysfs. It turns out Qemu advertises
>a one-LBA optimal write size, which is a bit silly.
>
>This series handles this odd case properly both in the block layer and
>the nvme driver as a sort of defense in depth.
>
>Diffstat:
> block/blk-settings.c | 5 ++---
> drivers/nvme/host/core.c | 3 ++-
> 2 files changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
[-- Attachment #2: Type: text/plain, Size: 0 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: io_opt fixups
2024-07-01 5:17 io_opt fixups Christoph Hellwig
` (3 preceding siblings ...)
[not found] ` <CGME20240701073115epcas5p3e2758375e272cba80281fe0ac37394d0@epcas5p3.samsung.com>
@ 2024-07-01 10:22 ` Johannes Thumshirn
2024-07-01 12:53 ` Jens Axboe
5 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2024-07-01 10:22 UTC (permalink / raw)
To: hch, Jens Axboe
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen,
linux-block@vger.kernel.org, linux-nvme@lists.infradead.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: io_opt fixups
2024-07-01 5:17 io_opt fixups Christoph Hellwig
` (4 preceding siblings ...)
2024-07-01 10:22 ` Johannes Thumshirn
@ 2024-07-01 12:53 ` Jens Axboe
5 siblings, 0 replies; 15+ messages in thread
From: Jens Axboe @ 2024-07-01 12:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Keith Busch, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
On Mon, 01 Jul 2024 07:17:49 +0200, Christoph Hellwig wrote:
> I recently noticed that on my test VMs with the Qemu NVMe emulations I
> see a zero max_setors_kb limit in sysfs. It turns out Qemu advertises
> a one-LBA optimal write size, which is a bit silly.
>
> This series handles this odd case properly both in the block layer and
> the nvme driver as a sort of defense in depth.
>
> [...]
Applied, thanks!
[1/3] block: remove a duplicate io_min check in blk_validate_limits
commit: f62e8edc0a9fda84fe5bf32d5f5874b489d6c301
[2/3] block: don't reduce max_sectors based on io_opt
commit: 37105615f73125cb0466c09796f277a4c46d9295
[3/3] nvme: don't set io_opt if NOWS is zero
commit: f3bf25d5135539603f24e377c6dec3016fbd9786
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] nvme: don't set io_opt if NOWS is zero
2024-07-01 5:17 ` [PATCH 3/3] nvme: don't set io_opt if NOWS is zero Christoph Hellwig
2024-07-01 7:02 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
@ 2024-07-01 13:55 ` Keith Busch
2024-07-01 15:11 ` Christoph Hellwig
2 siblings, 1 reply; 15+ messages in thread
From: Keith Busch @ 2024-07-01 13:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Sagi Grimberg, Martin K . Petersen, linux-block,
linux-nvme
On Mon, Jul 01, 2024 at 07:17:52AM +0200, Christoph Hellwig wrote:
> NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
> and we thus can't detect if it isn't set.
We can detect if it is set based on the namespace features flags,
though.
> Thus a NOWS value of 0 means
> that the Namespace Optimal Write Size is a single LBA, which is clearly
> bogus. Ignore the value in that case and don't propagate an io_opt
> value to the block layer.
Hm, why is that clearly bogus? Optane SSDs were optimized for
single-sector writes.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] nvme: don't set io_opt if NOWS is zero
2024-07-01 13:55 ` Keith Busch
@ 2024-07-01 15:11 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2024-07-01 15:11 UTC (permalink / raw)
To: Keith Busch
Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg, Martin K . Petersen,
linux-block, linux-nvme
On Mon, Jul 01, 2024 at 07:55:37AM -0600, Keith Busch wrote:
> On Mon, Jul 01, 2024 at 07:17:52AM +0200, Christoph Hellwig wrote:
> > NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
> > and we thus can't detect if it isn't set.
>
> We can detect if it is set based on the namespace features flags,
> though.
Except that the flag covers 5 different flags, for 2 different operations.
> > Thus a NOWS value of 0 means
> > that the Namespace Optimal Write Size is a single LBA, which is clearly
> > bogus. Ignore the value in that case and don't propagate an io_opt
> > value to the block layer.
>
> Hm, why is that clearly bogus? Optane SSDs were optimized for
> single-sector writes.
Because even if they optimize for it, the pure overhead of both
the PCIe physical layer, nvme command overhead and software overhead
will never make it more optimal to split a larger I/O down to this
size, which the optimal write size is about.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-07-01 15:11 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-01 5:17 io_opt fixups Christoph Hellwig
2024-07-01 5:17 ` [PATCH 1/3] block: remove a duplicate io_min check in blk_validate_limits Christoph Hellwig
2024-07-01 6:37 ` Chaitanya Kulkarni
2024-07-01 7:06 ` Sagi Grimberg
2024-07-01 5:17 ` [PATCH 2/3] block: don't reduce max_sectors based on io_opt Christoph Hellwig
2024-07-01 6:42 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
2024-07-01 5:17 ` [PATCH 3/3] nvme: don't set io_opt if NOWS is zero Christoph Hellwig
2024-07-01 7:02 ` Chaitanya Kulkarni
2024-07-01 7:08 ` Sagi Grimberg
2024-07-01 13:55 ` Keith Busch
2024-07-01 15:11 ` Christoph Hellwig
[not found] ` <CGME20240701073115epcas5p3e2758375e272cba80281fe0ac37394d0@epcas5p3.samsung.com>
2024-07-01 7:24 ` io_opt fixups Nitesh Shetty
2024-07-01 10:22 ` Johannes Thumshirn
2024-07-01 12:53 ` Jens Axboe
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).