* [PATCH] nvme: zns: limit max_zone_append by max_segments
@ 2023-07-31 11:46 Shin'ichiro Kawasaki
2023-07-31 12:03 ` Damien Le Moal
2023-07-31 13:46 ` Christoph Hellwig
0 siblings, 2 replies; 13+ messages in thread
From: Shin'ichiro Kawasaki @ 2023-07-31 11:46 UTC (permalink / raw)
To: linux-nvme
Cc: Christoph Hellwig, Keith Busch, Damien Le Moal,
Johannes Thumshirn, Shin'ichiro Kawasaki
BIOs for zone append operations cannot be split, by design: the device
returns the sector at which each BIO was actually written. The number of
segments in such BIOs therefore must not exceed the max_segments limit
of the device, to avoid a BIO split due to the segment count.

However, the nvme driver sets the max_zone_append limit from the limit
reported by ZNS devices, without taking the max_segments limit of the
device into account. This allows zone append BIOs to be large and to
have more segments than max_segments permits, causing an unexpected BIO
split which results in a WARN in bio_split() followed by KASAN issues.
The recent commit 16d7fd3cfa72 ("zonefs: use iomap for synchronous
direct writes") triggered such a split: it changed BIO preparation for
zone append operations in zonefs so that pages are added with
bio_iov_add_page() instead of bio_add_hw_page(). Each BIO segment then
holds a single page, which increases the segment count. Hence the BIO
split.

To avoid the unexpected BIO split, reflect the max_segments limit in the
max_zone_append limit. In the worst case, the safe max_zone_append size
is max_segments multiplied by PAGE_SIZE. Compare that with the
max_zone_append size reported by the ZNS device and use the smaller
value as the max_zone_append limit.
Fixes: 240e6ee272c0 ("nvme: support for zoned namespaces")
Cc: stable@vger.kernel.org
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
drivers/nvme/host/zns.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index ec8557810c21..9ee77626c235 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -10,9 +10,11 @@
int nvme_revalidate_zones(struct nvme_ns *ns)
{
struct request_queue *q = ns->queue;
+ unsigned int max_sectors = queue_max_segments(q) << PAGE_SECTORS_SHIFT;
blk_queue_chunk_sectors(q, ns->zsze);
- blk_queue_max_zone_append_sectors(q, ns->ctrl->max_zone_append);
+ max_sectors = min(max_sectors, ns->ctrl->max_zone_append);
+ blk_queue_max_zone_append_sectors(q, max_sectors);
return blk_revalidate_disk_zones(ns->disk, NULL);
}
--
2.40.1
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 11:46 [PATCH] nvme: zns: limit max_zone_append by max_segments Shin'ichiro Kawasaki
@ 2023-07-31 12:03 ` Damien Le Moal
2023-07-31 13:51 ` Christoph Hellwig
2023-07-31 13:46 ` Christoph Hellwig
1 sibling, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 12:03 UTC (permalink / raw)
To: Shin'ichiro Kawasaki, linux-nvme
Cc: Christoph Hellwig, Keith Busch, Johannes Thumshirn
On 7/31/23 20:46, Shin'ichiro Kawasaki wrote:
> BIOs for zone append operations cannot be split, by design: the device
> returns the sector at which each BIO was actually written. The number of
> segments in such BIOs therefore must not exceed the max_segments limit
> of the device, to avoid a BIO split due to the segment count.
>
> However, the nvme driver sets the max_zone_append limit from the limit
> reported by ZNS devices, without taking the max_segments limit of the
> device into account. This allows zone append BIOs to be large and to
> have more segments than max_segments permits, causing an unexpected BIO
> split which results in a WARN in bio_split() followed by KASAN issues.
> The recent commit 16d7fd3cfa72 ("zonefs: use iomap for synchronous
> direct writes") triggered such a split: it changed BIO preparation for
> zone append operations in zonefs so that pages are added with
> bio_iov_add_page() instead of bio_add_hw_page(). Each BIO segment then
> holds a single page, which increases the segment count. Hence the BIO
> split.
>
> To avoid the unexpected BIO split, reflect the max_segments limit in the
> max_zone_append limit. In the worst case, the safe max_zone_append size
> is max_segments multiplied by PAGE_SIZE. Compare that with the
> max_zone_append size reported by the ZNS device and use the smaller
> value as the max_zone_append limit.
>
> Fixes: 240e6ee272c0 ("nvme: support for zoned namespaces")
> Cc: stable@vger.kernel.org
> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> ---
> drivers/nvme/host/zns.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> index ec8557810c21..9ee77626c235 100644
> --- a/drivers/nvme/host/zns.c
> +++ b/drivers/nvme/host/zns.c
> @@ -10,9 +10,11 @@
> int nvme_revalidate_zones(struct nvme_ns *ns)
> {
> struct request_queue *q = ns->queue;
> + unsigned int max_sectors = queue_max_segments(q) << PAGE_SECTORS_SHIFT;
>
> blk_queue_chunk_sectors(q, ns->zsze);
> - blk_queue_max_zone_append_sectors(q, ns->ctrl->max_zone_append);
> + max_sectors = min(max_sectors, ns->ctrl->max_zone_append);
> + blk_queue_max_zone_append_sectors(q, max_sectors);
You could combine these 2 lines:
blk_queue_max_zone_append_sectors(q,
min(max_sectors, ns->ctrl->max_zone_append));
>
> return blk_revalidate_disk_zones(ns->disk, NULL);
> }
Christoph,
I feel like a lot of the special casing for zone append bio add page can be
removed from the block layer. This issue was found with zonefs tests on real
ZNS devices because of the huge (and incorrect) zone append limit that zns
has, combined with the recent zonefs iomap write change, which overlooked the
fact that bio add page is done by iomap before the bio op is set to zone
append. That resulted in the large BIO. The problem does not happen with scsi
or null_blk, however, which kind of proves that the regular bio add page is
fine for zone append as long as the issuer has the correct zone append limit.
Thoughts?
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 11:46 [PATCH] nvme: zns: limit max_zone_append by max_segments Shin'ichiro Kawasaki
2023-07-31 12:03 ` Damien Le Moal
@ 2023-07-31 13:46 ` Christoph Hellwig
2023-07-31 14:02 ` Damien Le Moal
1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 13:46 UTC (permalink / raw)
To: Shin'ichiro Kawasaki
Cc: linux-nvme, Christoph Hellwig, Keith Busch, Damien Le Moal,
Johannes Thumshirn
On Mon, Jul 31, 2023 at 08:46:32PM +0900, Shin'ichiro Kawasaki wrote:
> To avoid the unexpected BIO split, reflect the max_segments limit in the
> max_zone_append limit. In the worst case, the safe max_zone_append size
> is max_segments multiplied by PAGE_SIZE. Compare that with the
> max_zone_append size reported by the ZNS device and use the smaller
> value as the max_zone_append limit.
This is true only for NVMe and a handful of drivers that
set the virt_boundary. For others the maximum size is completely
unrelated to the maximum number of segments.
Can you reword the commit log a bit to make that clearer?
> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
> index ec8557810c21..9ee77626c235 100644
> --- a/drivers/nvme/host/zns.c
> +++ b/drivers/nvme/host/zns.c
> @@ -10,9 +10,11 @@
> int nvme_revalidate_zones(struct nvme_ns *ns)
> {
> struct request_queue *q = ns->queue;
> + unsigned int max_sectors = queue_max_segments(q) << PAGE_SECTORS_SHIFT;
>
> blk_queue_chunk_sectors(q, ns->zsze);
> - blk_queue_max_zone_append_sectors(q, ns->ctrl->max_zone_append);
> + max_sectors = min(max_sectors, ns->ctrl->max_zone_append);
> + blk_queue_max_zone_append_sectors(q, max_sectors);
And while this looks correct, it also feels pretty ugly. Shouldn't a

blk_queue_max_zone_append_sectors(q,
		min(queue_max_sectors(q), ns->ctrl->max_zone_append));

do the same thing in a somewhat more obvious way, given that max_segments
is already taken into account for the queue_max_sectors calculation?
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 12:03 ` Damien Le Moal
@ 2023-07-31 13:51 ` Christoph Hellwig
2023-07-31 14:01 ` Damien Le Moal
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 13:51 UTC (permalink / raw)
To: Damien Le Moal
Cc: Shin'ichiro Kawasaki, linux-nvme, Christoph Hellwig,
Keith Busch, Johannes Thumshirn
On Mon, Jul 31, 2023 at 09:03:46PM +0900, Damien Le Moal wrote:
> I feel like a lot of the special casing for zone append bio add page can be
> removed from the block layer. This issue was found with zonefs tests on real
> ZNS devices because of the huge (and incorrect) zone append limit that zns
> has, combined with the recent zonefs iomap write change, which overlooked the
> fact that bio add page is done by iomap before the bio op is set to zone
> append. That resulted in the large BIO. The problem does not happen with scsi
> or null_blk, however, which kind of proves that the regular bio add page is
> fine for zone append as long as the issuer has the correct zone append limit.
> Thoughts?
A zone append limit larger than max_sectors is odd, and maybe the
block layer should assert something. I think the root cause is that
many NVMe devices have a very large hardware equivalent to max_sectors
(the MDTS field), but Linux still uses a much lower limit due to memory
allocation issues (the PRPs used by NVMe are very inefficient in terms
of memory usage for larger transfers). So we cap max_sectors to the
software limit, but not max_zone_append_sectors.
Zone Append needs some amount of special casing in the block layer
because the splitting of Zone Append bios must happen in the file system
as the file system needs a completion context per hardware operation.
I think the best way to do that is to first build up a maximum bio
and then use the same bio_split_rw function that the block layer would
use to split it to the hardware limits, just in the issuer. This
is what I did in btrfs, and it seems like zonefs actually needs to do
the same, but I missed that during review of the recent direct I/O
changes.
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 13:51 ` Christoph Hellwig
@ 2023-07-31 14:01 ` Damien Le Moal
2023-07-31 14:06 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 14:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shin'ichiro Kawasaki, linux-nvme, Keith Busch,
Johannes Thumshirn
On 7/31/23 22:51, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 09:03:46PM +0900, Damien Le Moal wrote:
>> I feel like a lot of the special casing for zone append bio add page can be
>> removed from the block layer. This issue was found with zonefs tests on real
>> ZNS devices because of the huge (and incorrect) zone append limit that zns
>> has, combined with the recent zonefs iomap write change, which overlooked the
>> fact that bio add page is done by iomap before the bio op is set to zone
>> append. That resulted in the large BIO. The problem does not happen with scsi
>> or null_blk, however, which kind of proves that the regular bio add page is
>> fine for zone append as long as the issuer has the correct zone append limit.
>> Thoughts?
>
> A zone append limit larger than max_sectors is odd, and maybe the
> block layer should assert something. I think the root cause is that
> many NVMe devices have a very large hardware equivalent to max_sectors
> (the MDTS field), but Linux still uses a much lower limit due to memory
> allocation issues (the PRPs used by NVMe are very inefficient in terms
> of memory usage for larger transfers). So we cap max_sectors to the
> software limit, but not max_zone_append_sectors.
>
> Zone Append needs some amount of special casing in the block layer
> because the splitting of Zone Append bios must happen in the file system
> as the file system needs a completion context per hardware operation.
> I think the best way to do that is to first build up a maximum bio
> and then use the same bio_split_rw function that the block layer would
> use to split it to the hardware limits, just in the issuer. This
> is what I did in btrfs, and it seems like zonefs actually needs to do
> the same, but I missed that during review of the recent direct I/O
> changes.
We cannot do that in zonefs because there is no metadata to handle a possible
reordering of the fragments of a split large zone append. Hence zonefs limits
write sizes to the max zone append size on entry and never tries to do larger writes. But
the ZNS limit bug resulted in the split. At least for zonefs, I think there is
no need to use the special bio add page since with a proper limit, we should
never see a split.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 13:46 ` Christoph Hellwig
@ 2023-07-31 14:02 ` Damien Le Moal
2023-07-31 14:05 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 14:02 UTC (permalink / raw)
To: Christoph Hellwig, Shin'ichiro Kawasaki
Cc: linux-nvme, Keith Busch, Johannes Thumshirn
On 7/31/23 22:46, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 08:46:32PM +0900, Shin'ichiro Kawasaki wrote:
>> To avoid the unexpected BIO split, reflect the max_segments limit in the
>> max_zone_append limit. In the worst case, the safe max_zone_append size
>> is max_segments multiplied by PAGE_SIZE. Compare that with the
>> max_zone_append size reported by the ZNS device and use the smaller
>> value as the max_zone_append limit.
>
> This is true only for NVMe and a handful of drivers that
> set the virt_boundary. For others the maximum size is completely
> unrelated to the maximum number of segments.
>
> Can you reword the commit log a bit to make that clearer?
>
>> diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
>> index ec8557810c21..9ee77626c235 100644
>> --- a/drivers/nvme/host/zns.c
>> +++ b/drivers/nvme/host/zns.c
>> @@ -10,9 +10,11 @@
>> int nvme_revalidate_zones(struct nvme_ns *ns)
>> {
>> struct request_queue *q = ns->queue;
>> + unsigned int max_sectors = queue_max_segments(q) << PAGE_SECTORS_SHIFT;
>>
>> blk_queue_chunk_sectors(q, ns->zsze);
>> - blk_queue_max_zone_append_sectors(q, ns->ctrl->max_zone_append);
>> + max_sectors = min(max_sectors, ns->ctrl->max_zone_append);
>> + blk_queue_max_zone_append_sectors(q, max_sectors);
>
> And while this looks correct, it also feels pretty ugly. Shouldn't a
>
> blk_queue_max_zone_append_sectors(q,
> 		min(queue_max_sectors(q), ns->ctrl->max_zone_append));
blk_queue_max_zone_append_sectors() already does cap max zone append to
max_hw_sectors. We could also add the cap to queue_max_segments(q) <<
PAGE_SECTORS_SHIFT in that function as well given that all zoned devices should
have that anyway...
>
> do the same thing in a somewhat more obvious way given that max_segments
> is already taken into account for the queue_max_sectors calculation?
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:02 ` Damien Le Moal
@ 2023-07-31 14:05 ` Christoph Hellwig
2023-07-31 14:07 ` Damien Le Moal
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 14:05 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Shin'ichiro Kawasaki, linux-nvme,
Keith Busch, Johannes Thumshirn
On Mon, Jul 31, 2023 at 11:02:35PM +0900, Damien Le Moal wrote:
> >
> > blk_queue_max_zone_append_sectors(q,
> > 		min(queue_max_sectors(q), ns->ctrl->max_zone_append));
>
> blk_queue_max_zone_append_sectors() already does cap max zone append to
> max_hw_sectors.
Then we should not need this..
> We could also add the cap to queue_max_segments(q) <<
> PAGE_SECTORS_SHIFT in that function as well given that all zoned devices should
> have that anyway...
No. That's a NVMe specific thing. Most SCSI HBAs don't have this
issue. NVMe devices supporting SGLs don't really have it either, but
for that we'd need to commit to always using SGLs for non-tiny I/O,
which we should probably be doing.
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:01 ` Damien Le Moal
@ 2023-07-31 14:06 ` Christoph Hellwig
2023-07-31 14:12 ` Damien Le Moal
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 14:06 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Shin'ichiro Kawasaki, linux-nvme,
Keith Busch, Johannes Thumshirn
On Mon, Jul 31, 2023 at 11:01:21PM +0900, Damien Le Moal wrote:
> We cannot do that in zonefs because there is no metadata to handle a possible
> reordering of the fragments of a split large zone append. Hence zonefs limits
>> write sizes to the max zone append size on entry and never tries to do larger writes. But
> the ZNS limit bug resulted in the split. At least for zonefs, I think there is
> no need to use the special bio add page since with a proper limit, we should
> never see a split.
Then we need to bring back something like the code removed in
8e81aa16a42169faae1ba15cd648cc8bb83eaa48, although I'd kinda hate that,
and I don't really see how that would protect against reordering either.
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:05 ` Christoph Hellwig
@ 2023-07-31 14:07 ` Damien Le Moal
2023-07-31 14:08 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 14:07 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shin'ichiro Kawasaki, linux-nvme, Keith Busch,
Johannes Thumshirn
On 7/31/23 23:05, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 11:02:35PM +0900, Damien Le Moal wrote:
>>>
>>> blk_queue_max_zone_append_sectors(q,
>>> 		min(queue_max_sectors(q), ns->ctrl->max_zone_append));
>>
>> blk_queue_max_zone_append_sectors() already does cap max zone append to
>> max_hw_sectors.
>
> Then we should not need this..
The cap is not the soft limit. It is the large hw limit, which I think is mdts
as well for zns.
>
>> We could also add the cap to queue_max_segments(q) <<
>> PAGE_SECTORS_SHIFT in that function as well given that all zoned devices should
>> have that anyway...
>
> No. That's a NVMe specific thing. Most SCSI HBAs don't have this
> issue. NVMe devices supporting SGLs don't really have it either, but
> for that we'd need to commit to always using SGLs for non-tiny I/O,
> which we should probably be doing.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:07 ` Damien Le Moal
@ 2023-07-31 14:08 ` Christoph Hellwig
0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 14:08 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Shin'ichiro Kawasaki, linux-nvme,
Keith Busch, Johannes Thumshirn
On Mon, Jul 31, 2023 at 11:07:30PM +0900, Damien Le Moal wrote:
> On 7/31/23 23:05, Christoph Hellwig wrote:
> > On Mon, Jul 31, 2023 at 11:02:35PM +0900, Damien Le Moal wrote:
> >>>
> >>> blk_queue_max_zone_append_sectors(q,
> >>> 		min(queue_max_sectors(q), ns->ctrl->max_zone_append));
> >>
> >> blk_queue_max_zone_append_sectors() already does cap max zone append to
> >> max_hw_sectors.
> >
> > Then we should not need this..
>
> The cap is not the soft limit. It is the large hw limit, which I think is mdts
> as well for zns.
Well, there shouldn't be a strict need to limit to the soft limit.
Although limiting WRITE and not ZONE_APPEND feels a bit weird.
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:06 ` Christoph Hellwig
@ 2023-07-31 14:12 ` Damien Le Moal
2023-07-31 14:26 ` Christoph Hellwig
0 siblings, 1 reply; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 14:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shin'ichiro Kawasaki, linux-nvme, Keith Busch,
Johannes Thumshirn
On 7/31/23 23:06, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 11:01:21PM +0900, Damien Le Moal wrote:
>> We cannot do that in zonefs because there is no metadata to handle a possible
>> reordering of the fragments of a split large zone append. Hence zonefs limits
>> write sizes to the max zone append size on entry and never tries to do larger writes. But
>> the ZNS limit bug resulted in the split. At least for zonefs, I think there is
>> no need to use the special bio add page since with a proper limit, we should
>> never see a split.
>
> Then we need to bring back something like the code removed in
> 8e81aa16a42169faae1ba15cd648cc8bb83eaa48, although I'd kinda hate that,
> and I don't really see how that would protect against reordering either.
For zonefs, there is never any reordering because there is no split, as long as
the max zone append sectors limit indicates the maximum size of a zone append
BIO that can be built without ever needing splitting.
So if we fix the zns limit, zonefs does not really need a fix, even though the
code would be a little weird as-is.
--
Damien Le Moal
Western Digital Research
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:12 ` Damien Le Moal
@ 2023-07-31 14:26 ` Christoph Hellwig
2023-07-31 14:33 ` Damien Le Moal
0 siblings, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2023-07-31 14:26 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Shin'ichiro Kawasaki, linux-nvme,
Keith Busch, Johannes Thumshirn
On Mon, Jul 31, 2023 at 11:12:43PM +0900, Damien Le Moal wrote:
> max zone append sectors limit indicates the maximum size of a zone append BIO
> that can be built without ever needing splitting.
>
> So if we fix the zns limit, zonefs does not really need a fix, even though the
> code would be a little weird as-is.
You still need to fix splitting due to I/O layout and not just size.
E.g. if you do a writev() with two non-page aligned buffers (you'll
probably need an arm64 or power box to trigger it), you need to split
the bio even if it trivially fits into the size limit. Similar for
things like max_segment_size.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] nvme: zns: limit max_zone_append by max_segments
2023-07-31 14:26 ` Christoph Hellwig
@ 2023-07-31 14:33 ` Damien Le Moal
0 siblings, 0 replies; 13+ messages in thread
From: Damien Le Moal @ 2023-07-31 14:33 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Shin'ichiro Kawasaki, linux-nvme, Keith Busch,
Johannes Thumshirn
On 7/31/23 23:26, Christoph Hellwig wrote:
> On Mon, Jul 31, 2023 at 11:12:43PM +0900, Damien Le Moal wrote:
>> max zone append sectors limit indicates the maximum size of a zone append BIO
>> that can be built without ever needing splitting.
>>
>> So if we fix the zns limit, zonefs does not really need a fix, eventhough the
>> code would be a little weird as-is.
>
> You still need to fix splitting due to I/O layout and not just size.
>
> E.g. if you do a writev() with two non-page aligned buffers (you'll
> probably need an arm64 or power box to trigger it), you need to split
> the bio even if it trivially fits into the size limit. Similar for
> things like max_segment_size.
Good point. So fix needed.
--
Damien Le Moal
Western Digital Research