linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Niklas Cassel <cassel@kernel.org>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, dm-devel@lists.linux.dev,
	Mike Snitzer <snitzer@kernel.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Benjamin Marzinski <bmarzins@redhat.com>
Subject: Re: [PATCH v4 1/3] block: Improve checks on zone resource limits
Date: Wed, 5 Jun 2024 19:25:46 +0200	[thread overview]
Message-ID: <ZmCfmlnoo7wjQLTg@ryzen.lan> (raw)
In-Reply-To: <20240605075144.153141-2-dlemoal@kernel.org>

On Wed, Jun 05, 2024 at 04:51:42PM +0900, Damien Le Moal wrote:
> Make sure that the zone resource limits of a zoned block device are
> correct by checking that:
> (a) If the device has a max active zones limit, make sure that the max
>     open zones limit is lower than the max active zones limit.
> (b) If the device has zone resource limits, check that the limits
>     values are lower than the number of sequential zones of the device.
>     If it is not, assume that the zoned device has no limits by setting
>     the limits to 0.
> 
> For (a), a check is added to blk_validate_zoned_limits() and an error
> returned if the max open zones limit exceeds the value of the max active
> zone limit (if there is one).
> 
> For (b), given that we need the number of sequential zones of the zoned
> device, this check is added to disk_update_zone_resources(). This is
> safe to do as that function is executed with the disk queue frozen and
> the check executed after queue_limits_start_update() which takes the
> queue limits lock. Of note is that the early return in this function
> for zoned devices that do not use zone write plugging (e.g. DM devices
> using native zone append) is moved to after the new check and adjustment
> of the zone resource limits so that the check applies to any zoned
> device.
> 
> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
> ---
>  block/blk-settings.c |  8 ++++++++
>  block/blk-zoned.c    | 20 ++++++++++++++++----
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index effeb9a639bb..474c709ea85b 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -80,6 +80,14 @@ static int blk_validate_zoned_limits(struct queue_limits *lim)
>  	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
>  		return -EINVAL;
>  
> +	/*
> +	 * Given that active zones include open zones, the maximum number of
> +	 * open zones cannot be larger than the maximum numbber of active zones.

s/numbber/number/


> +	 */
> +	if (lim->max_active_zones &&
> +	    lim->max_open_zones > lim->max_active_zones)
> +		return -EINVAL;
> +
>  	if (lim->zone_write_granularity < lim->logical_block_size)
>  		lim->zone_write_granularity = lim->logical_block_size;
>  
> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
> index 52abebf56027..8f89705f5e1c 100644
> --- a/block/blk-zoned.c
> +++ b/block/blk-zoned.c
> @@ -1647,8 +1647,22 @@ static int disk_update_zone_resources(struct gendisk *disk,
>  		return -ENODEV;
>  	}
>  
> +	lim = queue_limits_start_update(q);
> +
> +	/*
> +	 * Some devices can advertize zone resource limits that are larger than
> +	 * the number of sequential zones of the zoned block device, e.g. a
> +	 * small ZNS namespace. For such case, assume that the zoned device has
> +	 * no zone resource limits.
> +	 */
> +	nr_seq_zones = disk->nr_zones - nr_conv_zones;
> +	if (lim.max_open_zones >= nr_seq_zones)
> +		lim.max_open_zones = 0;
> +	if (lim.max_active_zones >= nr_seq_zones)
> +		lim.max_active_zones = 0;
> +

Is this really correct to transform to no limits?

The MAR and MOR limits are defined in the I/O Command Set Specific Identify
Namespace Data Structure for the Zoned Namespace Command Set.

However, the user has no ability to control these limits themselves
during a namespace management create ns, or for the format command
(and this still seems to be the case in the latest ZNS spec 1.1d).

Which means that the controller has no way of knowing the number of
resources to allocate to each namespace.

Some (all?) controllers will right now simply report the same MAR/MOR
for all namespaces.


So if I use the namespace management command to create two small
zoned namespaces, the number of sequential zones might be smaller
than the limits in both namespaces, but could together be exceeding
the limit.

How is ignoring the limit that we got from the device better than
actually exposing the limit which we got from the device?

Since AFAICT, this also means that we will expose 0 to sysfs
instead of the value that the device reported.



Perhaps we should only do this optimization if:
- the device is not ZNS, or
- the device is ZNS and does not support NS management, or
- the device is ZNS and supports NS management and implements TP4115
  (Zoned Namespace Resource Management supported bit is set, even if
   that TP does not seem to be part of a Ratified ZNS version yet...)


Kind regards,
Niklas

  parent reply	other threads:[~2024-06-05 17:25 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05  7:51 [PATCH v4 0/3] Fix DM zone resource limits stacking Damien Le Moal
2024-06-05  7:51 ` [PATCH v4 1/3] block: Improve checks on zone resource limits Damien Le Moal
2024-06-05  7:54   ` Christoph Hellwig
2024-06-05 17:25   ` Niklas Cassel [this message]
2024-06-06  0:06     ` Damien Le Moal
2024-06-06  1:21       ` Niklas Cassel
2024-06-06  2:12         ` Damien Le Moal
2024-06-06  4:41           ` Christoph Hellwig
2024-06-05  7:51 ` [PATCH v4 2/3] dm: Improve zone resource limits handling Damien Le Moal
2024-06-05  7:55   ` Christoph Hellwig
2024-06-05 19:47   ` Benjamin Marzinski
2024-06-05 23:52     ` Damien Le Moal
2024-06-06  4:39       ` Christoph Hellwig
2024-06-06  5:48         ` Damien Le Moal
2024-06-05  7:51 ` [PATCH v4 3/3] dm: Remove unused macro DM_ZONE_INVALID_WP_OFST Damien Le Moal
2024-06-05  7:55   ` Christoph Hellwig
2024-06-05 19:50   ` Benjamin Marzinski
2024-06-05 12:40 ` [PATCH v4 0/3] Fix DM zone resource limits stacking Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmCfmlnoo7wjQLTg@ryzen.lan \
    --to=cassel@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bmarzins@redhat.com \
    --cc=dlemoal@kernel.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).