All of lore.kernel.org
 help / color / mirror / Atom feed
From: Niklas Cassel <cassel@kernel.org>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, dm-devel@lists.linux.dev,
	Mike Snitzer <snitzer@kernel.org>,
	Mikulas Patocka <mpatocka@redhat.com>,
	Christoph Hellwig <hch@lst.de>,
	Benjamin Marzinski <bmarzins@redhat.com>
Subject: Re: [PATCH v4 1/3] block: Improve checks on zone resource limits
Date: Thu, 6 Jun 2024 03:21:26 +0200	[thread overview]
Message-ID: <ZmEPFn9tvZb95fgz@ryzen.lan> (raw)
In-Reply-To: <2e8b1334-61a1-4c1c-a4f7-9550e32e7be6@kernel.org>

On Thu, Jun 06, 2024 at 09:06:58AM +0900, Damien Le Moal wrote:
> On 6/6/24 2:25 AM, Niklas Cassel wrote:
> > On Wed, Jun 05, 2024 at 04:51:42PM +0900, Damien Le Moal wrote:
> 
> The problem you are raising is the reliability of the limits themselves, and
> for NVMe ZNS, given that MOR/MAR are not defined per namespace, we are in the
> same situation as with DM devices sharing the same zoned block dev through
> different targets: even if the user respects the limits, write errors may
> happen due to the backing dev limits (or controller limits for ZNS) being
> exceeded. Nothing much we can do to easily deal with this right now. We would
> need to constantly track zone states and implement a software driven zone state
> machine checking the limits all the time to actually provide guarantees.
> 
> > Since AFAICT, this also means that we will expose 0 to sysfs
> > instead of the value that the device reported.
> 
> Yes. But the value reported by the device is for the whole controller. The
> sysfs attributes are for the block device == namespace.

The limits are defined in the I/O Command Set Specific Identify Namespace
Data Structure for the Zoned Namespace Command Set, so they are per NS,
otherwise they would have been defined in the I/O Command Set Specific
Identify Controller Data Structure for the Zoned Namespace Command Set.


> 
> > Perhaps we should only do this optimization if:
> > - the device is not ZNS, or
> > - the device is ZNS and does not support NS management, or
> > - the device is ZNS and supports NS management and implements TP4115
> >   (Zoned Namespace Resource Management supported bit is set, even if
> >    that TP does not seem to be part of a Ratified ZNS version yet...)
> 
> Right now, this all works the same way for DM and nvme zns, so I think this is
> all good. If anything, we should probably add a warning in the nvme driver
> about the potentially unreliable moz/moz limits if we see a ZNS device with
> multiple zoned namespaces.

Well, it is only a problem for ZNS devices with NS management.

If there are two ZNS namespaces on the device, and the device does not
support NS management, the device vendor would have been seriously silly
to not allocate and set the limits in the I/O Command Set Specific Identify
Namespace Data Structure for the Zoned Namespace Command Set correctly.

But yes, this concern cannot be solved in disk_update_zone_resources(),
which operates on per gendisk (and there is one gendisk per namespace),
so not much this function can do. If we were to do something, it would
have to be done in the nvme driver.


Perhaps if the device is ZNS, and does support NS management, but does
not have the Zoned Namespace Resource Management supported bit is set,
divide the MAR/MOR values reported by each namespace by the number of
ZNS namespaces?


Kind regards,
Niklas

  reply	other threads:[~2024-06-06  1:21 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-05  7:51 [PATCH v4 0/3] Fix DM zone resource limits stacking Damien Le Moal
2024-06-05  7:51 ` [PATCH v4 1/3] block: Improve checks on zone resource limits Damien Le Moal
2024-06-05  7:54   ` Christoph Hellwig
2024-06-05 17:25   ` Niklas Cassel
2024-06-06  0:06     ` Damien Le Moal
2024-06-06  1:21       ` Niklas Cassel [this message]
2024-06-06  2:12         ` Damien Le Moal
2024-06-06  4:41           ` Christoph Hellwig
2024-06-05  7:51 ` [PATCH v4 2/3] dm: Improve zone resource limits handling Damien Le Moal
2024-06-05  7:55   ` Christoph Hellwig
2024-06-05 19:47   ` Benjamin Marzinski
2024-06-05 23:52     ` Damien Le Moal
2024-06-06  4:39       ` Christoph Hellwig
2024-06-06  5:48         ` Damien Le Moal
2024-06-05  7:51 ` [PATCH v4 3/3] dm: Remove unused macro DM_ZONE_INVALID_WP_OFST Damien Le Moal
2024-06-05  7:55   ` Christoph Hellwig
2024-06-05 19:50   ` Benjamin Marzinski
2024-06-05 12:40 ` [PATCH v4 0/3] Fix DM zone resource limits stacking Johannes Thumshirn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZmEPFn9tvZb95fgz@ryzen.lan \
    --to=cassel@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bmarzins@redhat.com \
    --cc=dlemoal@kernel.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    --cc=mpatocka@redhat.com \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.