From: Benjamin Marzinski <bmarzins@redhat.com>
To: Damien Le Moal <dlemoal@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
Mike Snitzer <snitzer@redhat.com>, Jens Axboe <axboe@kernel.dk>,
dm-devel@lists.linux.dev, linux-block@vger.kernel.org,
Christoph Hellwig <hch@lst.de>
Subject: Re: [RFC PATCH 7/7] dm: allow devices to revalidate existing zones
Date: Mon, 10 Mar 2025 13:43:53 -0400 [thread overview]
Message-ID: <Z88k2RD6s5KpuxOD@redhat.com> (raw)
In-Reply-To: <7e0a1b47-3c96-4864-80b0-813f357845ad@kernel.org>
On Mon, Mar 10, 2025 at 08:59:26AM +0900, Damien Le Moal wrote:
> On 3/10/25 07:29, Benjamin Marzinski wrote:
> > dm_revalidate_zones() only allowed devices that had no zone resources
> > set up to call blk_revalidate_disk_zones(). If the device already had
> > zone resources, disk->nr_zones would always equal md->nr_zones so
> > dm_revalidate_zones() returned without doing any work. Instead, always
> > call blk_revalidate_disk_zones() if you are loading a new zoned table.
> >
> > However, if the device emulates zone append operations and already has
> > zone append emulation resources, the table size cannot change when
> > loading a new table. Otherwise, all those resources will be garbage.
> >
> > If emulated zone append operations are needed and the zone write pointer
> > offsets of the new table do not match those of the old table, writes to
> > the device will still fail. This patch allows users to safely grow and
> > shrink zone devices. But swapping arbitrary zoned tables will still not
> > work.
>
> I do not think that this patch correctly address the shrinking of dm zoned
> device: blk_revalidate_disk_zones() will look at a smaller set of zones, which
> will leave already hashed zone write plugs outside of that new zone range in the
> disk zwplug hash table. disk_revalidate_zone_resources() does not cleanup and
> reallocate the hash table if the number of zones shrinks.
This is necessary for DM. There could be plugged bios that are on on
these no longer in-range zones. They will obviously fail when they get
reissued, but we need to keep the plugs around so that they *do* get
reissued. A cleaner alternative would be to add code to immediately
error out all the plugged bios on shrinks, but I was trying to avoid
adding a bunch of code to deal with these cases (of course simply
disallowing them adds even less code).
-Ben
> For a physical drive,
> this can only happen if the drive is reformatted with some magic vendor unique
> command, which is why this was never implemented as that is not a valid
> production use case.
>
> To make things simpler, I think we should allow growing/shrinking zoned device
> tables, and much less swapping tables between zoned and not-zoned. I am more
> inclined to avoid all these corner cases by simply not supporting table
> switching for zoned device. That would be much safer I think.
> No-one complained about any issue with table switching until now, which likely
> means that no-one is using this. So what about simply returning an error for
> table switching for a zoned device ? If someone request this support, we can
> revisit this.
>
> >
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> > drivers/md/dm-zone.c | 23 +++++++++++++----------
> > 1 file changed, 13 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
> > index ac86011640c3..7e9ebeee7eac 100644
> > --- a/drivers/md/dm-zone.c
> > +++ b/drivers/md/dm-zone.c
> > @@ -164,16 +164,8 @@ int dm_revalidate_zones(struct dm_table *t, struct request_queue *q)
> > if (!get_capacity(disk))
> > return 0;
> >
> > - /* Revalidate only if something changed. */
> > - if (!disk->nr_zones || disk->nr_zones != md->nr_zones) {
> > - DMINFO("%s using %s zone append",
> > - disk->disk_name,
> > - queue_emulates_zone_append(q) ? "emulated" : "native");
> > - md->nr_zones = 0;
> > - }
> > -
> > - if (md->nr_zones)
> > - return 0;
> > + DMINFO("%s using %s zone append", disk->disk_name,
> > + queue_emulates_zone_append(q) ? "emulated" : "native");
> >
> > /*
> > * Our table is not live yet. So the call to dm_get_live_table()
> > @@ -392,6 +384,17 @@ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q,
> > return 0;
> > }
> >
> > + /*
> > + * If the device needs zone append emulation, and the device already has
> > + * zone append emulation resources, make sure that the chunk_sectors
> > + * hasn't changed size. Otherwise those resources will be garbage.
> > + */
> > + if (!lim->max_hw_zone_append_sectors && disk->zone_wplugs_hash &&
> > + q->limits.chunk_sectors != lim->chunk_sectors) {
> > + DMERR("Cannot change zone size when swapping tables");
> > + return -EINVAL;
> > + }
> > +
> > /*
> > * Warn once (when the capacity is not yet set) if the mapped device is
> > * partially using zone resources of the target devices as that leads to
>
>
> --
> Damien Le Moal
> Western Digital Research
next prev parent reply other threads:[~2025-03-10 17:43 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-09 22:28 [RFC PATCH 0/7] dm: fix issues with swapping dm tables Benjamin Marzinski
2025-03-09 22:28 ` [PATCH 1/7] dm: don't change md if dm_table_set_restrictions() fails Benjamin Marzinski
2025-03-09 23:18 ` Damien Le Moal
2025-03-10 16:39 ` Benjamin Marzinski
2025-03-09 22:28 ` [PATCH 2/7] dm: free table mempools if not used in __bind Benjamin Marzinski
2025-03-09 23:19 ` Damien Le Moal
2025-03-09 22:28 ` [PATCH 3/7] dm: handle failures in dm_table_set_restrictions Benjamin Marzinski
2025-03-09 23:25 ` Damien Le Moal
2025-03-10 17:37 ` Benjamin Marzinski
2025-03-10 18:15 ` Benjamin Marzinski
2025-03-10 23:27 ` Damien Le Moal
2025-03-14 13:38 ` Mikulas Patocka
2025-03-14 13:46 ` Mikulas Patocka
2025-03-10 23:16 ` Damien Le Moal
2025-03-09 22:29 ` [PATCH 4/7] dm: fix dm_blk_report_zones Benjamin Marzinski
2025-03-09 23:27 ` Damien Le Moal
2025-03-09 22:29 ` [PATCH 5/7] blk-zoned: clean up zone settings for devices without zwplugs Benjamin Marzinski
2025-03-09 23:31 ` Damien Le Moal
2025-03-09 22:29 ` [RFC PATCH 6/7] blk-zoned: modify blk_revalidate_disk_zones for bio-based drivers Benjamin Marzinski
2025-03-09 22:29 ` [RFC PATCH 7/7] dm: allow devices to revalidate existing zones Benjamin Marzinski
2025-03-09 23:59 ` Damien Le Moal
2025-03-10 17:43 ` Benjamin Marzinski [this message]
2025-03-10 23:19 ` Damien Le Moal
2025-03-10 23:42 ` Benjamin Marzinski
2025-03-11 0:00 ` Damien Le Moal
2025-03-09 23:16 ` [RFC PATCH 0/7] dm: fix issues with swapping dm tables Damien Le Moal
2025-03-10 16:38 ` Benjamin Marzinski
2025-03-10 23:13 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z88k2RD6s5KpuxOD@redhat.com \
--to=bmarzins@redhat.com \
--cc=axboe@kernel.dk \
--cc=dlemoal@kernel.org \
--cc=dm-devel@lists.linux.dev \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
--cc=mpatocka@redhat.com \
--cc=snitzer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.