From: Mike Snitzer <snitzer@redhat.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>,
axboe@kernel.dk, bvanassche@acm.org, pankydev8@gmail.com,
gost.dev@samsung.com, snitzer@kernel.org,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, dm-devel@redhat.com,
matias.bjorling@wdc.com, Johannes.Thumshirn@wdc.com,
jaegeuk@kernel.org, hch@lst.de, agk@redhat.com
Subject: Re: [dm-devel] Please further explain Linux's "zoned storage" roadmap [was: Re: [PATCH v14 00/13] support zoned block devices with non-power-of-2 zone sizes]
Date: Thu, 22 Sep 2022 15:37:01 -0400
Message-ID: <Yyy5XUUWGkU8B3IP@redhat.com>
In-Reply-To: <7dd9dbc0-b08b-fa47-5452-d448d86ca56b@opensource.wdc.com>
On Wed, Sep 21 2022 at 7:55P -0400,
Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote:
> On 9/22/22 02:27, Mike Snitzer wrote:
> > On Tue, Sep 20 2022 at 5:11P -0400,
> > Pankaj Raghav <p.raghav@samsung.com> wrote:
> >
> >> - Background and Motivation:
> >>
> >> The zoned storage implementation in Linux, introduced in v4.10, first
> >> targeted SMR drives which have a power of 2 (po2) zone size alignment
> >> requirement. The po2 zone size was further imposed implicitly by the
> >> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
> >> across chunks beyond the specified size, since v3.16 through commit
> >> 762380ad9322 ("block: add notion of a chunk size for request merging").
> >> But this same general block layer po2 requirement for blk_queue_chunk_sectors()
> >> was removed on v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
> >> to be non-power-of-2").
> >>
> >> NAND, which is the media used in newer zoned storage devices, does not
> >> naturally align to po2. In these devices, zone capacity(cap) is not the
> >> same as the po2 zone size. When the zone cap != zone size, then unmapped
> >> LBAs are introduced to cover the space between the zone cap and zone size.
> >> The po2 requirement does not make sense for these types of zoned storage devices.
> >> This patch series aims to remove these unmapped LBAs for zoned devices when
> >> zone cap is npo2. This is done by relaxing the po2 zone size constraint
> >> in the kernel and allowing zoned device with npo2 zone sizes if zone cap
> >> == zone size.
> >>
> >> Removing the po2 requirement from zone storage should be possible
> >> now provided that no userspace regression and no performance regressions are
> >> introduced. Stop-gap patches have been already merged into f2fs-tools to
> >> proactively not allow npo2 zone sizes until proper support is added [1].
> >>
> >> There were two efforts previously to add support to npo2 devices: 1) via
> >> device level emulation [2] but that was rejected with a final conclusion
> >> to add support for non po2 zoned device in the complete stack[3] 2)
> >> adding support to the complete stack by removing the constraint in the
> >> block layer and NVMe layer with support to btrfs, zonefs, etc which was
> >> rejected with a conclusion to add a dm target for FS support [0]
> >> to reduce the regression impact.
> >>
> >> This series adds support to npo2 zoned devices in the block and nvme
> >> layer and a new **dm target** is added: dm-po2zoned-target. This new
> >> target will be initially used for filesystems such as btrfs and
> >> f2fs until native npo2 zone support is added.
> >
> > As this patchset nears the point of being "ready for merge" and DM's
> > "zoned" oriented targets are multiplying, I need to understand: where
> > are we collectively going? How long are we expecting to support the
> > "stop-gap zoned storage" layers we've constructed?
> >
> > I know https://zonedstorage.io/docs/introduction exists... but it
> > _seems_ stale given the emergence of ZNS and new permutations of zoned
> > hardware. Maybe that isn't quite fair (it does cover A LOT!) but I'm
> > still left wanting (e.g. "bring it all home for me!")...
> >
> > Damien, as the most "zoned storage" oriented engineer I know, can you
> > please kick things off by shedding light on where Linux is now, and
> > where it's going, for "zoned storage"?
>
> Let me first start with what we have seen so far with deployments in the
> field.
<snip>

Thanks for all your insights on zoned storage, much appreciated!
> > In addition, it was my understanding that WDC had yet another zoned DM
> > target called "dm-zap" that is for ZNS based devices... It's all a bit
> > messy in my head (that's on me for not keeping up, but I think we need
> > a recap!)
>
> Since the ZNS specification does not define conventional zones, dm-zoned
> cannot be used as a standalone DM target (read: single block device) with
> NVMe zoned block devices. Furthermore, due to its block mapping scheme,
> dm-zoned does not support devices with zones that have a capacity lower
> than the zone size. So ZNS is really a big *no* for dm-zoned. dm-zap is a
> prototype and in a nutshell is the equivalent of dm-zoned for ZNS. dm-zap
> can deal with the smaller zone capacity and does not require conventional
> zones. We are not trying to push for dm-zap to be merged for now as we are
> still evaluating its potential use cases. We also have a different but
> functionally equivalent approach implemented as a block device driver that
> we are evaluating internally.
>
> Given the above mentioned usage pattern we have seen so far for zoned
> storage, it is not yet clear if something like dm-zap for ZNS is needed
> beside some niche use cases.
OK, good to know. I do think dm-zoned should be trained to _not_
allow use with ZNS NVMe devices (maybe that is in place and I just
missed it?). There is some confusion with at least one customer
that is asserting dm-zoned is somehow enabling them to use
ZNS NVMe devices!

Maybe they somehow don't _need_ conventional zones (writes are handled
by some other layer, and dm-zoned access is confined to read only)!?
And might they also be using ZNS NVMe devices that do _not_ have a
zone capacity lower than the zone size?

Or maybe they are mistaken and we should ask more specific questions
of them?
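
To make those questions concrete, here is a minimal userspace sketch in
Python of the two constraints Damien describes above (conventional zones
must exist, and zone capacity must not be lower than zone size). It is
not dm-zoned code: the Zone fields, the function name and the example
layouts are all hypothetical, and a real check would have to work from
the device's actual zone report.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Zone:
    conventional: bool  # True for a conventional (randomly writable) zone
    size: int           # zone size in 512-byte sectors
    capacity: int       # usable sectors in the zone, at most `size`


def standalone_dm_zoned_blockers(zones: List[Zone]) -> List[str]:
    """Return which of the two constraints a single device fails."""
    blockers = []
    if not any(z.conventional for z in zones):
        blockers.append("no conventional zones")
    if any(z.capacity < z.size for z in zones):
        blockers.append("zone capacity lower than zone size")
    return blockers


# Hypothetical layouts; the sizes are chosen only for illustration.
smr_like = [Zone(True, 524288, 524288)] * 2 + [Zone(False, 524288, 524288)] * 6
zns_like = [Zone(False, 524288, 393216)] * 8
zns_full_cap = [Zone(False, 524288, 524288)] * 8

for name, dev in [("smr_like", smr_like), ("zns_like", zns_like),
                  ("zns_full_cap", zns_full_cap)]:
    print(name, standalone_dm_zoned_blockers(dev) or "ok")
```

The third layout is the case I'm asking about: capacity equals size, so
only the missing conventional zones would stand in the way.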
> > So please help me, and others, become more informed as quickly as
> > possible! ;)
>
> I hope the above helps. If you want me to develop further any of the
> points above, feel free to let me know.
You've been extremely helpful, thanks!