From: Mike Snitzer <snitzer@redhat.com>
To: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Pankaj Raghav <p.raghav@samsung.com>,
agk@redhat.com, snitzer@kernel.org, axboe@kernel.dk, hch@lst.de,
bvanassche@acm.org, pankydev8@gmail.com, gost.dev@samsung.com,
linux-kernel@vger.kernel.org, linux-nvme@lists.infradead.org,
linux-block@vger.kernel.org, dm-devel@redhat.com,
Johannes.Thumshirn@wdc.com, jaegeuk@kernel.org,
matias.bjorling@wdc.com
Subject: Re: Please further explain Linux's "zoned storage" roadmap [was: Re: [PATCH v14 00/13] support zoned block devices with non-power-of-2 zone sizes]
Date: Thu, 22 Sep 2022 15:37:01 -0400
Message-ID: <Yyy5XUUWGkU8B3IP@redhat.com>
In-Reply-To: <7dd9dbc0-b08b-fa47-5452-d448d86ca56b@opensource.wdc.com>

On Wed, Sep 21 2022 at 7:55P -0400,
Damien Le Moal <damien.lemoal@opensource.wdc.com> wrote:
> On 9/22/22 02:27, Mike Snitzer wrote:
> > On Tue, Sep 20 2022 at 5:11P -0400,
> > Pankaj Raghav <p.raghav@samsung.com> wrote:
> >
> >> - Background and Motivation:
> >>
> >> The zone storage implementation in Linux, introduced in v4.10, first
> >> targeted SMR drives which have a power of 2 (po2) zone size alignment
> >> requirement. The po2 zone size was further imposed implicitly by the
> >> block layer's blk_queue_chunk_sectors(), used to prevent IO merging
> >> across chunks beyond the specified size, since v3.16 through commit
> >> 762380ad9322 ("block: add notion of a chunk size for request merging").
> >> But this same general block layer po2 requirement for blk_queue_chunk_sectors()
> >> was removed in v5.10 through commit 07d098e6bbad ("block: allow 'chunk_sectors'
> >> to be non-power-of-2").
> >>
> >> NAND, which is the media used in newer zoned storage devices, does not
> >> naturally align to po2. In these devices, zone capacity (cap) is not the
> >> same as the po2 zone size. When the zone cap != zone size, then unmapped
> >> LBAs are introduced to cover the space between the zone cap and zone size.
> >> The po2 requirement does not make sense for these types of zoned storage devices.
> >> This patch series aims to remove these unmapped LBAs for zoned devices when
> >> zone cap is npo2. This is done by relaxing the po2 zone size constraint
> >> in the kernel and allowing zoned device with npo2 zone sizes if zone cap
> >> == zone size.
> >>
> >> Removing the po2 requirement from zone storage should be possible
> >> now provided that no userspace regression and no performance regressions are
> >> introduced. Stop-gap patches have already been merged into f2fs-tools to
> >> proactively not allow npo2 zone sizes until proper support is added [1].
> >>
> >> There were two efforts previously to add support to npo2 devices: 1) via
> >> device level emulation [2] but that was rejected with a final conclusion
> >> to add support for non po2 zoned device in the complete stack[3] 2)
> >> adding support to the complete stack by removing the constraint in the
> >> block layer and NVMe layer with support to btrfs, zonefs, etc which was
> >> rejected with a conclusion to add a dm target for FS support [0]
> >> to reduce the regression impact.
> >>
> >> This series adds support to npo2 zoned devices in the block and nvme
> >> layer and a new **dm target** is added: dm-po2zoned-target. This new
> >> target will be initially used for filesystems such as btrfs and
> >> f2fs until native npo2 zone support is added.
> >
> > As this patchset nears the point of being "ready for merge" and DM's
> > "zoned" oriented targets are multiplying, I need to understand: where
> > are we collectively going? How long are we expecting to support the
> > "stop-gap zoned storage" layers we've constructed?
> >
> > I know https://zonedstorage.io/docs/introduction exists... but it
> > _seems_ stale given the emergence of ZNS and new permutations of zoned
> > hardware. Maybe that isn't quite fair (it does cover A LOT!) but I'm
> > still left wanting (e.g. "bring it all home for me!")...
> >
> > Damien, as the most "zoned storage" oriented engineer I know, can you
> > please kick things off by shedding light on where Linux is now, and
> > where it's going, for "zoned storage"?
>
> Let me first start with what we have seen so far with deployments in the
> field.

<snip>

Thanks for all your insights on zoned storage, much appreciated!

> > In addition, it was my understanding that WDC had yet another zoned DM
> > target called "dm-zap" that is for ZNS based devices... It's all a bit
> > messy in my head (that's on me for not keeping up, but I think we need
> > a recap!)
>
> Since the ZNS specification does not define conventional zones, dm-zoned
> cannot be used as a standalone DM target (read: single block device) with
> NVMe zoned block devices. Furthermore, due to its block mapping scheme,
> dm-zoned does not support devices with zones that have a capacity lower
> than the zone size. So ZNS is really a big *no* for dm-zoned. dm-zap is a
> prototype and in a nutshell is the equivalent of dm-zoned for ZNS. dm-zap
> can deal with the smaller zone capacity and does not require conventional
> zones. We are not trying to push for dm-zap to be merged for now as we are
> still evaluating its potential use cases. We also have a different but
> functionally equivalent approach implemented as a block device driver that
> we are evaluating internally.
>
> Given the above mentioned usage pattern we have seen so far for zoned
> storage, it is not yet clear if something like dm-zap for ZNS is needed
> beside some niche use cases.

OK, good to know. I do think dm-zoned should be trained to _not_
allow use with ZNS NVMe devices (maybe that is in place and I just
missed it?). I ask because at least one customer is asserting that
dm-zoned is somehow enabling them to use ZNS NVMe devices!

Maybe they somehow don't _need_ conventional zones (writes are handled
by some other layer? and dm-zoned access is confined to read only)!?
And might they also be using ZNS NVMe devices that do _not_ have a
zone capacity lower than the zone size?

Or maybe they are mistaken and we should ask more specific questions
of them?
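
For illustration only, here is a minimal user-space sketch of the two
constraints Damien describes above for using dm-zoned on a single zoned
device: the device must expose conventional zones, and no zone may have
a capacity lower than the zone size. The struct and function names are
hypothetical (they are not kernel or dm-zoned APIs), and the real
dm-zoned code may check these things differently, or not at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a zoned block device; NOT a kernel structure. */
struct zoned_dev_summary {
	uint64_t zone_size;         /* zone size, in 512-byte sectors */
	uint64_t min_zone_cap;      /* smallest zone capacity, in sectors */
	unsigned int nr_conv_zones; /* number of conventional zones */
};

/*
 * Hypothetical helper: returns true only if the device satisfies both
 * constraints for a standalone (single block device) dm-zoned setup.
 */
bool single_dev_dm_zoned_ok(const struct zoned_dev_summary *d)
{
	assert(d != NULL);

	/* ZNS does not define conventional zones, so this fails for ZNS. */
	if (d->nr_conv_zones == 0)
		return false;

	/* The block mapping scheme assumes zone capacity == zone size. */
	if (d->min_zone_cap < d->zone_size)
		return false;

	return true;
}
```

With made-up numbers, an SMR-like device with some conventional zones
and zone cap == zone size would pass, while a ZNS-like device with no
conventional zones would be rejected regardless of its zone capacity.
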
> > So please help me, and others, become more informed as quickly as
> > possible! ;)
>
> I hope the above helps. If you want me to develop further any of the
> points above, feel free to let me know.

You've been extremely helpful, thanks!