From: Mike Snitzer <snitzer@redhat.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
"hare@suse.de" <hare@suse.de>,
"axboe@kernel.dk" <axboe@kernel.dk>,
"jaegeuk@kernel.org" <jaegeuk@kernel.org>,
"yuchao0@huawei.com" <yuchao0@huawei.com>,
"ghe@suse.com" <ghe@suse.com>,
"mwilck@suse.com" <mwilck@suse.com>,
"tchvatal@suse.com" <tchvatal@suse.com>,
"zren@suse.com" <zren@suse.com>,
"agk@redhat.com" <agk@redhat.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: dm-zoned-tools: add zoned disk udev rules for scheduler / dmsetup
Date: Fri, 15 Jun 2018 10:50:37 -0400 [thread overview]
Message-ID: <20180615145037.GB1386@redhat.com> (raw)
In-Reply-To: <3dca87db-d8ce-5229-0fd9-939501bc6b3a@wdc.com>
On Fri, Jun 15 2018 at 5:59am -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> Mike,
>
> On 6/15/18 02:58, Mike Snitzer wrote:
> > On Thu, Jun 14 2018 at 1:37pm -0400,
> > Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >
> >> On Thu, Jun 14, 2018 at 08:38:06AM -0400, Mike Snitzer wrote:
> >>> On Wed, Jun 13 2018 at 8:11pm -0400,
> >>> Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >>>
> >>>> Setting up a zoned disks in a generic form is not so trivial. There
> >>>> is also quite a bit of tribal knowledge with these devices which is not
> >>>> easy to find.
> >>>>
> >>>> The currently supplied demo script works but it is not generic enough to be
> >>>> practical for Linux distributions or even developers which often move
> >>>> from one kernel to another.
> >>>>
> >>>> This tries to put a bit of this tribal knowledge into an initial udev
> >>>> rule for development with the hopes Linux distributions can later
> >>>> deploy. Three rule are added. One rule is optional for now, it should be
> >>>> extended later to be more distribution-friendly and then I think this
> >>>> may be ready for consideration for integration on distributions.
> >>>>
> >>>> 1) scheduler setup
> >>>
> >>> This is wrong.. if zoned devices are so dependent on deadline or
> >>> mq-deadline then the kernel should allow them to be hardcoded. I know
> >>> Jens removed the API to do so but the fact that drivers need to rely on
> >>> hacks like this udev rule to get a functional device is proof we need to
> >>> allow drivers to impose the scheduler used.
> >>
> >> This is the point to the patch as well, I actually tend to agree with you,
> >> and I had tried to draw up a patch to do just that, however its *not* possible
> >> today to do this and would require some consensus. So from what I can tell
> >> we *have* to live with this one or a form of it. Ie a file describing which
> >> disk serial gets deadline and which one gets mq-deadline.
> >>
> >> Jens?
> >>
> >> Anyway, let's assume this is done in the kernel, which one would use deadline,
> >> which one would use mq-deadline?
> >
> > The zoned storage driver needs to make that call based on what mode it
> > is in. If it is using blk-mq then it selects mq-deadline, otherwise
> > deadline.
>
> As Bart pointed out, deadline is an alias of mq-deadline. So using
> "deadline" as the scheduler name works in both legacy and mq cases.
>
> >>>> 2) backlist f2fs devices
> >>>
> >>> There should porbably be support in dm-zoned for detecting whether a
> >>> zoned device was formatted with f2fs (assuming there is a known f2fs
> >>> superblock)?
> >>
> >> Not sure what you mean. Are you suggesting we always setup dm-zoned for
> >> all zoned disks and just make an excemption on dm-zone code to somehow
> >> use the disk directly if a filesystem supports zoned disks directly somehow?
> >
> > No, I'm saying that a udev rule wouldn't be needed if dm-zoned just
> > errored out if asked to consume disks that already have an f2fs
> > superblock. And existing filesystems should get conflicting superblock
> > awareness "for free" if blkid or whatever is trained to be aware of
> > f2fs's superblock.
>
> Well that is the case already: on startup, dm-zoned will read its own
> metadata from sector 0, same as f2fs would do with its super-block. If
> the format/magic does not match expected values, dm-zoned will bail out
> and return an error. dm-zoned metadata and f2fs metadata reside in the
> same place and overwrite each other. There is no way to get one working
> on top of the other. I do not see any possibility of a problem on startup.
>
> But definitely, the user land format tools can step on each other toes.
> That needs fixing.
Right, I was talking about in the .ctr path for initial device creation,
not activation of a previously created dm-zoned device.
But I agree it makes most sense to do this check in userspace.
> >> f2fs does not require dm-zoned. What would be required is a bit more complex
> >> given one could dedicate portions of the disk to f2fs and other portions to
> >> another filesystem, which would require dm-zoned.
> >>
> >> Also filesystems which *do not* support zoned disks should *not* be allowing
> >> direct setup. Today that's all filesystems other than f2fs, in the future
> >> that may change. Those are bullets we are allowing to trigger for users
> >> just waiting to shot themselves on the foot with.
> >>
> >> So who's going to work on all the above?
> >
> > It should take care of itself if existing tools are trained to be aware
> > of new signatures. E.g. ext4 and xfs already are aware of one another
> > so that you cannot reformat a device with the other unless force is
> > given.
> >
> > Same kind of mutual exclussion needs to happen for zoned devices.
>
> Yes.
>
> > So the zoned device tools, dm-zoned, f2fs, whatever.. they need to be
> > updated to not step on each others toes. And other filesystems' tools
> > need to be updated to be zoned device aware.
>
> I will update dm-zoned tools to check for known FS superblocks,
> similarly to what mkfs.ext4 and mkfs.xfs do.
Thanks.
> >>>> 3) run dmsetup for the rest of devices
> >>>
> >>> automagically running dmsetup directly from udev to create a dm-zoned
> >>> target is very much wrong. It just gets in the way of proper support
> >>> that should be add to appropriate tools that admins use to setup their
> >>> zoned devices. For instance, persistent use of dm-zoned target should
> >>> be made reliable with a volume manager..
> >>
> >> Ah yes, but who's working on that? How long will it take?
> >
> > No idea, as is (from my vantage point) there is close to zero demand for
> > zoned devices. It won't be a priority until enough customers are asking
> > for it.
>
> From my point of view (drive vendor), things are different. We do see an
> increasing interest for these drives. However, most use cases are still
> limited to application based direct disk access with minimal involvement
> from the kernel and so few "support" requests. Many reasons to this, but
> one is to some extent the current lack of extended support by the
> kernel. Despite all the recent work done, as Luis experienced, zoned
> drives are still far harder to easily setup than regular disks. Chicken
> and egg situation...
>
> >> I agree it is odd to expect one to use dmsetup and then use a volume manager on
> >> top of it, if we can just add proper support onto the volume manager... then
> >> that's a reasonable way to go.
> >>
> >> But *we're not there* yet, and as-is today, what is described in the udev
> >> script is the best we can do for a generic setup.
> >
> > Just because doing things right takes work doesn't mean it makes sense
> > to elevate this udev script to be packaged in some upstream project like
> > udev or whatever.
>
> Agree. Will start looking into better solutions now that at least one
> user (Luis) complained. The customer is king.
>
> >>> In general this udev script is unwelcome and makes things way worse for
> >>> the long-term success of zoned devices.
> >>
> >> dm-zoned-tools does not acknowledge in any way a roadmap, and just provides
> >> a script, which IMHO is less generic and less distribution friendly. Having
> >> a udev rule in place to demonstrate the current state of affairs IMHO is
> >> more scalable demonstrates the issues better than the script.
> >>
> >> If we have an agreed upon long term strategy lets document that. But from
> >> what I gather we are not even in consensus with regards to the scheduler
> >> stuff. If we have consensus on the other stuff lets document that as
> >> dm-zoned-tools is the only place I think folks could find to reasonably
> >> deploy these things.
> >
> > I'm sure Damien and others will have something to say here.
>
> Yes. The scheduler setup pain is real. Jens made it clear that he
> prefers a udev rule. I fully understand his point of view, yet, I think
> an automatic switch in the block layer would be far easier and generate
> a lot less problem for users, and likely less "bug report" to
> distributions vendors (and to myself too).
Yeap, Jens would say that ;) Unfortnately using udev to get this
critical configuration correct is a real leap of faith that will prove
to be a whack-a-mole across distributions.
> That said, I also like to see the current dependency of zoned devices on
> the deadline scheduler as temporary until a better solution for ensuring
> write ordering is found. After all, requiring deadline as the disk
> scheduler does impose other limitations on the user. Lack of I/O
> priority support and no cgroup based fairness are two examples of what
> other schedulers provide but is lost with forcing deadline.
>
> The obvious fix is of course to make all disk schedulers zone device
> aware. A little heavy handed, probably lots of duplicated/similar code,
> and many more test cases to cover. This approach does not seem
> sustainable to me.
Right, it isn't sustainable. There isn't enough zoned device developer
expertise to go around.
> We discussed other possibilities at LSF/MM (specialized write queue in
> multi-queue path). One could also think of more invasive changes to the
> block layer (e.g. adding an optional "dispatcher" layer to tightly
> control command ordering ?). And probably a lot more options, But I am
> not yet sure what an appropriate replacement to deadline would be.
>
> Eventually, the removal of the legacy I/O path may also be the trigger
> to introduce some deeper design changes to blk-mq to accommodate more
> easily zoned block devices or other non-standard block devices (open
> channel SSDs for instance).
>
> As you can see from the above, working with these drives all day long
> does not make for a clear strategy. Inputs from other here are more than
> welcome. I would be happy to write up all the ideas I have to start a
> discussion so that we can come to a consensus and have a plan.
Doesn't hurt to establish a future plan(s) but we need to deal with the
reality of what we have. And all we have for this particular issue is
"deadline". Setting anything else is a bug.
Short of the block layer reinstating the ability for a driver to specify
an elevator: should the zoned driver put a check in place that errors
out if anything other than deadline is configured?
That'd at least save users from a very cutthroat learning curve.
> >>> I don't dispute there is an obvious void for how to properly setup zoned
> >>> devices, but this script is _not_ what should fill that void.
> >>
> >> Good to know! Again, consider it as an alternative to the script.
> >>
> >> I'm happy to adapt the language and supply it only as an example script
> >> developers can use, but we can't leave users hanging as well. Let's at
> >> least come up with a plan which we seem to agree on and document that.
> >
> > Best to try to get Damien and others more invested in zoned devices to
> > help you take up your cause. I think it is worthwhile to develop a
> > strategy. But it needs to be done in terms of the norms of the existing
> > infrastructure we all make use of today. So first step is making
> > existing tools zoned device aware (even if to reject such devices).
>
> Rest assured that I am fully invested in improving the existing
> infrastructure for zoned block devices. As mentioned above, applications
> based use of zoned block devices still prevails today. So I do tend to
> work more on that side of things (libzbc, tcmu, sysutils for instance)
> rather than on a better integration with more advanced tools (such as
> LVM) relying on kernel features. I am however seeing rising interest in
> file systems and also in dm-zoned. So definitely it is time to step up
> work in that area to further simplify using these drives.
>
> Thank you for the feedback.
Thanks for your insight. Sounds like you're ontop of it.
Mike
next prev parent reply other threads:[~2018-06-15 14:50 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-14 0:11 [PATCH] dm-zoned-tools: add zoned disk udev rules for scheduler / dmsetup Luis R. Rodriguez
2018-06-14 10:01 ` Damien Le Moal
2018-06-14 13:39 ` Bart Van Assche
2018-06-14 13:42 ` Christoph Hellwig
2018-06-15 11:07 ` Martin Wilck
2018-06-14 12:38 ` Mike Snitzer
2018-06-14 16:23 ` Bart Van Assche
2018-06-14 17:37 ` Luis R. Rodriguez
2018-06-14 17:46 ` Luis R. Rodriguez
2018-06-14 17:58 ` Mike Snitzer
2018-06-15 9:59 ` Damien Le Moal
2018-06-15 14:50 ` Mike Snitzer [this message]
2018-06-15 9:00 ` Damien Le Moal
2018-06-14 16:19 ` [PATCH] " Bart Van Assche
2018-06-14 17:44 ` Luis R. Rodriguez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180615145037.GB1386@redhat.com \
--to=snitzer@redhat.com \
--cc=Damien.LeMoal@wdc.com \
--cc=agk@redhat.com \
--cc=axboe@kernel.dk \
--cc=ghe@suse.com \
--cc=hare@suse.de \
--cc=jaegeuk@kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=mcgrof@kernel.org \
--cc=mwilck@suse.com \
--cc=tchvatal@suse.com \
--cc=yuchao0@huawei.com \
--cc=zren@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).