From: Mike Snitzer <snitzer@redhat.com>
To: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>,
"hare@suse.de" <hare@suse.de>,
"axboe@kernel.dk" <axboe@kernel.dk>,
"jaegeuk@kernel.org" <jaegeuk@kernel.org>,
"yuchao0@huawei.com" <yuchao0@huawei.com>,
"ghe@suse.com" <ghe@suse.com>,
"mwilck@suse.com" <mwilck@suse.com>,
"tchvatal@suse.com" <tchvatal@suse.com>,
"zren@suse.com" <zren@suse.com>,
"agk@redhat.com" <agk@redhat.com>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>
Subject: Re: dm-zoned-tools: add zoned disk udev rules for scheduler / dmsetup
Date: Fri, 15 Jun 2018 10:50:37 -0400
Message-ID: <20180615145037.GB1386@redhat.com>
In-Reply-To: <3dca87db-d8ce-5229-0fd9-939501bc6b3a@wdc.com>

On Fri, Jun 15 2018 at 5:59am -0400,
Damien Le Moal <Damien.LeMoal@wdc.com> wrote:
> Mike,
>
> On 6/15/18 02:58, Mike Snitzer wrote:
> > On Thu, Jun 14 2018 at 1:37pm -0400,
> > Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >
> >> On Thu, Jun 14, 2018 at 08:38:06AM -0400, Mike Snitzer wrote:
> >>> On Wed, Jun 13 2018 at 8:11pm -0400,
> >>> Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >>>
> >>>> Setting up zoned disks in a generic form is not so trivial. There
> >>>> is also quite a bit of tribal knowledge with these devices which is not
> >>>> easy to find.
> >>>>
> >>>> The currently supplied demo script works but it is not generic enough to be
> >>>> practical for Linux distributions or even developers which often move
> >>>> from one kernel to another.
> >>>>
> >>>> This tries to put a bit of this tribal knowledge into an initial udev
> >>>> rule for development with the hopes Linux distributions can later
> >>>> deploy. Three rules are added. One rule is optional for now, it should be
> >>>> extended later to be more distribution-friendly and then I think this
> >>>> may be ready for consideration for integration on distributions.
> >>>>
> >>>> 1) scheduler setup
> >>>
> >>> This is wrong.. if zoned devices are so dependent on deadline or
> >>> mq-deadline then the kernel should allow them to be hardcoded. I know
> >>> Jens removed the API to do so but the fact that drivers need to rely on
> >>> hacks like this udev rule to get a functional device is proof we need to
> >>> allow drivers to impose the scheduler used.
> >>
> >> This is the point to the patch as well, I actually tend to agree with you,
> >> and I had tried to draw up a patch to do just that, however it's *not* possible
> >> today to do this and would require some consensus. So from what I can tell
> >> we *have* to live with this one or a form of it. Ie a file describing which
> >> disk serial gets deadline and which one gets mq-deadline.
> >>
> >> Jens?
> >>
> >> Anyway, let's assume this is done in the kernel, which one would use deadline,
> >> which one would use mq-deadline?
> >
> > The zoned storage driver needs to make that call based on what mode it
> > is in. If it is using blk-mq then it selects mq-deadline, otherwise
> > deadline.
>
> As Bart pointed out, deadline is an alias of mq-deadline. So using
> "deadline" as the scheduler name works in both legacy and mq cases.
>
> >>>> 2) blacklist f2fs devices
> >>>
> >>> There should probably be support in dm-zoned for detecting whether a
> >>> zoned device was formatted with f2fs (assuming there is a known f2fs
> >>> superblock)?
> >>
> >> Not sure what you mean. Are you suggesting we always setup dm-zoned for
> >> all zoned disks and just make an exemption in the dm-zoned code to somehow
> >> use the disk directly if a filesystem supports zoned disks directly somehow?
> >
> > No, I'm saying that a udev rule wouldn't be needed if dm-zoned just
> > errored out if asked to consume disks that already have an f2fs
> > superblock. And existing filesystems should get conflicting superblock
> > awareness "for free" if blkid or whatever is trained to be aware of
> > f2fs's superblock.
>
> Well that is the case already: on startup, dm-zoned will read its own
> metadata from sector 0, same as f2fs would do with its super-block. If
> the format/magic does not match expected values, dm-zoned will bail out
> and return an error. dm-zoned metadata and f2fs metadata reside in the
> same place and overwrite each other. There is no way to get one working
> on top of the other. I do not see any possibility of a problem on startup.
>
> But definitely, the user land format tools can step on each other's toes.
> That needs fixing.

Right, I was talking about in the .ctr path for initial device creation,
not activation of a previously created dm-zoned device.

But I agree it makes most sense to do this check in userspace.

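A rough sketch of such a userspace check, in Python. It is an illustration only, not what any existing tool does today. The magic values and offsets below (f2fs: 0xF2F52010 at byte offset 1024; dm-zoned: the bytes 'D','Z','B','D' packed into a little-endian 32-bit value at offset 0) are assumptions not stated in this thread and should be verified against the f2fs and dm-zoned sources. The function names are hypothetical.

```python
import struct

# ASSUMED on-disk signatures; verify against the f2fs and dm-zoned sources.
F2FS_MAGIC = 0xF2F52010                      # little-endian u32
F2FS_MAGIC_OFFSET = 1024                     # byte offset on the device
DMZ_MAGIC = int.from_bytes(b"DZBD", "big")   # ('D'<<24)|('Z'<<16)|('B'<<8)|'D'
DMZ_MAGIC_OFFSET = 0                         # byte offset on the device

KNOWN_SIGNATURES = (
    ("dm-zoned", DMZ_MAGIC_OFFSET, DMZ_MAGIC),
    ("f2fs", F2FS_MAGIC_OFFSET, F2FS_MAGIC),
)


def detect_signatures(head: bytes) -> list:
    """Return names of known signatures found in the first bytes of a device."""
    found = []
    for name, offset, magic in KNOWN_SIGNATURES:
        if len(head) < offset + 4:
            continue  # buffer too short to hold this signature
        (value,) = struct.unpack_from("<I", head, offset)
        if value == magic:
            found.append(name)
    return found


def safe_to_format(head: bytes, force: bool = False) -> bool:
    """Refuse to format over a known signature unless force is given."""
    return force or not detect_signatures(head)
```

A format tool would read the first few KiB of the target device, pass them to safe_to_format(), and bail out with an error unless the user explicitly forces the operation.
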
> >> f2fs does not require dm-zoned. What would be required is a bit more complex
> >> given one could dedicate portions of the disk to f2fs and other portions to
> >> another filesystem, which would require dm-zoned.
> >>
> >> Also filesystems which *do not* support zoned disks should *not* be allowing
> >> direct setup. Today that's all filesystems other than f2fs, in the future
> >> that may change. Those are bullets we are allowing to trigger for users
> >> just waiting to shoot themselves in the foot with.
> >>
> >> So who's going to work on all the above?
> >
> > It should take care of itself if existing tools are trained to be aware
> > of new signatures. E.g. ext4 and xfs already are aware of one another
> > so that you cannot reformat a device with the other unless force is
> > given.
> >
> > Same kind of mutual exclusion needs to happen for zoned devices.
>
> Yes.
>
> > So the zoned device tools, dm-zoned, f2fs, whatever.. they need to be
> > updated to not step on each other's toes. And other filesystems' tools
> > need to be updated to be zoned device aware.
>
> I will update dm-zoned tools to check for known FS superblocks,
> similarly to what mkfs.ext4 and mkfs.xfs do.

Thanks.

> >>>> 3) run dmsetup for the rest of devices
> >>>
> >>> automagically running dmsetup directly from udev to create a dm-zoned
> >>> target is very much wrong. It just gets in the way of proper support
> >>> that should be added to appropriate tools that admins use to setup their
> >>> zoned devices. For instance, persistent use of dm-zoned target should
> >>> be made reliable with a volume manager..
> >>
> >> Ah yes, but who's working on that? How long will it take?
> >
> > No idea, as is (from my vantage point) there is close to zero demand for
> > zoned devices. It won't be a priority until enough customers are asking
> > for it.
>
> From my point of view (drive vendor), things are different. We do see an
> increasing interest for these drives. However, most use cases are still
> limited to application based direct disk access with minimal involvement
> from the kernel and so few "support" requests. There are many reasons for this, but
> one is to some extent the current lack of extended support by the
> kernel. Despite all the recent work done, as Luis experienced, zoned
> drives are still far harder to easily setup than regular disks. Chicken
> and egg situation...
>
> >> I agree it is odd to expect one to use dmsetup and then use a volume manager on
> >> top of it, if we can just add proper support onto the volume manager... then
> >> that's a reasonable way to go.
> >>
> >> But *we're not there* yet, and as-is today, what is described in the udev
> >> script is the best we can do for a generic setup.
> >
> > Just because doing things right takes work doesn't mean it makes sense
> > to elevate this udev script to be packaged in some upstream project like
> > udev or whatever.
>
> Agree. Will start looking into better solutions now that at least one
> user (Luis) complained. The customer is king.
>
> >>> In general this udev script is unwelcome and makes things way worse for
> >>> the long-term success of zoned devices.
> >>
> >> dm-zoned-tools does not acknowledge in any way a roadmap, and just provides
> >> a script, which IMHO is less generic and less distribution friendly. Having
> >> a udev rule in place to demonstrate the current state of affairs IMHO is
> >> more scalable and demonstrates the issues better than the script.
> >>
> >> If we have an agreed upon long term strategy let's document that. But from
> >> what I gather we are not even in consensus with regards to the scheduler
> >> stuff. If we have consensus on the other stuff let's document that as
> >> dm-zoned-tools is the only place I think folks could find to reasonably
> >> deploy these things.
> >
> > I'm sure Damien and others will have something to say here.
>
> Yes. The scheduler setup pain is real. Jens made it clear that he
> prefers a udev rule. I fully understand his point of view, yet, I think
> an automatic switch in the block layer would be far easier and generate
> a lot fewer problems for users, and likely fewer "bug reports" to
> distribution vendors (and to myself too).

Yep, Jens would say that ;) Unfortunately using udev to get this
critical configuration correct is a real leap of faith that will prove
to be a whack-a-mole across distributions.

> That said, I also like to see the current dependency of zoned devices on
> the deadline scheduler as temporary until a better solution for ensuring
> write ordering is found. After all, requiring deadline as the disk
> scheduler does impose other limitations on the user. Lack of I/O
> priority support and no cgroup based fairness are two examples of what
> other schedulers provide but is lost with forcing deadline.
>
> The obvious fix is of course to make all disk schedulers zone device
> aware. A little heavy handed, probably lots of duplicated/similar code,
> and many more test cases to cover. This approach does not seem
> sustainable to me.

Right, it isn't sustainable. There isn't enough zoned device developer
expertise to go around.

> We discussed other possibilities at LSF/MM (specialized write queue in
> multi-queue path). One could also think of more invasive changes to the
> block layer (e.g. adding an optional "dispatcher" layer to tightly
> control command ordering ?). And probably a lot more options. But I am
> not yet sure what an appropriate replacement to deadline would be.
>
> Eventually, the removal of the legacy I/O path may also be the trigger
> to introduce some deeper design changes to blk-mq to accommodate more
> easily zoned block devices or other non-standard block devices (open
> channel SSDs for instance).
>
> As you can see from the above, working with these drives all day long
> does not make for a clear strategy. Input from others here is more than
> welcome. I would be happy to write up all the ideas I have to start a
> discussion so that we can come to a consensus and have a plan.

Doesn't hurt to establish a future plan(s) but we need to deal with the
reality of what we have. And all we have for this particular issue is
"deadline". Setting anything else is a bug.

Short of the block layer reinstating the ability for a driver to specify
an elevator: should the zoned driver put a check in place that errors
out if anything other than deadline is configured?

That'd at least save users from a very cutthroat learning curve.

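To illustrate the kind of check meant here, a minimal userspace sketch in Python follows (a kernel-side check would look different). It assumes the sysfs attributes /sys/block/<dev>/queue/zoned and /sys/block/<dev>/queue/scheduler, with the active scheduler shown in square brackets, and it assumes only host-managed devices need the restriction. The function names are hypothetical.

```python
from pathlib import Path

# Scheduler names treated as acceptable; "deadline" is accepted alongside
# "mq-deadline" because the thread notes one is an alias of the other.
ALLOWED = ("deadline", "mq-deadline")


def active_scheduler(scheduler_attr: str) -> str:
    """Parse queue/scheduler contents such as 'noop [deadline] cfq'."""
    for token in scheduler_attr.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Fallback when no entry is bracketed (for example a bare 'none').
    return scheduler_attr.strip()


def scheduler_ok(zoned_model: str, scheduler_attr: str) -> bool:
    """True unless a host-managed device runs a non-deadline scheduler."""
    # ASSUMPTION: only the 'host-managed' model requires deadline.
    if zoned_model.strip() != "host-managed":
        return True
    return active_scheduler(scheduler_attr) in ALLOWED


def check_device(name: str, sysfs: Path = Path("/sys/block")) -> bool:
    """Read both attributes for one block device and apply the check."""
    queue = sysfs / name / "queue"
    zoned = (queue / "zoned").read_text()
    scheduler = (queue / "scheduler").read_text()
    return scheduler_ok(zoned, scheduler)
```

For example, scheduler_ok("host-managed", "noop deadline [cfq]") is false, which is the case that should produce an error rather than silent write reordering.
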
> >>> I don't dispute there is an obvious void for how to properly setup zoned
> >>> devices, but this script is _not_ what should fill that void.
> >>
> >> Good to know! Again, consider it as an alternative to the script.
> >>
> >> I'm happy to adapt the language and supply it only as an example script
> >> developers can use, but we can't leave users hanging as well. Let's at
> >> least come up with a plan which we seem to agree on and document that.
> >
> > Best to try to get Damien and others more invested in zoned devices to
> > help you take up your cause. I think it is worthwhile to develop a
> > strategy. But it needs to be done in terms of the norms of the existing
> > infrastructure we all make use of today. So first step is making
> > existing tools zoned device aware (even if to reject such devices).
>
> Rest assured that I am fully invested in improving the existing
> infrastructure for zoned block devices. As mentioned above, applications
> based use of zoned block devices still prevails today. So I do tend to
> work more on that side of things (libzbc, tcmu, sysutils for instance)
> rather than on a better integration with more advanced tools (such as
> LVM) relying on kernel features. I am however seeing rising interest in
> file systems and also in dm-zoned. So definitely it is time to step up
> work in that area to further simplify using these drives.
>
> Thank you for the feedback.

Thanks for your insight. Sounds like you're on top of it.

Mike