From: Mike Snitzer <snitzer@redhat.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>,
Toshi Kani <toshi.kani@hpe.com>,
dm-devel@redhat.com, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
linux-xfs@vger.kernel.org
Subject: Re: [PATCH v2 4/7] dm: prevent DAX mounts if not supported
Date: Wed, 20 Jun 2018 11:17:49 -0400
Message-ID: <20180620151748.GA4847@redhat.com>
In-Reply-To: <20180604231508.GA10666@linux.intel.com>

On Mon, Jun 04 2018 at 7:15pm -0400,
Ross Zwisler <ross.zwisler@linux.intel.com> wrote:

> On Fri, Jun 01, 2018 at 05:55:13PM -0400, Mike Snitzer wrote:
> > On Tue, May 29 2018 at 3:51pm -0400,
> > Ross Zwisler <ross.zwisler@linux.intel.com> wrote:
> >
> > > Currently the code in dm_dax_direct_access() only checks whether the target
> > > type has a direct_access() operation defined, not whether the underlying
> > > block devices all support DAX. This latter property can be seen by looking
> > > at whether we set the QUEUE_FLAG_DAX request queue flag when creating the
> > > DM device.
> >
> > Wait... I thought DAX support was all or nothing?
>
> Right, it is, and that's what I'm trying to capture. The point of this series
> is to make sure that we don't use DAX thru DM if one of the DM members doesn't
> support DAX.
>
> This is a bit tricky, though, because as you've pointed out there are a lot of
> elements that go into a block device actually supporting DAX.
>
> First, the block device has to have a direct_access() operation defined in its
> struct dax_operations table. This is a static definition in the drivers,
> though, so it's necessary but not sufficient. For example, the PMEM driver
> always defines a direct_access() operation, but depending on the mode of the
> namespace (raw, fsdax or sector) it may or may not support DAX.
>
> The next step is that a driver needs to say that the block queue supports
> QUEUE_FLAG_DAX. This again is necessary but not sufficient. The PMEM driver
> currently sets this for all namespace modes, but I agree that this should be
> restricted to modes that support DAX. Even once we do that, though, for the
> block driver this isn't fully sufficient. We'd really like users to call
> bdev_dax_supported() so it can run some additional tests to make sure that DAX
> will work.
>
> So, the real test that filesystems rely on is bdev_dax_supported().
>
> The trick is that with DM we need to verify each block device via
> bdev_dax_supported() just like a filesystem would, and then have some way of
> communicating the result of all those checks to the filesystem which is
> eventually mounted on the DM device. At DAX mount time the filesystem will
> call bdev_dax_supported() on the DM device, but it'll really only check the
> first device.
>
> So, the strategy is to have DM manually check each member device via
> bdev_dax_supported() and then, if they all pass, set QUEUE_FLAG_DAX. This then
> becomes our one source of truth on whether or not a DM device supports DAX.
> When the filesystem mounts with DAX support it'll also run
> bdev_dax_supported(), but if we have QUEUE_FLAG_DAX set on the DM device, we
> know that this check will pass.
>
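
The strategy quoted above can be modeled in user space roughly as follows.
This is only a sketch, not kernel code: struct model_bdev, its fields, and
both model_*() functions are hypothetical stand-ins for the real
bdev_dax_supported() and DM table checks, and the three booleans simply
mirror the "necessary but not sufficient" conditions listed in the quote.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a member block device; not a kernel type. */
struct model_bdev {
	bool has_direct_access;  /* driver defines a direct_access() operation */
	bool queue_flag_dax;     /* stands in for QUEUE_FLAG_DAX on the queue */
	bool extra_checks_pass;  /* stands in for the remaining per-device tests */
};

/* Models the per-device test: every condition is necessary. */
static bool model_bdev_dax_supported(const struct model_bdev *bdev)
{
	return bdev->has_direct_access && bdev->queue_flag_dax &&
	       bdev->extra_checks_pass;
}

/* Models the DM side: DAX is all or nothing across the member devices. */
static bool model_dm_supports_dax(const struct model_bdev *members, size_t count)
{
	if (count == 0)
		return false;

	for (size_t i = 0; i < count; i++) {
		if (!model_bdev_dax_supported(&members[i]))
			return false;
	}
	return true;
}
```

In this model a table made of an fsdax namespace followed by a BRD ramdisk
is rejected as a whole, even though its first member passes on its own.
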
> > > This is problematic if we have, for example, a dm-linear device made up of
> > > a PMEM namespace in fsdax mode followed by a ramdisk from BRD.
> > > QUEUE_FLAG_DAX won't be set on the dm-linear device's request queue, but
> > > we have a working direct_access() entry point and the first member of the
> > > dm-linear set *does* support DAX.
> >
> > If you don't have a uniformly capable device then it is very dangerous
> > to advertise that the entire device has a certain capability. That
> > completely bit me in the past with discard (because for every IO I
> > wasn't then checking if the destination device supported discards).
> >
> > It is all well and good that you're adding that check here. But what I
> > don't like is how you're saying QUEUE_FLAG_DAX implies direct_access()
> > operation exists.. yet for raw PMEM namespaces we just discussed how
> > that is a lie.
>
> QUEUE_FLAG_DAX does imply that direct_access() exists. However, as discussed
> above for a given bdev we really do need to check bdev_dax_supported().
>
> > So this type of change showcases how the QUEUE_FLAG_DAX doesn't _really_
> > imply direct_access() exists.
> >
> > > This allows the user to create a filesystem on the dm-linear device, and
> > > then mount it with DAX. The filesystem's bdev_dax_supported() test will
> > > pass because it'll operate on the first member of the dm-linear device,
> > > which happens to be a fsdax PMEM namespace.
> > >
> > > All DAX I/O will then fail to that dm-linear device because the lack of
> > > QUEUE_FLAG_DAX prevents fs_dax_get_by_bdev() from working. This means that
> > > the struct dax_device isn't ever set in the filesystem, so
> > > dax_direct_access() will always return -EOPNOTSUPP.
> >
> > Now you've lost me... these past 2 paragraphs. Why can a user mount it
> > in DAX mode? Because bdev_dax_supported() only accesses the first
> > portion (which happens to have DAX capabilities?)
>
> Right. bdev_dax_supported() runs all of its checks, and because they are
> running against the first block device in the dm set, they all pass. But the
> overall DM device does not actually support DAX.
>
> > Isn't this exactly why you should be checking for QUEUE_FLAG_DAX in the
> > caller (bdev_dax_supported)? Why not use bdev_get_queue() and verify
> > QUEUE_FLAG_DAX is set in there?
>
> I'll look into that for the next revision, thanks.

Have you made any progress on a new revision?

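
As a rough illustration of the gating suggested above, here is a
user-space model (not kernel code). All names are hypothetical:
model_get_queue() stands in for bdev_get_queue(), flag_dax for
QUEUE_FLAG_DAX, and first_member_passes for the existing checks that only
reach the first member of a DM device.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a request queue; not a kernel type. */
struct model_queue {
	bool flag_dax;  /* stands in for QUEUE_FLAG_DAX */
};

/* Hypothetical stand-in for a (possibly stacked) block device. */
struct model_dev {
	struct model_queue queue;
	bool first_member_passes;  /* checks that only see the first member */
};

/* Stands in for bdev_get_queue(). */
static const struct model_queue *model_get_queue(const struct model_dev *dev)
{
	return &dev->queue;
}

/* Behavior described in the thread: only the first member is examined. */
static bool model_supported_ungated(const struct model_dev *dev)
{
	return dev->first_member_passes;
}

/* Suggested behavior: refuse early when the queue flag is not set. */
static bool model_supported_gated(const struct model_dev *dev)
{
	if (!model_get_queue(dev)->flag_dax)
		return false;
	return dev->first_member_passes;
}
```

For a dm-linear device whose first member is DAX capable but whose queue
does not carry the flag, the ungated variant reports support while the
gated variant does not.
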
> > > By failing out of dm_dax_direct_access() if QUEUE_FLAG_DAX isn't set we let
> > > the filesystem know we don't support DAX at mount time. The filesystem
> > > will then silently fall back and remove the dax mount option, causing it to
> > > work properly.
> >
> > This shouldn't be needed. Again, QUEUE_FLAG_DAX wasn't set.. so don't
> > allow code to falsely try operations that should've been gated by the
> > fact it wasn't set.
>
> Right, the goal is to make QUEUE_FLAG_DAX our one source of truth for whether
> DM devices support DAX, and not have it half defined by that and half by the
> DM_TYPE_DAX_BIO_BASED.

My hope is that you can ignore the DM-internal book-keeping
(DM_TYPE_DAX_BIO_BASED) for now and just focus on fixing the real issue
of needing proper checking (as well as properly _not_ setting
QUEUE_FLAG_DAX in the case of pmem "raw").
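
A minimal user-space sketch of that second point follows, not the PMEM
driver's actual code. The enum and function names are hypothetical. That
raw namespaces should not get the flag comes from this thread; treating
sector mode the same way is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical model of the PMEM namespace modes named in this thread. */
enum model_ns_mode {
	MODEL_NS_RAW,
	MODEL_NS_FSDAX,
	MODEL_NS_SECTOR,
};

/*
 * Decide whether the driver should advertise the DAX queue flag.
 * Assumption: only fsdax qualifies; raw is excluded per this thread,
 * and sector is excluded as an assumption of this sketch.
 */
static bool model_should_set_queue_flag_dax(enum model_ns_mode mode)
{
	switch (mode) {
	case MODEL_NS_FSDAX:
		return true;
	case MODEL_NS_RAW:
	case MODEL_NS_SECTOR:
		return false;
	}
	return false;
}
```
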

Please advise, thanks Ross!

Mike