From: Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
To: Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
Cc: Eric Sandeen <sandeen-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Theodore Ts'o <tytso-3s7WtUTddSA@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
"Darrick J. Wong"
<darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>,
Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>,
"linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org"
<linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org>,
Dave Chinner <david-FqsqvQoI3Ljby3iVrkZq2A@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>,
xfs <linux-xfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Lukas Czerner <lczerner-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
linux-ext4 <linux-ext4-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [PATCH 0/9] add ext4 per-inode DAX flag
Date: Thu, 7 Sep 2017 15:51:48 -0600
Message-ID: <20170907215148.GA12669@linux.intel.com>
In-Reply-To: <5F58D3F5-D93B-4648-AE01-8A46956FBB4B-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:
> On Sep 7, 2017, at 3:13 PM, Ross Zwisler <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> >
> > On Thu, Sep 07, 2017 at 01:54:45PM -0700, Dan Williams wrote:
> >> On Wed, Sep 6, 2017 at 10:07 AM, Ross Zwisler
> >> <ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> >>> On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:
> >>>> On 9/5/17 5:35 PM, Ross Zwisler wrote:
> >>>>> The original intent of this series was to add a per-inode DAX flag to ext4
> >>>>> so that it would be consistent with XFS. In my travels I found and fixed
> >>>>> several related issues in both ext4 and XFS.
> >>>>
> >>>> Hi Ross -
> >>>>
> >>>> hch had a lot of reasons to nuke the dax flag from orbit, and we just
> >>>> /disabled/ it in xfs due to its habit of crashing the kernel...
> >>>
> >>> Ah, sorry, I wasn't CC'd on those threads and missed them. For any interested
> >>> bystanders:
> >>>
> >>> https://www.spinics.net/lists/linux-ext4/msg57840.html
> >>> https://www.spinics.net/lists/linux-xfs/msg09831.html
> >>> https://www.spinics.net/lists/linux-xfs/msg10124.html
> >>>
> >>>> so a couple questions:
> >>>>
> >>>> 1) does this series pass hch's "test the per-inode DAX flag" fstest?
> >>>
> >>> Nope, it has the exact same problems as the XFS per-inode DAX flag.
> >>>
> >>>> 2) do we have an agreement that we need this flag at all, or is this
> >>>> just a parity item because xfs has^whad a per-inode flag?
> >>>
> >>> It was for parity, and because it allows admins finer grained control over
> >>> their system. Basically all things discussed in response to Lukas's original
> >>> patch in the first link above.
> >>
> >> I think it's more than parity. When pmem is slower than page cache it
> >> is actively harmful to have DAX enabled globally for a filesystem. So,
> >> not only should we push for per-inode DAX control, we should also push
> >> to deprecate the mount option. I agree with Christoph that we should
> >> try to automatically and transparently enable DAX where it makes
> >> sense, but we also need a finer-grained mechanism than a mount flag to
> >> force the behavior one way or the other.
> >
> > Yep, agreed. I'll play with how to make this work after I've sorted out all
> > the data corruptions I've found. :)
>
> It seems that the majority of problems are from enabling/disabling S_DAX
> on an inode that already has dirty data.
I don't think it's precisely about dirty data, more about having mappings set
up and I/Os in flight, even if those are read operations. Tomorrow I'll post
some xfstests for the data corruptions due to DAX + each of inline data and
journaling, and those both happen because we set up one mapping to page cache,
and one to DAX. Once either is written to they become out of sync.
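
As a rough illustration of the failure mode described above, here is a toy user-space model in Python. It is not kernel code: `ToyFile` and its methods are hypothetical names invented for this sketch, and the "page cache" is just a lazily filled copy of the bytes on the media. It only shows why a page cache mapping and a DAX mapping of the same file stop agreeing once one of them is written.

```python
# Toy model only: plain Python objects standing in for kernel state.
# ToyFile and its method names are hypothetical, not kernel APIs.

class ToyFile:
    def __init__(self, data: bytes):
        self.media = bytearray(data)   # bytes as stored on pmem
        self.page_cache = None         # lazily filled copy of the media

    def read_via_page_cache(self) -> bytes:
        # The first access populates the cache; later reads are served
        # from the cached copy without looking at the media again.
        if self.page_cache is None:
            self.page_cache = bytearray(self.media)
        return bytes(self.page_cache)

    def write_via_dax(self, offset: int, buf: bytes) -> None:
        # A DAX mapping stores straight to the media and bypasses the cache.
        self.media[offset:offset + len(buf)] = buf

    def read_via_dax(self) -> bytes:
        return bytes(self.media)


f = ToyFile(b"old data")
f.read_via_page_cache()        # a page cache mapping now exists
f.write_via_dax(0, b"new")     # a DAX mapping writes to the same file
stale = f.read_via_page_cache()
fresh = f.read_via_dax()
print(stale, fresh)            # the two views no longer match
```

Note that no dirty data is needed to reach the inconsistent state in this model: a read through the page cache followed by a write through DAX is enough.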
> However, I wonder if this could
> be prevented at runtime, and only allow S_DAX to be set when the inode is
> first instantiated, and wouldn't be allowed to change after that? Setting
> or clearing the per-inode DAX flag might still be allowed, but it wouldn't
> be enabled until the inode is next fetched into cache? Similarly, for
> inodes that have conflicting features (e.g. inline data or encryption)
> would not be allowed to enable S_DAX.
Ooh, this seems interesting. This would ensure that S_DAX transitions
couldn't ever race with I/Os or mmaps(). I had some other ideas for how to
handle this, but I think your idea is more promising. :)
I guess with this solution we'd need:
a) A good way of letting the user detect the state where they have set the DAX
inode flag but it is not yet in use by the inode.
b) A reliable way of flushing the inode from the filesystem cache, so that the
next time an open() happens they get the new behavior. The way I usually do
this is via umount/remount, but there is probably already a way to do this?
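
The two requirements above can be sketched with another toy model in Python. Everything here is an assumption made for illustration: `ToyInode`, `ToyInodeCache`, `pending()` and `evict()` are hypothetical names, not ext4 or VFS interfaces. The sketch only captures the proposed rule that the effective mode (S_DAX) is decided once, when the inode is instantiated in cache, while the persistent flag may change at any time.

```python
# Toy model of "S_DAX is latched when the inode is instantiated".
# All class and method names are hypothetical, not kernel interfaces.

class ToyInode:
    def __init__(self, dax_flag: bool):
        self.dax_flag = dax_flag   # persistent per-inode flag
        self.s_dax = dax_flag      # effective mode, fixed at instantiation

    def pending(self) -> bool:
        # Requirement (a): the flag was changed but is not yet in effect.
        return self.dax_flag != self.s_dax


class ToyInodeCache:
    def __init__(self):
        self.on_disk = {}   # inode number -> persistent flag
        self.cached = {}    # inode number -> in-memory ToyInode

    def create(self, ino: int, dax_flag: bool = False) -> None:
        self.on_disk[ino] = dax_flag

    def iget(self, ino: int) -> ToyInode:
        # The effective mode is decided only here, on a cache miss.
        if ino not in self.cached:
            self.cached[ino] = ToyInode(self.on_disk[ino])
        return self.cached[ino]

    def set_flag(self, ino: int, value: bool) -> None:
        # Changing the flag never touches s_dax of a cached inode.
        self.on_disk[ino] = value
        if ino in self.cached:
            self.cached[ino].dax_flag = value

    def evict(self, ino: int) -> None:
        # Requirement (b): drop the inode so the next iget() re-evaluates.
        self.cached.pop(ino, None)


cache = ToyInodeCache()
cache.create(12)
inode = cache.iget(12)
cache.set_flag(12, True)
before = (inode.s_dax, inode.pending())
cache.evict(12)
reloaded = cache.iget(12)
after = (reloaded.s_dax, reloaded.pending())
print(before, after)
```

In this model a flag change can never race with I/O or mmap() on a live inode, because nothing that is already in cache ever changes its effective mode.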
> My assumption here is that it is possible to fall back to always using
> page cache for such inodes, and flush the data to pmem via the block
> interface for inodes that don't have S_DAX set?
Correct.
> That would allow the vast majority of cases to work out of the box, or in
> a few rare cases where the DAX feature is being changed (e.g. inline data
> inode on disk growing to external disk blocks) would use the page cache
> until such a time that the inode is dropped from cache and reloaded (at
> worst the next remount).
Ah, yep, this has the potential to solve those cases as well. Seems
promising, to me at least. :)
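
A minimal sketch of the fallback rule discussed above, again as a toy Python model rather than kernel code. `ToyInodeState` and `use_dax_at_load()` are hypothetical names. Inline data and encryption are the conflicting features named in the quoted text; data journaling is included here only on the assumption that it is treated the same way, given the DAX + journaling corruption mentioned earlier in this message.

```python
# Toy predicate: evaluated only when an inode is loaded into cache.
# Names are hypothetical; this is not the ext4 implementation.
from dataclasses import dataclass


@dataclass
class ToyInodeState:
    dax_flag: bool                  # persistent per-inode DAX flag
    inline_data: bool = False
    encrypted: bool = False
    data_journaling: bool = False   # assumption: handled like the others


def use_dax_at_load(st: ToyInodeState) -> bool:
    # Inodes with a conflicting feature fall back to the page cache.
    if st.inline_data or st.encrypted or st.data_journaling:
        return False
    return st.dax_flag


small = ToyInodeState(dax_flag=True, inline_data=True)
mode_when_inline = use_dax_at_load(small)

# The file grows and its data moves to external blocks. A cached inode
# would keep using the page cache; only the next load sees the new answer.
small.inline_data = False
mode_on_next_load = use_dax_at_load(small)
print(mode_when_inline, mode_on_next_load)
```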
Thread overview: 27+ messages
2017-09-05 22:35 [PATCH 0/9] add ext4 per-inode DAX flag Ross Zwisler
2017-09-05 22:35 ` [PATCH 1/9] ext4: remove duplicate extended attributes defs Ross Zwisler
2017-09-06 7:29 ` Jan Kara
[not found] ` <20170905223541.20594-1-ross.zwisler-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-09-05 22:35 ` [PATCH 2/9] xfs: always use DAX if mount option is used Ross Zwisler
2017-09-05 22:35 ` [PATCH 3/9] xfs: validate bdev support for DAX inode flag Ross Zwisler
2017-09-05 22:35 ` [PATCH 4/9] ext4: add ext4_should_use_dax() Ross Zwisler
2017-09-05 22:35 ` [PATCH 5/9] ext4: ext4_change_inode_journal_flag error handling Ross Zwisler
2017-09-05 22:35 ` [PATCH 6/9] ext4: safely transition S_DAX on journaling changes Ross Zwisler
2017-09-06 9:47 ` Jan Kara
[not found] ` <20170906094700.GC27916-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2017-09-06 17:09 ` Ross Zwisler
2017-09-05 22:35 ` [PATCH 7/9] ext4: prevent data corruption with inline data + DAX Ross Zwisler
2017-09-06 20:55 ` Andreas Dilger
2017-09-06 23:11 ` Ross Zwisler
2017-09-05 22:35 ` [PATCH 8/9] ext4: add sanity check for encryption " Ross Zwisler
2017-09-05 22:35 ` [PATCH 9/9] ext4: add per-inode DAX flag Ross Zwisler
2017-09-06 2:12 ` [PATCH 0/9] add ext4 " Eric Sandeen
2017-09-06 17:07 ` Ross Zwisler
2017-09-07 20:54 ` Dan Williams
[not found] ` <CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-09-07 21:13 ` Ross Zwisler
2017-09-07 21:26 ` Andreas Dilger
[not found] ` <5F58D3F5-D93B-4648-AE01-8A46956FBB4B-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>
2017-09-07 21:51 ` Ross Zwisler [this message]
2017-09-07 22:12 ` Dave Chinner
2017-09-07 22:19 ` Ross Zwisler
[not found] ` <20170907221900.GB12669-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
2017-09-07 23:25 ` Dave Chinner
2017-09-08 9:48 ` Jan Kara
2017-09-08 15:39 ` Theodore Ts'o
[not found] ` <20170908153913.jjhzogjs5zpeea5v-AKGzg7BKzIDYtjvyW6yDsg@public.gmane.org>
2017-09-11 8:47 ` Jan Kara