public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: "Michal Suchánek" <msuchanek@suse.de>
Cc: Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Ira Weiny <ira.weiny@intel.com>, Jan Kara <jack@suse.cz>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] xfs: show the dax option in mount options.
Date: Thu, 12 Nov 2020 16:10:46 +0100	[thread overview]
Message-ID: <20201112151046.GD27697@quack2.suse.cz> (raw)
In-Reply-To: <20201112111217.GF29778@kitsune.suse.cz>

On Thu 12-11-20 12:12:17, Michal Suchánek wrote:
> On Thu, Nov 12, 2020 at 12:49:52PM +1100, Dave Chinner wrote:
> > On Wed, Nov 11, 2020 at 11:28:48AM +0100, Michal Suchánek wrote:
> > > On Tue, Nov 10, 2020 at 08:08:23AM +1100, Dave Chinner wrote:
> > > > On Mon, Nov 09, 2020 at 09:27:05PM +0100, Michal Suchánek wrote:
> > > > > On Mon, Nov 09, 2020 at 11:24:19AM -0800, Darrick J. Wong wrote:
> > > > > > On Mon, Nov 09, 2020 at 08:10:08PM +0100, Michal Suchanek wrote:
> > > > > > > xfs accepts both dax and dax_enum but shows only dax_enum. Show both
> > > > > > > options.
> > > > > > > 
> > > > > > > Fixes: 8d6c3446ec23 ("fs/xfs: Make DAX mount option a tri-state")
> > > > > > > Signed-off-by: Michal Suchanek <msuchanek@suse.de>
> > > > > > > ---
> > > > > > >  fs/xfs/xfs_super.c | 2 +-
> > > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > > > > > > index e3e229e52512..a3b00003840d 100644
> > > > > > > --- a/fs/xfs/xfs_super.c
> > > > > > > +++ b/fs/xfs/xfs_super.c
> > > > > > > @@ -163,7 +163,7 @@ xfs_fs_show_options(
> > > > > > >  		{ XFS_MOUNT_GRPID,		",grpid" },
> > > > > > >  		{ XFS_MOUNT_DISCARD,		",discard" },
> > > > > > >  		{ XFS_MOUNT_LARGEIO,		",largeio" },
> > > > > > > -		{ XFS_MOUNT_DAX_ALWAYS,		",dax=always" },
> > > > > > > +		{ XFS_MOUNT_DAX_ALWAYS,		",dax,dax=always" },
> > > > > > 
> > > > > > NAK, programs that require DAX semantics for files stored on XFS must
> > > > > > call statx to detect the STATX_ATTR_DAX flag, as outlined in "Enabling
> > > > > > DAX on xfs" in Documentation/filesystems/dax.txt.
> > > > > statx can be used to query S_DAX.  NOTE that only regular files will
> > > > > ever have S_DAX set and therefore statx will never indicate that S_DAX
> > > > > is set on directories.
> > > > 
> > > > Yup, by design.
> > > > 
> > > > The application doesn't need to do anything complex to make this
> > > > work. If the app wants to use DAX, then it should use
> > > > FS_IOC_FS{GS}ETXATTR to always set the on disk per inode DAX flags
> > > > for it's data dirs and files, and then STATX_ATTR_DAX will *always*
> > > > tell it whether DAX is actively in use at runtime. It's pretty
> > > > simple, really.
> > > > 
> > > > > The filesystem may not have any files so statx cannot be used.
> > > > 
> > > > Really?  The app or installer is about to *write to the fs* and has
> > > > all the permissions it needs to modify the contents of the fs. It's
> > > > pretty simple to create a tmpdir, set the DAX flag on the tmpdir,
> > > > then create a tmpfile in the tmpdir and run STATX_ATTR_DAX on it to
> > > > see if DAX is active or not.....
> > > 
> > > Have you ever seen a 'wizard' style installer?
> > 
> > I wrote my first one in 1995 on Windows NT 3.51 using Installshield.
> > 
> > > Like one that firsts asks what to install, and then presents a list of
> > > suitable locations that have enough space, supported filesystem features
> > > enabled, and whatnot?
> > 
> > Hold on, 1995 is calling me. The application I was packaging used
> > ACLs. But the NTFS version created by windows NT 3.1 was
> > incompatible as ACL support didn't arrive until NT 3.51 and service
> > pack 4(?) for NT 3.1. Yes, I had to write code to probe the
> > filesystems to detect whether ACL support was available or not by
> > -trying to create an ACL-.
> > 
> > I guess you could say "been there, done that, learnt the lesson".
> So we are trying to be as bad as Windows now?
> > 
> > > So to present a list of mountpoints that support DAX one has to scribble
> > > over every mountpoint on the system?
> > 
> > If you are filtering storage options presented to the user by
> > supported features, then you have to probe for them in some way.
> > And that means you have to consider that many option filesystem
> > features that applications use cannot be detected via mount options
> > checking the filesytem config. That is, there are features that can
> > only be discovered by actually testing whether they work or not.
> > 
> > > That sounds ridiculous.
> > 
> > Reality is a harsh mistress. :/
> > 
> > [snip the rest because you're being ridiculous]
> > 
> > Are you aware of ndctl?
> > 
> > $ ndctl list
> > [
> >   {
> >     "dev":"namespace1.0",
> >     "mode":"fsdax",
> >     "map":"mem",
> >     "size":8589934592,
> >     "sector_size":512,
> >     "blockdev":"pmem1"
> >   },
> >   {
> >     "dev":"namespace0.0",
> >     "mode":"fsdax",
> >     "map":"mem",
> >     "size":8589934592,
> >     "sector_size":512,
> >     "blockdev":"pmem0"
> >   }
> > ]
> Yes, that tells me that the device can be configured for dax. Not if the
> filesystem will use it.
> > 
> > Oh, look there are two block devices on this machine that are
> > configured for filesystem DAX (fsdax). They are /dev/pmem0 and
> > /dev/pmem1.
> > 
> > What filesytsems are on them?
> > 
> > $ lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT /dev/pmem0 /dev/pmem1
> > NAME  SIZE FSTYPE MOUNTPOINT
> > pmem1   8G ext4   /mnt/test
> > pmem0   8G xfs    /mnt/scratch
> > $
> > 
> > One XFs, one ext4, both of which will be using DAX capable unless
> > the dax=never mount option is set. Which:
> Or the bock size does not match page size. Or whatever other requirement
> the filesystem might have is not met.
> > 
> > $ mount  |grep pmem
> > /dev/pmem0 on /mnt/scratch type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)
> > /dev/pmem1 on /mnt/test type ext4 (rw,relatime)
> > $
> > 
> > is not set on either mount.
> > 
> > Hence both filesystems at DAX capable and enabled, and should be
> > presented as options to the user as such.
> No, it is not the case. That is why it would make sense for the kernel
> to make the information about DAX availability accessible somewhere.
> > 
> > And all this comes about because DAX is a property of the block
> > device, not the filesystem. Hence the only time a DAX capable
> > filesystem on a block device that is DAX capable will not be DAX
> > capable is if the dax=never is set...
> See, it is not property of the block device. It is property of the mount
> point. The availability on the device is one requirement but the
> filesystem options affect availability to the user in the end.

No, it is not really a property of the mountpoint either. If anything it is
a property of the inode. Two different inodes on the very same filesystem,
one may support DAX the other will not (think for example of XFS real-time
volumes, or simply inodes with / without S_DAX flag set). And we are back
at what Dave tries to get accross. As inconvenient as it is
statx(STATX_ATTR_DAX) is the only way to tell.

> > Of course, this is just encoding how existing filesystems behave -
> > it's not a requirement for future filesytsems so they may use other
> > mechanisms for enabling/disabling DAX. Which leaves you with the
> > only reliable mechanism of creating filesystem and checking
> > statx(STATX_ATTR_DAX)....
> Or the kernel could just tell the user. But right, information is power,
> and keeping the user in the dark is much more entertaining.

I think it would be more productive if you actually answered Ted's
question: Exactly which application got broken by the change? I know for a
fact that one large DB vendor was parsing mount options in /proc/mounts to
determine whether their DB can use DAX or not (and this was already a
"cleaned up" method because before this they were parsing VMA flags in
/proc/<pid>/smaps which is even worse). But in this case they also seemed
OK to switch to statx() once it is available...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2020-11-12 15:19 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-09 19:10 [PATCH 0/2] Tristate moount option comatibility fixup Michal Suchanek
2020-11-09 19:10 ` [PATCH 1/2] xfs: show the dax option in mount options Michal Suchanek
2020-11-09 19:24   ` Eric Sandeen
2020-11-09 19:24   ` Darrick J. Wong
2020-11-09 20:27     ` Michal Suchánek
2020-11-09 21:08       ` Dave Chinner
2020-11-11 10:28         ` Michal Suchánek
2020-11-12  1:49           ` Dave Chinner
2020-11-12 11:12             ` Michal Suchánek
2020-11-12 15:10               ` Jan Kara [this message]
2020-11-12 16:08                 ` Ira Weiny
2020-12-01 16:29                   ` Michal Suchánek
2020-11-10 18:38   ` Christoph Hellwig
2020-11-09 19:10 ` [PATCH 2/2] ext4: " Michal Suchanek
2020-11-10 14:14 ` [PATCH 0/2] Tristate moount option comatibility fixup Theodore Y. Ts'o
2020-11-10 18:35 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201112151046.GD27697@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger.kernel@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=msuchanek@suse.de \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox