Re: [PATCH] block: Fix S_DAX inode flag locking

From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Jon Derrick <jonathan.derrick@intel.com>,
	linux-block@vger.kernel.org, Jens Axboe <axboe@fb.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Jeff Moyer <jmoyer@redhat.com>,
	Stephen Bates <stephen.bates@microsemi.com>,
	Keith Busch <keith.busch@intel.com>,
	Christoph Hellwig <hch@infradead.org>,
	Robert Elliott <elliott@hpe.com>
Subject: Re: [PATCH] block: Fix S_DAX inode flag locking
Date: Thu, 19 May 2016 00:32:03 +1000
Message-ID: <20160518143203.GZ26977@dastard>
In-Reply-To: <20160518082039.GB26315@quack2.suse.cz>

On Wed, May 18, 2016 at 10:20:39AM +0200, Jan Kara wrote:
> On Wed 18-05-16 08:58:42, Dave Chinner wrote:
> > On Tue, May 17, 2016 at 12:34:57PM -0700, Dan Williams wrote:
> > > On Tue, May 17, 2016 at 11:29 AM, Jon Derrick
> > > <jonathan.derrick@intel.com> wrote:
> > > > This patch fixes S_DAX bd_inode i_flag locking to conform to suggested
> > > 
> > > A "fix" implies that it's currently broken.  I don't see how it is, not
> > > until we add an ioctl method or other path that also tries to update
> > > the flags outside of blkdev_get() context.  So, I don't think this
> > > patch stands on its own if you were intending it to be merged
> > > separately.
> > > 
> > > > locking rules. It is presumed that S_DAX is the only valid inode flag
> > > > for a block device which subscribes to direct-access, and will restore
> > > > any previously set flags if direct-access initialization fails.
> > > >
> > > > This reverts to i_flags behavior prior to
> > > > bbab37ddc20bae4709bca8745c128c4f46fe63c5
> > > > by allowing other bd_inode flags when DAX is disabled
> > > >
> > > > Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
> > > > ---
> > > >  fs/block_dev.c | 31 ++++++++++++++++++++++++++-----
> > > >  1 file changed, 26 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/fs/block_dev.c b/fs/block_dev.c
> > > > index 20a2c02..d41e37f 100644
> > > > --- a/fs/block_dev.c
> > > > +++ b/fs/block_dev.c
> > > > @@ -1159,6 +1159,20 @@ void bd_set_size(struct block_device *bdev, loff_t size)
> > > >  }
> > > >  EXPORT_SYMBOL(bd_set_size);
> > > >
> > > > +static void bd_add_dax(struct inode *inode)
> > > > +{
> > > > +       inode_lock(inode);
> > > > +       inode->i_flags |= S_DAX;
> > > > +       inode_unlock(inode);
> > > > +}
> > > > +
> > > > +static void bd_clear_dax(struct inode *inode)
> > > > +{
> > > > +       inode_lock(inode);
> > > > +       inode->i_flags &= ~S_DAX;
> > > > +       inode_unlock(inode);
> > > > +}
> > > 
> > > Since this is inode generic should these helpers be prefixed "i_"
> > > rather than "bd_"?
> > 
> > Probably not, because in general filesystems are responsible for
> > updating i_flags to reflect on-disk inode configuration and that's
> > typically done under transaction contexts. e.g.  through ioctl
> > interfaces to set/clear flags that are stored on disk.  As such,
> > inode->i_flags is effectively protected by the filesystem specific
> > locking hierarchy, not the generic inode_lock().
> >
> > e.g. have a look at XFS storing a persistent "DAX-enabled" flag in
> > the inode, which can be set/cleared on individual inodes dynamically
> > by FS_IOC_FSSETXATTR. The XFS i_flags update function assumes
> > exclusive access to the field as it is called under locked
> > transaction context. Similar code exists in ext4, btrfs, gfs2,
> > etc....
> 
> So in case of ext4, we actually do use inode_lock() to protect against
> racing IOC_SETFLAGS calls. i_flags is a strange mix and there are a few
> (like S_DEAD or S_NOSEC) which are not persistent and those get set /
> cleared by VFS. Some of those places (e.g. the clearing in
> __vfs_setxattr_noperm()) are not really controlled by the filesystem
> AFAICT. So when XFS doesn't use inode_lock() to protect i_flags updates it
> can race with VFS on i_flags updates.

There are several other filesystems with the same problem. It's not
actually clear what the rules for i_flags are. The comment above
inode_set_flags() is ambiguous at best, and even points this out:

 * In the long run, i_mutex is overkill, and we should probably look
 * at using the i_lock spinlock to protect i_flags, and then make sure
 * it is so documented in include/linux/fs.h and that all code follows
 * the locking convention!!

Perhaps that should be done before the situation gets worse...
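
To make the suggestion in that comment concrete, here is a rough
userspace sketch of what "one lock for every i_flags update" might
look like. It is only a model, not kernel code: struct model_inode,
model_inode_init(), model_set_flags() and the MODEL_S_* values are
hypothetical names invented for illustration, and a pthread mutex
stands in for the i_lock spinlock.

```c
#include <assert.h>
#include <pthread.h>

/* Illustrative flag bits only; stand-ins for the kernel's S_* flags. */
#define MODEL_S_NOSEC 0x1000u
#define MODEL_S_DAX   0x2000u

/* Hypothetical stand-in for struct inode, reduced to the two fields
 * that matter for this discussion. */
struct model_inode {
	pthread_mutex_t i_lock;   /* models the i_lock spinlock */
	unsigned int i_flags;
};

static void model_inode_init(struct model_inode *inode)
{
	pthread_mutex_init(&inode->i_lock, NULL);
	inode->i_flags = 0;
}

/*
 * Replace the bits selected by mask with flags. The read-modify-write
 * happens entirely under i_lock, so two callers touching different
 * bits (say, a filesystem setting DAX and the VFS clearing NOSEC)
 * cannot lose each other's update, provided every writer goes
 * through this helper.
 */
static void model_set_flags(struct model_inode *inode,
			    unsigned int flags, unsigned int mask)
{
	assert((flags & ~mask) == 0);   /* caller must not set bits outside mask */

	pthread_mutex_lock(&inode->i_lock);
	inode->i_flags = (inode->i_flags & ~mask) | flags;
	pthread_mutex_unlock(&inode->i_lock);
}
```

With something of this shape, setting DAX would be
model_set_flags(inode, MODEL_S_DAX, MODEL_S_DAX) and clearing it
model_set_flags(inode, 0, MODEL_S_DAX). The point is only that the
locking rule lives in one helper rather than being reimplemented, or
forgotten, at each call site.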

Cheers

Dave.
-- 
Dave Chinner
david@fromorbit.com
