From: Dave Chinner <david@fromorbit.com>
To: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jeff Layton <jlayton@poochiereds.net>, Jan Kara <jack@suse.cz>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 7/7] xfs: re-enable XFS per-inode DAX
Date: Tue, 26 Sep 2017 10:31:14 +1000
Message-ID: <20170926003114.GN10955@dastard>
In-Reply-To: <20170925231404.32723-8-ross.zwisler@linux.intel.com>

On Mon, Sep 25, 2017 at 05:14:04PM -0600, Ross Zwisler wrote:
> Re-enable the XFS per-inode DAX flag, preventing S_DAX from changing when
> any mappings are present.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> ---
>  fs/xfs/xfs_ioctl.c | 17 +++++++++++++++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 386b437..7a24dd5 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -1012,12 +1012,10 @@ xfs_diflags_to_linux(
>  		inode->i_flags |= S_NOATIME;
>  	else
>  		inode->i_flags &= ~S_NOATIME;
> -#if 0	/* disabled until the flag switching races are sorted out */
>  	if ((xflags & FS_XFLAG_DAX) || (ip->i_mount->m_flags & XFS_MOUNT_DAX))
>  		inode->i_flags |= S_DAX;
>  	else
>  		inode->i_flags &= ~S_DAX;
> -#endif
>  }
>  
>  static bool
> @@ -1049,6 +1047,8 @@ xfs_ioctl_setattr_xflags(
>  {
>  	struct xfs_mount	*mp = ip->i_mount;
>  	uint64_t		di_flags2;
> +	struct address_space	*mapping = VFS_I(ip)->i_mapping;
> +	bool			dax_changing;
>  
>  	/* Can't change realtime flag if any extents are allocated. */
>  	if ((ip->i_d.di_nextents || ip->i_delayed_blks) &&
> @@ -1084,10 +1084,23 @@ xfs_ioctl_setattr_xflags(
>  	if (di_flags2 && ip->i_d.di_version < 3)
>  		return -EINVAL;
>  
> +	dax_changing = xfs_is_dax_state_changing(fa->fsx_xflags, ip);
> +	if (dax_changing) {
> +		i_mmap_lock_read(mapping);
> +		if (mapping_mapped(mapping)) {
> +			i_mmap_unlock_read(mapping);
> +			return -EBUSY;
> +		}
> +	}
> +
>  	ip->i_d.di_flags = xfs_flags2diflags(ip, fa->fsx_xflags);
>  	ip->i_d.di_flags2 = di_flags2;
>  
>  	xfs_diflags_to_linux(ip);
> +
> +	if (dax_changing)
> +		i_mmap_unlock_read(mapping);

Is this safe to be taking here under the ILOCK_EXCL? i.e. this is
the lock order here:

IOLOCK_EXCL -> MMAPLOCK_EXCL -> ILOCK_EXCL -> i_mmap_rwsem

The truncate path must run outside the ILOCK
context, and it takes its locks in this order via unmap_mapping_range:

IOLOCK_EXCL -> MMAPLOCK_EXCL -> i_mmap_rwsem

On a page fault, we do:

	mmap_sem -> MMAPLOCK_EXCL -> page lock -> ILOCK_EXCL

Which gives the order

IOLOCK_EXCL
  -> mmap_sem
     -> MMAPLOCK_EXCL
	-> page lock
	   -> ILOCK_EXCL
	      -> i_mmap_rwsem

What I'm not clear on is what the orders are between page locks,
pte locks, i_mapping_tree_lock and i_mmap_rwsem. If there are any
locks that the filesystem can take above the ILOCK that are also
taken under the i_mmap_rwsem, then we have a deadlock vector.
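
To make that check concrete, here is a minimal userspace sketch (not
kernel code, and not how lockdep is implemented) that records the three
orders quoted above as a graph and asks whether a new nesting would
invert an existing order. The enum, struct and function names are all
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Lock classes named in this message; the enum itself is illustrative. */
enum lock_class {
	IOLOCK, MMAP_SEM, MMAPLOCK, PAGE_LOCK, ILOCK, I_MMAP_RWSEM, NR_LOCKS
};

/* edge[a][b] is true when some path takes b directly after a. */
struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];
};

/* Record each consecutive pair of one path's acquisition order. */
static void record_path(struct lock_graph *g,
			const enum lock_class *path, int n)
{
	for (int i = 0; i + 1 < n; i++)
		g->edge[path[i]][path[i + 1]] = true;
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool reachable(const struct lock_graph *g, int from, int to,
		      bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (g->edge[from][next] && !seen[next] &&
		    reachable(g, next, to, seen))
			return true;
	return false;
}

/*
 * Taking `inner` while holding `outer` inverts a known order if the
 * graph already says `inner` is (transitively) taken before `outer`.
 */
static bool would_deadlock(const struct lock_graph *g,
			   enum lock_class outer, enum lock_class inner)
{
	bool seen[NR_LOCKS] = { false };

	return outer != inner && reachable(g, inner, outer, seen);
}

/* Load the setattr, truncate and page fault orders quoted above. */
static void load_quoted_orders(struct lock_graph *g)
{
	static const enum lock_class setattr_path[] =
		{ IOLOCK, MMAPLOCK, ILOCK, I_MMAP_RWSEM };
	static const enum lock_class truncate_path[] =
		{ IOLOCK, MMAPLOCK, I_MMAP_RWSEM };
	static const enum lock_class fault_path[] =
		{ MMAP_SEM, MMAPLOCK, PAGE_LOCK, ILOCK };

	memset(g, 0, sizeof(*g));
	record_path(g, setattr_path, 4);
	record_path(g, truncate_path, 3);
	record_path(g, fault_path, 4);
}
```

With only the quoted orders loaded, would_deadlock(&g, ILOCK,
I_MMAP_RWSEM) is false, because nothing recorded is taken under
i_mmap_rwsem. would_deadlock(&g, ILOCK, MMAP_SEM) is true, which is the
"page fault with the ILOCK held" case. If mm/ code took, say, the page
lock under i_mmap_rwsem (an assumption used here only as an example, not
a statement about what mm/ does), would_deadlock(&g, I_MMAP_RWSEM,
PAGE_LOCK) is also true, and that is the kind of edge that needs ruling
out.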

Historically we've avoided any mm/ level interactions under the
ILOCK_EXCL because of its location in the page fault path locking
order (e.g. lockdep will go nuts if we take a page fault with the
ILOCK held). Hence I'm extremely wary of putting any other mm/ level
locks under the ILOCK like this without a clear explanation of the
locking orders and why it won't deadlock....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com
