All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-xfs@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/3] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
Date: Tue, 1 Aug 2017 10:30:32 +1000	[thread overview]
Message-ID: <20170801003032.GL17762@dastard> (raw)
In-Reply-To: <CAPcyv4hPr8b3UsshQ086hi7_KrE3-y0uGTpdi1eXc5R59mYmUw@mail.gmail.com>

On Mon, Jul 31, 2017 at 11:25:34AM -0700, Dan Williams wrote:
> On Mon, Jul 31, 2017 at 10:09 AM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> >> index 93e955262d07..c4fc79a0704f 100644
> >> --- a/fs/xfs/xfs_bmap_util.c
> >> +++ b/fs/xfs/xfs_bmap_util.c
> >> @@ -1387,6 +1387,92 @@ xfs_zero_file_space(
> >>
> >>  }
> >>
> >> +int
> >> +xfs_seal_file_space(
> >> +     struct xfs_inode        *ip,
> >> +     xfs_off_t               offset,
> >> +     xfs_off_t               len)
> >> +{
> >> +     struct inode            *inode = VFS_I(ip);
> >> +     struct address_space    *mapping = inode->i_mapping;
> >> +     int                     error = 0;
> >> +
> >> +     if (offset)
> >> +             return -EINVAL;
> >> +
> >> +     i_mmap_lock_read(mapping);
> >
> > (Are we allowed to take address_space->i_mmap_rwsem while holding
> > xfs_inode->i_mmaplock?)
> 
> Empirically, yes. Lockdep complains when those locks are taken in the
> reverse order.

My pet hate: people who rely on lockdep to tell them that locking is
wrong rather than understanding what the correct locking order is
before writing code.

> That seems to be inconsistent with the "mmap_sem -> i_mmap_lock ->
> page_lock" note in the xfs_ilock comment. Am I confusing what
> i_mmap_lock means in that comment, is that the i_mmap_lock_read(), or
> the i_mmaplock in the xfs_inode?

The latter. The lock orders you need to pay attention to are
documented in mm/filemap.c. (Which, incidentally, needs updating to
refer to i_rwsem, not i_mutex.)

 *  ->i_mutex
 *    ->i_mmap_rwsem            (truncate->unmap_mapping_range)

 *  ->mmap_sem                                                                   
 *    ->i_mmap_rwsem                                                             
 *      ->page_table_lock or pte_lock   (various, mainly in memory.c)            
 *        ->mapping->tree_lock  (arch-dependent flush_dcache_mmap_lock)

As it is, I think we shold not be taking internal mm/ state locks
deep inside filesystem code as it smells of layering violations. We
don't do this anywhere else for mapping checks - if we already hold
the XFS_MMAPLOCK_EXCL here, then we've already locked out page
faults from changing the state of the inode. In which case, we
should not need a mmap internal lock to be held here, same as all
the other filesystem operations that lock out page faults....

> >> +     xfs_ilock(ip, XFS_ILOCK_EXCL);
> >> +     if (len == 0) {
> >> +             /*
> >> +              * Clear the immutable flag provided there are no active
> >> +              * mappings. The active mapping check prevents an
> >> +              * application that is assuming a static block map, for
> >> +              * DAX or peer-to-peer DMA, from having this state
> >> +              * silently change behind its back.
> >> +              */
> >> +             if (RB_EMPTY_ROOT(&mapping->i_mmap))

			mapping_mapped(mapping)

> >> +                     inode->i_flags &= ~S_IOMAP_IMMUTABLE;
> >> +             else
> >> +                     error = -EBUSY;
> >> +     } else if (IS_IOMAP_IMMUTABLE(inode)) {
> >> +             if (len == i_size_read(inode)) {
> >> +                     /*
> >> +                      * The file is already in the correct state,
> >> +                      * bail out without error below.
> >> +                      */
> >> +                     len = 0;

> >> +             } else {
> >> +                     /* too late to allocate more space */
> >> +                     error = -ETXTBSY;
> >> +             }
> >> +     } else {
> >> +             if (len < i_size_read(inode)) {
> >> +                     /*
> >> +                      * Since S_IOMAP_IMMUTABLE is inode global it
> >> +                      * does not make sense to fallocate(immutable)
> >> +                      * on a sub-range of the file.
> >> +                      */
> >> +                     error = -EINVAL;
> >> +             } else if (!RB_EMPTY_ROOT(&mapping->i_mmap)) {
> >> +                     /*
> >> +                      * It's not strictly required to prevent setting
> >> +                      * immutable while a file is already mapped, but
> >> +                      * we do it for simplicity and symmetry with the
> >> +                      * S_IOMAP_IMMUTABLE disable case.
> >> +                      */
> >> +                     error = -EBUSY;
> >> +             } else
> >> +                     inode->i_flags |= S_IOMAP_IMMUTABLE;
> >> +     }
> >> +     xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >> +     i_mmap_unlock_read(mapping);
> >> +
> >> +     if (error || len == 0)
> >> +             return error;

I have to say, I find this checking to be fairly grotty. The "len ==
0" API to remove the immutable flag is a gross hack.  IMO, it's
better to add a separate fallocate command to "unseal" the extent
map, and let that happen according to whether the file is mapped or
not.  Perhaps it would be better to start with a man page
documenting the desired API?

FWIW, the if/else if/else structure could be cleaned up with a
simple "goto out_unlock" construct such as:

	/* don't make immutable if inode is currently mapped */
	error = -EBUSY;
	if (mapping_mapped(mapping))
		goto out_unlock;

	/* can't do anything if inode is already immutable */
	error = -ETXTBSY;
	if (IS_IMMUTABLE(inode) || IS_IOMAP_IMMUTABLE(inode))
		goto out_unlock;

	/* XFS only supports whole file extent immutability */
	error = -EINVAL;
	if (len != i_size_read(inode))
		goto out_unlock;

	/* all good to go */
	error = 0;

out_unlock:
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	i_mmap_unlock_read(mapping);

	if (error)
	     return error;

	/* now unshare, allocate and add immutable flag */

Cheers,

Dave.


-- 
Dave Chinner
david@fromorbit.com
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>,
	Jan Kara <jack@suse.cz>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	linux-xfs@vger.kernel.org, Jeff Moyer <jmoyer@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andy Lutomirski <luto@kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH 2/3] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP
Date: Tue, 1 Aug 2017 10:30:32 +1000	[thread overview]
Message-ID: <20170801003032.GL17762@dastard> (raw)
In-Reply-To: <CAPcyv4hPr8b3UsshQ086hi7_KrE3-y0uGTpdi1eXc5R59mYmUw@mail.gmail.com>

On Mon, Jul 31, 2017 at 11:25:34AM -0700, Dan Williams wrote:
> On Mon, Jul 31, 2017 at 10:09 AM, Darrick J. Wong
> <darrick.wong@oracle.com> wrote:
> >> index 93e955262d07..c4fc79a0704f 100644
> >> --- a/fs/xfs/xfs_bmap_util.c
> >> +++ b/fs/xfs/xfs_bmap_util.c
> >> @@ -1387,6 +1387,92 @@ xfs_zero_file_space(
> >>
> >>  }
> >>
> >> +int
> >> +xfs_seal_file_space(
> >> +     struct xfs_inode        *ip,
> >> +     xfs_off_t               offset,
> >> +     xfs_off_t               len)
> >> +{
> >> +     struct inode            *inode = VFS_I(ip);
> >> +     struct address_space    *mapping = inode->i_mapping;
> >> +     int                     error = 0;
> >> +
> >> +     if (offset)
> >> +             return -EINVAL;
> >> +
> >> +     i_mmap_lock_read(mapping);
> >
> > (Are we allowed to take address_space->i_mmap_rwsem while holding
> > xfs_inode->i_mmaplock?)
> 
> Empirically, yes. Lockdep complains when those locks are taken in the
> reverse order.

My pet hate: people who rely on lockdep to tell them that locking is
wrong rather than understanding what the correct locking order is
before writing code.

> That seems to be inconsistent with the "mmap_sem -> i_mmap_lock ->
> page_lock" note in the xfs_ilock comment. Am I confusing what
> i_mmap_lock means in that comment, is that the i_mmap_lock_read(), or
> the i_mmaplock in the xfs_inode?

The latter. The lock orders you need to pay attention to are
documented in mm/filemap.c. (Which, incidentally, needs updating to
refer to i_rwsem, not i_mutex.)

 *  ->i_mutex
 *    ->i_mmap_rwsem            (truncate->unmap_mapping_range)

 *  ->mmap_sem                                                                   
 *    ->i_mmap_rwsem                                                             
 *      ->page_table_lock or pte_lock   (various, mainly in memory.c)            
 *        ->mapping->tree_lock  (arch-dependent flush_dcache_mmap_lock)

As it is, I think we shold not be taking internal mm/ state locks
deep inside filesystem code as it smells of layering violations. We
don't do this anywhere else for mapping checks - if we already hold
the XFS_MMAPLOCK_EXCL here, then we've already locked out page
faults from changing the state of the inode. In which case, we
should not need a mmap internal lock to be held here, same as all
the other filesystem operations that lock out page faults....

> >> +     xfs_ilock(ip, XFS_ILOCK_EXCL);
> >> +     if (len == 0) {
> >> +             /*
> >> +              * Clear the immutable flag provided there are no active
> >> +              * mappings. The active mapping check prevents an
> >> +              * application that is assuming a static block map, for
> >> +              * DAX or peer-to-peer DMA, from having this state
> >> +              * silently change behind its back.
> >> +              */
> >> +             if (RB_EMPTY_ROOT(&mapping->i_mmap))

			mapping_mapped(mapping)

> >> +                     inode->i_flags &= ~S_IOMAP_IMMUTABLE;
> >> +             else
> >> +                     error = -EBUSY;
> >> +     } else if (IS_IOMAP_IMMUTABLE(inode)) {
> >> +             if (len == i_size_read(inode)) {
> >> +                     /*
> >> +                      * The file is already in the correct state,
> >> +                      * bail out without error below.
> >> +                      */
> >> +                     len = 0;

> >> +             } else {
> >> +                     /* too late to allocate more space */
> >> +                     error = -ETXTBSY;
> >> +             }
> >> +     } else {
> >> +             if (len < i_size_read(inode)) {
> >> +                     /*
> >> +                      * Since S_IOMAP_IMMUTABLE is inode global it
> >> +                      * does not make sense to fallocate(immutable)
> >> +                      * on a sub-range of the file.
> >> +                      */
> >> +                     error = -EINVAL;
> >> +             } else if (!RB_EMPTY_ROOT(&mapping->i_mmap)) {
> >> +                     /*
> >> +                      * It's not strictly required to prevent setting
> >> +                      * immutable while a file is already mapped, but
> >> +                      * we do it for simplicity and symmetry with the
> >> +                      * S_IOMAP_IMMUTABLE disable case.
> >> +                      */
> >> +                     error = -EBUSY;
> >> +             } else
> >> +                     inode->i_flags |= S_IOMAP_IMMUTABLE;
> >> +     }
> >> +     xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >> +     i_mmap_unlock_read(mapping);
> >> +
> >> +     if (error || len == 0)
> >> +             return error;

I have to say, I find this checking to be fairly grotty. The "len ==
0" API to remove the immutable flag is a gross hack.  IMO, it's
better to add a separate fallocate command to "unseal" the extent
map, and let that happen according to whether the file is mapped or
not.  Perhaps it would be better to start with a man page
documenting the desired API?

FWIW, the if/else if/else structure could be cleaned up with a
simple "goto out_unlock" construct such as:

	/* don't make immutable if inode is currently mapped */
	error = -EBUSY;
	if (mapping_mapped(mapping))
		goto out_unlock;

	/* can't do anything if inode is already immutable */
	error = -ETXTBSY;
	if (IS_IMMUTABLE(inode) || IS_IOMAP_IMMUTABLE(inode))
		goto out_unlock;

	/* XFS only supports whole file extent immutability */
	error = -EINVAL;
	if (len != i_size_read(inode))
		goto out_unlock;

	/* all good to go */
	error = 0;

out_unlock:
	xfs_iunlock(ip, XFS_ILOCK_EXCL);
	i_mmap_unlock_read(mapping);

	if (error)
	     return error;

	/* now unshare, allocate and add immutable flag */

Cheers,

Dave.


-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2017-08-01  0:37 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-29 19:43 [PATCH 0/3] fs, xfs: block map immutable files for dax, dma-to-storage, and swap Dan Williams
2017-07-29 19:43 ` Dan Williams
2017-07-29 19:43 ` [PATCH 1/3] fs, xfs: introduce S_IOMAP_IMMUTABLE Dan Williams
2017-07-29 19:43   ` Dan Williams
2017-07-31 16:02   ` Colin Walters
2017-07-31 16:02     ` Colin Walters
2017-07-31 16:29     ` Dan Williams
2017-07-31 16:29       ` Dan Williams
2017-07-31 16:32       ` Colin Walters
2017-07-31 16:32         ` Colin Walters
2017-07-31 17:42         ` Colin Walters
2017-07-31 17:42           ` Colin Walters
2017-07-31 18:23           ` Darrick J. Wong
2017-07-31 18:23             ` Darrick J. Wong
2017-08-01  2:15             ` Colin Walters
2017-08-01  2:15               ` Colin Walters
2017-08-01  2:42               ` Dave Chinner
2017-08-01  2:42                 ` Dave Chinner
2017-08-05  9:45                 ` Christoph Hellwig
2017-08-05  9:45                   ` Christoph Hellwig
2017-07-31 16:46   ` Darrick J. Wong
2017-07-31 16:46     ` Darrick J. Wong
2017-07-31 17:32     ` Dan Williams
2017-07-31 17:32       ` Dan Williams
2017-07-29 19:43 ` [PATCH 2/3] fs, xfs: introduce FALLOC_FL_SEAL_BLOCK_MAP Dan Williams
2017-07-29 19:43   ` Dan Williams
2017-07-29 19:43   ` Dan Williams
2017-07-31 17:09   ` Darrick J. Wong
2017-07-31 17:09     ` Darrick J. Wong
2017-07-31 18:25     ` Dan Williams
2017-07-31 18:25       ` Dan Williams
2017-08-01  0:30       ` Dave Chinner [this message]
2017-08-01  0:30         ` Dave Chinner
2017-07-29 19:43 ` [PATCH 3/3] xfs: persist S_IOMAP_IMMUTABLE in di_flags2 Dan Williams
2017-07-29 19:43   ` Dan Williams
2017-07-31 17:15   ` Darrick J. Wong
2017-07-31 17:15     ` Darrick J. Wong
2017-08-01  0:42   ` Dave Chinner
2017-08-01  0:42     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170801003032.GL17762@dastard \
    --to=david@fromorbit.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.