linux-fsdevel.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-nvdimm@lists.01.org, linux-xfs@vger.kernel.org,
	Jan Kara <jack@suse.cz>, Arnd Bergmann <arnd@arndb.de>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	linux-rdma@vger.kernel.org, linux-api@vger.kernel.org,
	Christoph Hellwig <hch@lst.de>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	linux-mm@kvack.org, Jeff Moyer <jmoyer@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Jeff Layton <jlayton@poochiereds.net>,
	Ross Zwisler <ross.zwisler@linux.intel.com>
Subject: Re: [PATCH v7 06/12] xfs: wire up MAP_DIRECT
Date: Mon, 9 Oct 2017 14:40:30 +1100
Message-ID: <20171009034030.GH3666@dastard>
In-Reply-To: <150732934955.22363.14950885120988262779.stgit@dwillia2-desk3.amr.corp.intel.com>

On Fri, Oct 06, 2017 at 03:35:49PM -0700, Dan Williams wrote:
> MAP_DIRECT is an mmap(2) flag with the following semantics:
> 
>   MAP_DIRECT
>   When specified with MAP_SHARED_VALIDATE, sets up a file lease with the
>   same lifetime as the mapping. Unlike a typical F_RDLCK lease this lease
>   is broken when a "lease breaker" attempts to write(2), change the block
>   map (fallocate), or change the size of the file. Otherwise the mechanism
>   of a lease break is identical to the typical lease break case where the
>   lease needs to be removed (munmap) within the number of seconds
>   specified by /proc/sys/fs/lease-break-time. If the lease holder fails to
>   remove the lease in time the kernel will invalidate the mapping and
>   force all future accesses to the mapping to trigger SIGBUS.
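A minimal userspace model of the lease lifecycle described above may help. It assumes a single lease breaker and integer-second timestamps, and it checks the deadline through an explicit "tick" rather than a kernel timer. All names (struct direct_lease, lease_break(), lease_unmap(), lease_tick()) are hypothetical; this is an illustration of the stated semantics, not the kernel's implementation.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the MAP_DIRECT lease lifecycle. Times are in
 * seconds. None of these names exist in the kernel.
 */
enum lease_state {
	LEASE_ACTIVE,		/* mapping established, no breaker yet */
	LEASE_BREAKING,		/* a breaker arrived; holder must munmap */
	LEASE_REMOVED,		/* holder unmapped before the deadline */
	LEASE_INVALIDATED,	/* deadline passed; accesses now SIGBUS */
};

struct direct_lease {
	enum lease_state state;
	long deadline;		/* meaningful only in LEASE_BREAKING */
};

/* write(2), fallocate or a size change starts the lease-break countdown. */
void lease_break(struct direct_lease *l, long now, long break_time)
{
	if (l->state == LEASE_ACTIVE) {
		l->state = LEASE_BREAKING;
		l->deadline = now + break_time;
	}
}

/* The holder calls munmap(); during a break it only counts before the deadline. */
void lease_unmap(struct direct_lease *l, long now)
{
	if (l->state == LEASE_ACTIVE ||
	    (l->state == LEASE_BREAKING && now < l->deadline))
		l->state = LEASE_REMOVED;
}

/* Stand-in for the kernel timer: once the deadline passes, invalidate. */
void lease_tick(struct direct_lease *l, long now)
{
	if (l->state == LEASE_BREAKING && now >= l->deadline)
		l->state = LEASE_INVALIDATED;
}

/* Would an access to the mapping in this state raise SIGBUS? */
bool lease_access_sigbus(const struct direct_lease *l)
{
	return l->state == LEASE_INVALIDATED;
}
```

For example, with the default lease-break-time of 45 seconds, a breaker at t=100 sets a deadline of t=145. An munmap at t=130 removes the lease cleanly; if no munmap happens, the tick at t=145 invalidates the mapping and later accesses fault with SIGBUS.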
> 
>   In addition to lease break timeouts causing faults in the mapping to
>   result in SIGBUS, other states of the file will trigger SIGBUS at fault
>   time:
> 
>       * The file is not DAX capable
>       * The file has reflinked (copy-on-write) blocks
>       * The fault would trigger the filesystem to allocate blocks
>       * The fault would trigger the filesystem to perform extent conversion
> 
>   In other words, MAP_DIRECT expects and enforces a fully allocated file
>   where faults can be satisfied without modifying block map metadata.
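The fault-time rules in the list above reduce to a simple predicate. The sketch below is a hypothetical restatement: the fields of struct direct_fault stand in for state a filesystem would look up at fault time and are not kernel fields, and map_direct_fault() is an illustrative name rather than anything in the patch.

```c
#include <stdbool.h>

/*
 * Hypothetical per-fault inputs; a real filesystem would derive these
 * from the inode and its extent map. These are not kernel fields.
 */
struct direct_fault {
	bool file_is_dax;		  /* inode supports DAX */
	bool has_reflinked_blocks;	  /* shared (copy-on-write) extents */
	bool needs_allocation;		  /* fault would allocate blocks */
	bool needs_unwritten_conversion;  /* fault would convert an extent */
};

enum fault_result { FAULT_OK, FAULT_SIGBUS };

/* Restates the four SIGBUS conditions from the MAP_DIRECT description. */
enum fault_result map_direct_fault(const struct direct_fault *f)
{
	if (!f->file_is_dax)
		return FAULT_SIGBUS;
	if (f->has_reflinked_blocks)
		return FAULT_SIGBUS;
	/* Either of these would modify block map metadata. */
	if (f->needs_allocation || f->needs_unwritten_conversion)
		return FAULT_SIGBUS;
	return FAULT_OK;
}
```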
> 
>   An unprivileged process may establish a MAP_DIRECT mapping on a file
>   whose UID (owner) matches the filesystem UID of the process. A process
>   with the CAP_LEASE capability may establish a MAP_DIRECT mapping on
>   arbitrary files.
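The ownership rule above is the same test used for ordinary file leases: the caller's filesystem UID must match the file owner unless the caller holds CAP_LEASE. A minimal sketch, assuming the caller's credentials are passed in explicitly; map_direct_perm() is a hypothetical name, and the kernel's actual check lives in its lease code, not in a helper like this.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>

/*
 * Hypothetical restatement of the MAP_DIRECT permission rule.
 * Returns 0 if the mapping may be established, -EACCES otherwise.
 */
int map_direct_perm(uid_t file_owner, uid_t caller_fsuid, bool has_cap_lease)
{
	/* Owner match, or CAP_LEASE to lease arbitrary files. */
	if (caller_fsuid == file_owner || has_cap_lease)
		return 0;
	return -EACCES;
}
```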
> 
>   ERRORS
>   EACCES Beyond the typical mmap(2) conditions that trigger EACCES,
>   MAP_DIRECT also requires permission to set a file lease.
> 
>   EOPNOTSUPP The filesystem explicitly does not support the flag.
> 
>   SIGBUS Attempted to write a MAP_DIRECT mapping at a file offset that
>          might require block-map updates, or the lease timed out and the
>          kernel invalidated the mapping.
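The EOPNOTSUPP case follows from how MAP_SHARED_VALIDATE works: any flag bit outside the set a filesystem declares supported is refused rather than silently ignored, which is what the XFS_MAP_SUPPORTED check in xfs_file_mmap_validate() does. The sketch below uses made-up DEMO_* flag values for illustration only; they are not the ABI values from include/uapi/asm-generic/mman.h, and demo_mmap_validate() is a hypothetical name.

```c
#include <errno.h>

/*
 * Hypothetical flag values for illustration only; NOT the real
 * mmap(2) ABI values.
 */
#define DEMO_LEGACY_MAP_MASK	0x00ffUL
#define DEMO_MAP_DIRECT		0x0100UL
#define DEMO_MAP_UNKNOWN	0x0200UL

/* Bits this hypothetical filesystem's validate hook understands. */
#define DEMO_FS_SUPPORTED	(DEMO_LEGACY_MAP_MASK | DEMO_MAP_DIRECT)

int demo_mmap_validate(unsigned long map_flags)
{
	/* Any bit outside the supported set is an explicit refusal. */
	if (map_flags & ~DEMO_FS_SUPPORTED)
		return -EOPNOTSUPP;
	return 0;
}
```

The design point is that an older kernel or filesystem fails loudly on a flag it does not know, so an application asking for MAP_DIRECT semantics never ends up with a plain shared mapping by accident.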
> 
> Cc: Jan Kara <jack@suse.cz>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Cc: Jeff Moyer <jmoyer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Jeff Layton <jlayton@poochiereds.net>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  fs/xfs/Kconfig                  |    2 -
>  fs/xfs/xfs_file.c               |  102 +++++++++++++++++++++++++++++++++++++++
>  include/linux/mman.h            |    3 +
>  include/uapi/asm-generic/mman.h |    1 
>  4 files changed, 106 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
> index f62fc6629abb..f8765653a438 100644
> --- a/fs/xfs/Kconfig
> +++ b/fs/xfs/Kconfig
> @@ -112,4 +112,4 @@ config XFS_ASSERT_FATAL
>  
>  config XFS_LAYOUT
>  	def_bool y
> -	depends on EXPORTFS_BLOCK_OPS
> +	depends on EXPORTFS_BLOCK_OPS || FS_DAX
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index ebdd0bd2b261..e35518600e28 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -40,12 +40,22 @@
>  #include "xfs_iomap.h"
>  #include "xfs_reflink.h"
>  
> +#include <linux/mman.h>
>  #include <linux/dcache.h>
>  #include <linux/falloc.h>
>  #include <linux/pagevec.h>
> +#include <linux/mapdirect.h>
>  #include <linux/backing-dev.h>
>  
>  static const struct vm_operations_struct xfs_file_vm_ops;
> +static const struct vm_operations_struct xfs_file_vm_direct_ops;
> +
> +static inline bool
> +is_xfs_map_direct(
> +		struct vm_area_struct *vma)
> +{
> +	return vma->vm_ops == &xfs_file_vm_direct_ops;
> +}

Please namespace this (xfs_vma_is_direct) and fix the whitespace damage.

>  
>  /*
>   * Clear the specified ranges to zero through either the pagecache or DAX.
> @@ -1008,6 +1018,26 @@ xfs_file_llseek(
>  	return vfs_setpos(file, offset, inode->i_sb->s_maxbytes);
>  }
>  
> +static int
> +xfs_vma_checks(
> +	struct vm_area_struct	*vma,
> +	struct inode		*inode)

Exactly what are we checking for? The function name doesn't tell me,
and there are no comments, either.

> +{
> +	if (!is_xfs_map_direct(vma))
> +		return 0;
> +
> +	if (!is_map_direct_valid(vma->vm_private_data))
> +		return VM_FAULT_SIGBUS;
> +
> +	if (xfs_is_reflink_inode(XFS_I(inode)))
> +		return VM_FAULT_SIGBUS;
> +
> +	if (!IS_DAX(inode))
> +		return VM_FAULT_SIGBUS;

And how do we get is_xfs_map_direct() set to true if we don't have a
DAX inode or the inode has shared extents?

> +
> +	return 0;
> +}
> +
>  /*
>   * Locking for serialisation of IO during page faults. This results in a lock
>   * ordering of:
> @@ -1024,6 +1054,7 @@ __xfs_filemap_fault(
>  	enum page_entry_size	pe_size,
>  	bool			write_fault)
>  {
> +	struct vm_area_struct	*vma = vmf->vma;
>  	struct inode		*inode = file_inode(vmf->vma->vm_file);

You missed converting this vmf->vma to use the new local variable....

.....
>  
> +#define XFS_MAP_SUPPORTED (LEGACY_MAP_MASK | MAP_DIRECT)
> +
> +STATIC int
> +xfs_file_mmap_validate(
> +	struct file		*filp,
> +	struct vm_area_struct	*vma,
> +	unsigned long		map_flags,
> +	int			fd)
> +{
> +	struct inode		*inode = file_inode(filp);
> +	struct xfs_inode	*ip = XFS_I(inode);
> +	struct map_direct_state	*mds;
> +
> +	if (map_flags & ~(XFS_MAP_SUPPORTED))
> +		return -EOPNOTSUPP;
> +
> +	if ((map_flags & MAP_DIRECT) == 0)
> +		return xfs_file_mmap(filp, vma);
> +
> +	file_accessed(filp);
> +	vma->vm_ops = &xfs_file_vm_direct_ops;
> +	if (IS_DAX(inode))
> +		vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;

And if it isn't a DAX inode? What is MAP_DIRECT supposed to do then?

> +	mds = map_direct_register(fd, vma);
> +	if (IS_ERR(mds))
> +		return PTR_ERR(mds);
> +
> +	/* flush in-flight faults */
> +	xfs_ilock(ip, XFS_MMAPLOCK_EXCL);
> +	xfs_iunlock(ip, XFS_MMAPLOCK_EXCL);

Urk. That's nasty. And why is it even necessary? Please explain why
this is necessary in the comment, because it's not at all obvious to
me...
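Mechanically, taking XFS_MMAPLOCK_EXCL and dropping it straight away waits for every fault that already holds XFS_MMAPLOCK_SHARED to finish; it does not say why the patch needs that, which is the question above. The sketch below models only that drain effect with a POSIX rwlock, assuming faults are threads holding the read lock. All names (mmaplock, fault_thread, drain_in_flight_faults) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/*
 * Hypothetical userspace analogue of "lock EXCL; unlock EXCL" used to
 * flush in-flight faults. Faults are threads holding the read lock.
 */
#define NFAULTS 4

static pthread_rwlock_t mmaplock = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int readers_in;	/* faults that have taken the shared lock */
static int faults_done;	/* faults that finished their work */

static void *fault_thread(void *arg)
{
	struct timespec work = { 0, 10 * 1000 * 1000 };	/* 10 ms */

	(void)arg;
	pthread_rwlock_rdlock(&mmaplock);	/* like XFS_MMAPLOCK_SHARED */
	pthread_mutex_lock(&mtx);
	readers_in++;
	pthread_cond_signal(&cv);
	pthread_mutex_unlock(&mtx);

	nanosleep(&work, NULL);			/* the in-flight fault */

	pthread_mutex_lock(&mtx);
	faults_done++;
	pthread_mutex_unlock(&mtx);
	pthread_rwlock_unlock(&mmaplock);
	return NULL;
}

/* Start NFAULTS faults, drain them, and return how many had completed. */
int drain_in_flight_faults(void)
{
	pthread_t t[NFAULTS];
	int done;

	for (int i = 0; i < NFAULTS; i++)
		pthread_create(&t[i], NULL, fault_thread, NULL);

	/* Wait until every fault is actually holding the shared lock. */
	pthread_mutex_lock(&mtx);
	while (readers_in < NFAULTS)
		pthread_cond_wait(&cv, &mtx);
	pthread_mutex_unlock(&mtx);

	/* The "flush": exclusive acquire waits for all shared holders. */
	pthread_rwlock_wrlock(&mmaplock);
	pthread_rwlock_unlock(&mmaplock);

	pthread_mutex_lock(&mtx);
	done = faults_done;
	pthread_mutex_unlock(&mtx);

	for (int i = 0; i < NFAULTS; i++)
		pthread_join(t[i], NULL);
	return done;
}
```

Note that the drain only covers faults already inside the lock; a fault that starts after the unlock is not affected, which is part of why the intent needs a comment.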

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
