public inbox for kvm@vger.kernel.org
* [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
@ 2018-03-02  3:53 Dan Williams
       [not found] ` <151996281307.28483.12343847096989509127.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2018-03-02  3:53 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Jane Chu, Michal Hocko, Jan Kara, kvm-u79uwXL29TY76Z2rM5mHXA,
	Darrick J. Wong, Matthew Wilcox,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-xfs-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Alex Williamson, Gerd Rausch,
	Andreas Dilger, Alexander Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Christoph Hellwig

Changes since v4 [1]:
* Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to
  better clarify the need for new helpers (Jan)
* Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan)
* Use file_inode() in vma_is_dax() (Jan)
* Resend the full series to linux-xfs@ (Dave)
* Collect Jan's Reviewed-by

[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014271.html
---

The vfio interface, like RDMA, wants to set up long-term (indefinite)
pins of the pages backing an address range so that a guest or userspace
driver can perform DMA to the associated physical address. Given that
this pinning may lead to filesystem operations deadlocking in the
filesystem-dax case, the pinning request needs to be rejected.

The longer-term fix for vfio, RDMA, and any other long-term pin user is
to provide a 'pin with lease' mechanism. Similar to the leases that are
held for pNFS RDMA layouts, this userspace lease gives the kernel a way
to notify userspace that the block layout of the file is changing and
the kernel is revoking access to pinned pages.

Related to this change is the discovery that vma_is_fsdax() was causing
device-dax inode detection to fail. That led to a series of fixes and
cleanups to make sure that S_DAX is defined correctly in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case.
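
To illustrate the failure mode, here is a minimal userspace sketch. It
assumes the flag is compiled out to zero when filesystem-dax is disabled;
struct model_inode, MODEL_S_DAX, both helpers, and the flag value are
hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; the struct and macro below are hypothetical
 * stand-ins, and the flag value is illustrative. */
struct model_inode {
	unsigned int i_flags;
};

#define MODEL_S_DAX 8192u

/* Models the flag being compiled to 0 when filesystem-dax is disabled:
 * the mask can never match, so every inode looks non-DAX. */
static bool model_is_dax_flag_compiled_out(const struct model_inode *inode)
{
	const unsigned int s_dax = 0;

	return (inode->i_flags & s_dax) != 0;
}

/* Models the flag keeping its value whenever either DAX flavor is built. */
static bool model_is_dax_flag_defined(const struct model_inode *inode)
{
	return (inode->i_flags & MODEL_S_DAX) != 0;
}
```

In this model a device-dax inode that carries the flag is missed by the
first helper and detected by the second.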

---

Dan Williams (12):
      dax: fix vma_is_fsdax() helper
      dax: introduce IS_DEVDAX() and IS_FSDAX()
      ext2, dax: finish implementing dax_sem helpers
      ext2, dax: define ext2_dax_*() infrastructure in all cases
      ext4, dax: define ext4_dax_*() infrastructure in all cases
      ext2, dax: replace IS_DAX() with IS_FSDAX()
      ext4, dax: replace IS_DAX() with IS_FSDAX()
      xfs, dax: replace IS_DAX() with IS_FSDAX()
      mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX()
      fs, dax: kill IS_DAX()
      dax: fix S_DAX definition
      vfio: disable filesystem-dax page pinning


 drivers/vfio/vfio_iommu_type1.c |   18 ++++++++++++++--
 fs/ext2/ext2.h                  |    6 +++++
 fs/ext2/file.c                  |   19 +++++------------
 fs/ext2/inode.c                 |   10 ++++-----
 fs/ext4/file.c                  |   18 +++++-----------
 fs/ext4/inode.c                 |    4 ++--
 fs/ext4/ioctl.c                 |    2 +-
 fs/ext4/super.c                 |    2 +-
 fs/iomap.c                      |    2 +-
 fs/xfs/xfs_file.c               |   14 ++++++-------
 fs/xfs/xfs_ioctl.c              |    4 ++--
 fs/xfs/xfs_iomap.c              |    6 +++--
 fs/xfs/xfs_reflink.c            |    2 +-
 include/linux/dax.h             |   12 ++++++++---
 include/linux/fs.h              |   43 ++++++++++++++++++++++++++++-----------
 mm/fadvise.c                    |    3 ++-
 mm/filemap.c                    |    4 ++--
 mm/huge_memory.c                |    4 +++-
 mm/madvise.c                    |    3 ++-
 19 files changed, 102 insertions(+), 74 deletions(-)

* [PATCH v5 12/12] vfio: disable filesystem-dax page pinning
       [not found] ` <151996281307.28483.12343847096989509127.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2018-03-02  3:54   ` Dan Williams
  2018-03-02 22:10   ` [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Williams @ 2018-03-02  3:54 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw
  Cc: Michal Hocko, kvm-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-xfs-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Alex Williamson,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Christoph Hellwig

Filesystem-DAX is incompatible with 'longterm' page pinning. Without
page cache indirection a DAX mapping maps filesystem blocks directly.
This means that the filesystem must not modify a file's block map while
any page in a mapping is pinned. In order to prevent the situation of
userspace holding off filesystem operations indefinitely, disallow
'longterm' Filesystem-DAX mappings.

RDMA has the same conflict and the plan there is to add a 'with lease'
mechanism to allow the kernel to notify userspace that the mapping is
being torn down for block-map maintenance. Perhaps something similar can
be put in place for vfio.

Note that xfs and ext4 still report:

   "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"

...at mount time, and resolving the dax-dma-vs-truncate problem is one
of the last hurdles to remove that designation.

Acked-by: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Cc: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Reported-by: Haozhong Zhang <haozhong.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Tested-by: Haozhong Zhang <haozhong.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Fixes: d475c6346a38 ("dax,ext2: replace XIP read and write with DAX I/O")
Signed-off-by: Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
 drivers/vfio/vfio_iommu_type1.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e30e29ae4819..45657e2b1ff7 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -338,11 +338,12 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
+	struct vm_area_struct *vmas[1];
 	int ret;
 
 	if (mm == current->mm) {
-		ret = get_user_pages_fast(vaddr, 1, !!(prot & IOMMU_WRITE),
-					  page);
+		ret = get_user_pages_longterm(vaddr, 1, !!(prot & IOMMU_WRITE),
+					      page, vmas);
 	} else {
 		unsigned int flags = 0;
 
@@ -351,7 +352,18 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 
 		down_read(&mm->mmap_sem);
 		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    NULL, NULL);
+					    vmas, NULL);
+		/*
+		 * The lifetime of a vaddr_get_pfn() page pin is
+		 * userspace-controlled. In the fs-dax case this could
+		 * lead to indefinite stalls in filesystem operations.
+		 * Disallow attempts to pin fs-dax pages via this
+		 * interface.
+		 */
+		if (ret > 0 && vma_is_fsdax(vmas[0])) {
+			ret = -EOPNOTSUPP;
+			put_page(page[0]);
+		}
 		up_read(&mm->mmap_sem);
 	}
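
The remote-mm branch above follows a pin, check, release pattern. A
minimal userspace sketch of that pattern follows; model_page, model_vma,
model_pin() and model_pin_longterm() are hypothetical names, and the
refcount is a plain integer standing in for the page reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a page reference and a VMA. */
struct model_page {
	int refcount;
};

struct model_vma {
	bool fsdax;
};

/* Takes one reference and reports one page pinned. */
static int model_pin(struct model_page *page)
{
	page->refcount++;
	return 1;
}

/* Pin first, then refuse the pin if the mapping is filesystem-dax. */
static int model_pin_longterm(struct model_page *page,
			      const struct model_vma *vma)
{
	int ret = model_pin(page);

	if (ret > 0 && vma->fsdax) {
		page->refcount--;	/* stands in for put_page() */
		ret = -EOPNOTSUPP;
	}
	return ret;
}
```

The reference is dropped on the rejection path so that a refused request
does not leave the page pinned.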

* Re: [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
       [not found] ` <151996281307.28483.12343847096989509127.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
  2018-03-02  3:54   ` [PATCH v5 12/12] vfio: disable filesystem-dax page pinning Dan Williams
@ 2018-03-02 22:10   ` Christoph Hellwig
       [not found]     ` <20180302221020.GA30722-jcswGhMUV9g@public.gmane.org>
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2018-03-02 22:10 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jane Chu, Michal Hocko, Jan Kara, Matthew Wilcox,
	kvm-u79uwXL29TY76Z2rM5mHXA, Darrick J. Wong,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	stable-u79uwXL29TY76Z2rM5mHXA, linux-xfs-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Alex Williamson, Gerd Rausch,
	Andreas Dilger, Alexander Viro,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Theodore Ts'o,
	Christoph Hellwig

I really don't like these IS_DEVDAX and IS_FSDAX flags.  We should
stop pretending DAX is a global per-inode choice and get rid of these
magic flags entirely.  So please convert the instances inside the
various file systems to checking the file system mount options instead.

For the core ones we'll need to differentiate:

 - the checks in generic_file_read_iter and __generic_file_write_iter
   seem to not be needed anymore at all since we stopped abusing the
   direct I/O code for DAX, so they should probably be removed.
 - io_is_direct is a weird check and should probably just go away,
   as there is no point in always setting IOCB_DIRECT for DAX I/O
 - fadvise should either become a file op, or a flag on the inode that
   fadvise is supported instead of the nasty noop_backing_dev_info or
   DAX check.
 - Ditto for madvise
 - vma_is_dax should probably be replaced with a VMA flag.
 - thp_get_unmapped_area I don't really understand why we have a dax
   check there.
 - dax_mapping will be much harder to sort out.

But all these DAX flags certainly look like a major hodge podge to me.

* Re: [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
       [not found]     ` <20180302221020.GA30722-jcswGhMUV9g@public.gmane.org>
@ 2018-03-02 22:21       ` Dan Williams
       [not found]         ` <CAPcyv4gKyvkHY_qQTYvd8wrLpaXXciJyWZY+9T7Q_Eg-Zuxpgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2018-03-02 22:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jane Chu, Michal Hocko, Jan Kara, Matthew Wilcox, KVM list,
	Darrick J. Wong, linux-nvdimm, Linux Kernel Mailing List, stable,
	linux-xfs, Linux MM, Alex Williamson, Gerd Rausch, Andreas Dilger,
	Alexander Viro, linux-fsdevel, Theodore Ts'o

On Fri, Mar 2, 2018 at 2:10 PM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> I really don't like these IS_DEVDAX and IS_FSDAX flags.  We should
> stop pretending DAX is a global per-inode choice and get rid of these
> magic flags entirely.  So please convert the instances inside the
> various file systems to checking the file system mount options instead.
>
> For the core ones we'll need to differentiate:
>
>  - the checks in generic_file_read_iter and __generic_file_write_iter
>    seem to not be needed anymore at all since we stopped abusing the
>    direct I/O code for DAX, so they should probably be removed.
>  - io_is_direct is a weird check and should probably just go away,
>    as there is no point in always setting IOCB_DIRECT for DAX I/O
>  - fadvise should either become a file op, or a flag on the inode that
>    fadvise is supported instead of the nasty noop_backing_dev_info or
>    DAX check.
>  - Ditto for madvise
>  - vma_is_dax should probably be replaced with a VMA flag.
>  - thp_get_unmapped_area I don't really understand why we have a dax
>    check there.
>  - dax_mapping will be much harder to sort out.
>
> But all these DAX flags certainly look like a major hodge podge to me.

They are indeed a hodge-podge. The problem is that the current
IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with
IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks
you propose for the next merge window.

Acceptable?

* Re: [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
       [not found]         ` <CAPcyv4gKyvkHY_qQTYvd8wrLpaXXciJyWZY+9T7Q_Eg-Zuxpgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-02 22:57           ` Christoph Hellwig
       [not found]             ` <20180302225734.GE31240-jcswGhMUV9g@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2018-03-02 22:57 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jane Chu, Michal Hocko, Jan Kara, Matthew Wilcox, KVM list,
	Darrick J. Wong, linux-nvdimm, Linux Kernel Mailing List, stable,
	linux-xfs, Linux MM, Alex Williamson, Gerd Rausch, Andreas Dilger,
	Alexander Viro, linux-fsdevel, Theodore Ts'o,
	Christoph Hellwig

On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
> They are indeed a hodge-podge. The problem is that the current
> IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with
> IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks
> you propose for the next merge window.

The only thing broken about IS_DAX are the code elimination games
based on the CONFIG_* flags.  Remove those and just add proper stubs
for the dax routines and everything will be fine for now until we can
kill that inode flag.

IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't
helping anyone.
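
A rough userspace sketch of the stub approach, under stated assumptions:
the flag test stays a real runtime check in every configuration, and only
the dax routine behind it is swapped for a stub. MODEL_CONFIG_FS_DAX,
MODEL_S_DAX, model_dax_read() and model_file_read() are hypothetical
names, and returning -EOPNOTSUPP from the stub is an illustrative choice.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; the flag value is illustrative. */
struct model_inode {
	unsigned int i_flags;
};

#define MODEL_S_DAX 8192u
/* The flag test is the same in every configuration. */
#define MODEL_IS_DAX(inode) (((inode)->i_flags & MODEL_S_DAX) != 0)

#ifdef MODEL_CONFIG_FS_DAX
/* Real implementation would live elsewhere when the option is set. */
int model_dax_read(const struct model_inode *inode);
#else
/* Stub used when filesystem-dax is not built. */
static inline int model_dax_read(const struct model_inode *inode)
{
	(void)inode;
	return -EOPNOTSUPP;
}
#endif

/* Caller needs no #ifdef: it tests the flag and calls the routine. */
static int model_file_read(const struct model_inode *inode)
{
	if (MODEL_IS_DAX(inode))
		return model_dax_read(inode);
	return 0;	/* stands in for the page cache path */
}
```

With the option left undefined, a flagged inode reaches the stub instead
of being silently treated as non-DAX.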

* Re: [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
       [not found]             ` <20180302225734.GE31240-jcswGhMUV9g@public.gmane.org>
@ 2018-03-02 23:49               ` Dan Williams
       [not found]                 ` <CAPcyv4jM=N=wjnK4gWxHu0Fk9VXnfReLf6shW6mbzvf3sahjrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Dan Williams @ 2018-03-02 23:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jane Chu, Michal Hocko, Jan Kara, Matthew Wilcox, KVM list,
	Darrick J. Wong, linux-nvdimm, Linux Kernel Mailing List, stable,
	linux-xfs, Linux MM, Alex Williamson, Gerd Rausch, Andreas Dilger,
	Alexander Viro, linux-fsdevel, Theodore Ts'o

On Fri, Mar 2, 2018 at 2:57 PM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
> On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
>> They are indeed a hodge-podge. The problem is that the current
>> IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with
>> IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks
>> you propose for the next merge window.
>
> The only thing broken about IS_DAX are the code elimination games
> based on the CONFIG_* flags.  Remove those and just add proper stubs
> for the dax routines and everything will be fine for now until we can
> kill that inode flag.
>
> IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't
> helping anyone.

Ok, I'll take another shot at something suitable for 4.16, but without
these new helpers...

* Re: [PATCH v5 00/12] vfio, dax: prevent long term filesystem-dax pins and other fixes
       [not found]                 ` <CAPcyv4jM=N=wjnK4gWxHu0Fk9VXnfReLf6shW6mbzvf3sahjrQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-03  2:19                   ` Dan Williams
  0 siblings, 0 replies; 7+ messages in thread
From: Dan Williams @ 2018-03-03  2:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michal Hocko, Jan Kara, Matthew Wilcox, KVM list, Darrick J. Wong,
	linux-nvdimm, Linux Kernel Mailing List, stable, linux-xfs,
	Linux MM, Alex Williamson, Gerd Rausch, Andreas Dilger,
	Alexander Viro, linux-fsdevel, Theodore Ts'o

On Fri, Mar 2, 2018 at 3:49 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> On Fri, Mar 2, 2018 at 2:57 PM, Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org> wrote:
>> On Fri, Mar 02, 2018 at 02:21:40PM -0800, Dan Williams wrote:
>>> They are indeed a hodge-podge. The problem is that the current
>>> IS_DAX() is broken. So I'd like to propose fixing IS_DAX() with
>>> IS_FSDAX() + IS_DEVDAX() for 4.16-rc4 and queue up these wider reworks
>>> you propose for the next merge window.
>>
>> The only thing broken about IS_DAX are the code elimination games
>> based on the CONFIG_* flags.  Remove those and just add proper stubs
>> for the dax routines and everything will be fine for now until we can
>> kill that inode flag.
>>
>> IS_FSDAX and IS_DEVDAX on the other hand are a giant mess that isn't
>> helping anyone.
>
> Ok, I'll take another shot at something suitable for 4.16, but without
> these new helpers...

I'll drop patches 2-11 for now, and just get the high priority fixes
in for the next rc.
