From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Dan Williams <dan.j.williams@intel.com>,
Christoph Hellwig <hch@lst.de>,
Doug Ledford <dledford@redhat.com>,
Hal Rosenstock <hal.rosenstock@gmail.com>,
Inki Dae <inki.dae@samsung.com>, Jan Kara <jack@suse.cz>,
Jason Gunthorpe <jgg@mellanox.com>,
Jeff Moyer <jmoyer@redhat.com>,
Joonyoung Shim <jy0922.shim@samsung.com>,
Kyungmin Park <kyungmin.park@samsung.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Mel Gorman <mgorman@suse.de>,
Ross Zwisler <ross.zwisler@linux.intel.com>,
Sean Hefty <sean.hefty@intel.com>,
Seung-Woo Kim <sw0312.kim@samsung.com>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.14 09/95] mm: introduce get_user_pages_longterm
Date: Mon, 4 Dec 2017 16:59:33 +0100 [thread overview]
Message-ID: <20171204160046.564227420@linuxfoundation.org> (raw)
In-Reply-To: <20171204160046.206920966@linuxfoundation.org>
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Williams <dan.j.williams@intel.com>
commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream.
Patch series "introduce get_user_pages_longterm()", v2.
Here is a new get_user_pages api for cases where a driver intends to
keep an elevated page count indefinitely. This is distinct from usages
like iov_iter_get_pages where the elevated page counts are transient.
The iov_iter_get_pages cases immediately turn around and submit the
pages to a device driver which will put_page when the i/o operation
completes (under kernel control).
In the longterm case userspace is responsible for dropping the page
reference at some undefined point in the future. This is untenable for
filesystem-dax case where the filesystem is in control of the lifetime
of the block / page and needs reasonable limits on how long it can wait
for pages in a mapping to become idle.
Fixing filesystems to actually wait for dax pages to be idle before
blocks from a truncate/hole-punch operation are repurposed is saved for
a later patch series.
Also, allowing longterm registration of dax mappings is a future patch
series that introduces a "map with lease" semantic where the kernel can
revoke a lease and force userspace to drop its page references.
I have also tagged these for -stable to purposely break cases that might
assume that longterm memory registrations for filesystem-dax mappings
were supported by the kernel. The behavior regression this policy
change implies is one of the reasons we maintain the "dax enabled.
Warning: EXPERIMENTAL, use at your own risk" notification when mounting
a filesystem in dax mode.
It is worth noting the device-dax interface does not suffer the same
constraints since it does not support file space management operations
like hole-punch.
This patch (of 4):
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow long standing memory registrations against
filesytem-dax vmas. Device-dax vmas do not have this problem and are
explicitly allowed.
This is temporary until a "memory registration with layout-lease"
mechanism can be implemented for the affected sub-systems (RDMA and
V4L2).
[akpm@linux-foundation.org: use kcalloc()]
Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/fs.h | 14 +++++++++++
include/linux/mm.h | 13 ++++++++++
mm/gup.c | 64 +++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3175,6 +3175,20 @@ static inline bool vma_is_dax(struct vm_
return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host);
}
+static inline bool vma_is_fsdax(struct vm_area_struct *vma)
+{
+ struct inode *inode;
+
+ if (!vma->vm_file)
+ return false;
+ if (!vma_is_dax(vma))
+ return false;
+ inode = file_inode(vma->vm_file);
+ if (inode->i_mode == S_IFCHR)
+ return false; /* device-dax */
+ return true;
+}
+
static inline int iocb_flags(struct file *file)
{
int res = 0;
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1368,6 +1368,19 @@ long get_user_pages_locked(unsigned long
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
+#ifdef CONFIG_FS_DAX
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas);
+#else
+static inline long get_user_pages_longterm(unsigned long start,
+ unsigned long nr_pages, unsigned int gup_flags,
+ struct page **pages, struct vm_area_struct **vmas)
+{
+ return get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+}
+#endif /* CONFIG_FS_DAX */
+
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start,
}
EXPORT_SYMBOL(get_user_pages);
+#ifdef CONFIG_FS_DAX
+/*
+ * This is the same as get_user_pages() in that it assumes we are
+ * operating on the current task's mm, but it goes further to validate
+ * that the vmas associated with the address range are suitable for
+ * longterm elevated page reference counts. For example, filesystem-dax
+ * mappings are subject to the lifetime enforced by the filesystem and
+ * we need guarantees that longterm users like RDMA and V4L2 only
+ * establish mappings that have a kernel enforced revocation mechanism.
+ *
+ * "longterm" == userspace controlled elevated page count lifetime.
+ * Contrast this to iov_iter_get_pages() usages which are transient.
+ */
+long get_user_pages_longterm(unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ struct vm_area_struct **vmas_arg)
+{
+ struct vm_area_struct **vmas = vmas_arg;
+ struct vm_area_struct *vma_prev = NULL;
+ long rc, i;
+
+ if (!pages)
+ return -EINVAL;
+
+ if (!vmas) {
+ vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *),
+ GFP_KERNEL);
+ if (!vmas)
+ return -ENOMEM;
+ }
+
+ rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas);
+
+ for (i = 0; i < rc; i++) {
+ struct vm_area_struct *vma = vmas[i];
+
+ if (vma == vma_prev)
+ continue;
+
+ vma_prev = vma;
+
+ if (vma_is_fsdax(vma))
+ break;
+ }
+
+ /*
+ * Either get_user_pages() failed, or the vma validation
+ * succeeded, in either case we don't need to put_page() before
+ * returning.
+ */
+ if (i >= rc)
+ goto out;
+
+ for (i = 0; i < rc; i++)
+ put_page(pages[i]);
+ rc = -EOPNOTSUPP;
+out:
+ if (vmas != vmas_arg)
+ kfree(vmas);
+ return rc;
+}
+EXPORT_SYMBOL(get_user_pages_longterm);
+#endif /* CONFIG_FS_DAX */
+
/**
* populate_vma_page_range() - populate a range of pages in the vma.
* @vma: target vma
next prev parent reply other threads:[~2017-12-04 16:04 UTC|newest]
Thread overview: 91+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-04 15:59 [PATCH 4.14 00/95] 4.14.4-stable review Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 02/95] mm, memory_hotplug: do not back off draining pcp free pages from kworker context Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 03/95] mm, oom_reaper: gather each vma to prevent leaking TLB entry Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 04/95] mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d() Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 05/95] mm/cma: fix alloc_contig_range ret code/potential leak Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 06/95] mm: fix device-dax pud write-faults triggered by get_user_pages() Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 07/95] mm, hugetlbfs: introduce ->split() to vm_operations_struct Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 08/95] device-dax: implement ->split() to catch invalid munmap attempts Greg Kroah-Hartman
2017-12-04 15:59 ` Greg Kroah-Hartman [this message]
2017-12-04 15:59 ` [PATCH 4.14 10/95] mm: fail get_vaddr_frames() for filesystem-dax mappings Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 11/95] v4l2: disable filesystem-dax mapping support Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 12/95] IB/core: disable memory registration of filesystem-dax vmas Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 13/95] exec: avoid RLIMIT_STACK races with prlimit() Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 14/95] mm/madvise.c: fix madvise() infinite loop under special circumstances Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 16/95] mm, memcg: fix mem_cgroup_swapout() for THPs Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 17/95] fs/fat/inode.c: fix sb_rdonly() change Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 18/95] autofs: revert "autofs: take more care to not update last_used on path walk" Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 19/95] autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 20/95] mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 21/95] btrfs: clear space cache inode generation always Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 22/95] nfsd: Fix stateid races between OPEN and CLOSE Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 23/95] nfsd: Fix another OPEN stateid race Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 24/95] nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat Greg Kroah-Hartman
2017-12-04 15:59 ` Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 25/95] crypto: algif_aead - skip SGL entries with NULL page Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 26/95] crypto: af_alg - remove locking in async callback Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 28/95] lockd: lost rollback of set_grace_period() in lockd_down_net() Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 29/95] s390: revert ELF_ET_DYN_BASE base changes Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 30/95] drm: omapdrm: Fix DPI on platforms using the DSI VDDS Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 31/95] omapdrm: hdmi4: Correct the SoC revision matching Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 32/95] apparmor: fix oops in audit_signal_cb hook Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 33/95] arm64: module-plts: factor out PLT generation code for ftrace Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 34/95] arm64: ftrace: emit ftrace-mod.o contents through code Greg Kroah-Hartman
2017-12-04 15:59 ` [PATCH 4.14 35/95] powerpc/powernv: Fix kexec crashes caused by tlbie tracing Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 36/95] powerpc/kexec: Fix kexec/kdump in P9 guest kernels Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 40/95] KVM: lapic: Split out x2apic ldr calculation Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 41/95] KVM: lapic: Fixup LDR on load in x2apic Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 42/95] mmc: sdhci: Avoid swiotlb buffer being full Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 43/95] mmc: block: Fix missing blk_put_request() Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 44/95] mmc: block: Check return value of blk_get_request() Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 45/95] mmc: core: Do not leave the block driver in a suspended state Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 46/95] mmc: block: Ensure that debugfs files are removed Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 47/95] mmc: core: prepend 0x to pre_eol_info entry in sysfs Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 48/95] mmc: core: prepend 0x to OCR " Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 49/95] ACPI / EC: Fix regression related to PM ops support in ECDT device Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 50/95] eeprom: at24: fix reading from 24MAC402/24MAC602 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 51/95] eeprom: at24: correctly set the size for at24mac402 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 52/95] eeprom: at24: check at24_read/write arguments Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 53/95] i2c: i801: Fix Failed to allocate irq -2147483648 error Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 54/95] cxl: Check if vphb exists before iterating over AFU devices Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 55/95] bcache: Fix building error on MIPS Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 56/95] bcache: only permit to recovery read error when cache device is clean Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 57/95] bcache: recover data from backing when data " Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 58/95] hwmon: (jc42) optionally try to disable the SMBUS timeout Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 59/95] nvme-pci: add quirk for delay before CHK RDY for WDC SN200 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 63/95] drm/amdgpu: correct reference clock value on vega10 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 65/95] drm/amdgpu: Properly allocate VM invalidate eng v2 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 66/95] drm/amdgpu: Remove check which is not valid for certain VBIOS Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 73/95] drm/tilcdc: Precalculate total frametime in tilcdc_crtc_set_mode() Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 74/95] drm/radeon: fix atombios on big endian Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 75/95] drm/panel: simple: Add missing panel_simple_unprepare() calls Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 76/95] drm/hisilicon: Ensure LDI regs are properly configured Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 78/95] drm/amd/pp: fix typecast error in powerplay Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 79/95] drm/fb_helper: Disable all crtcs when initial setup fails Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 80/95] drm/fsl-dcu: Dont set connector DPMS property Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 84/95] include/linux/compiler-clang.h: handle randomizable anonymous structs Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 85/95] IB/core: Do not warn on lid conversions for OPA Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 86/95] IB/hfi1: " Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 87/95] e1000e: fix the use of magic numbers for buffer overrun issue Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 88/95] md: forbid a RAID5 from having both a bitmap and a journal Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 89/95] drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2 Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 90/95] drm/i915: Re-register PMIC bus access notifier on runtime resume Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 91/95] drm/i915/fbdev: Serialise early hotplug events with async fbdev config Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 92/95] drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition Greg Kroah-Hartman
2017-12-04 16:00 ` [PATCH 4.14 95/95] Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()" Greg Kroah-Hartman
2017-12-04 20:29 ` [PATCH 4.14 00/95] 4.14.4-stable review Shuah Khan
2017-12-05 6:25 ` Greg Kroah-Hartman
2017-12-04 21:12 ` Tom Gall
2017-12-05 6:24 ` Greg Kroah-Hartman
2017-12-05 21:45 ` Tom Gall
2017-12-06 6:49 ` Greg Kroah-Hartman
2017-12-06 6:51 ` Greg Kroah-Hartman
2017-12-06 18:01 ` Tom Gall
2017-12-07 7:49 ` Greg Kroah-Hartman
2017-12-06 14:41 ` Sumit Semwal
2017-12-06 15:33 ` Greg Kroah-Hartman
2017-12-06 15:39 ` Sumit Semwal
2017-12-04 23:46 ` Guenter Roeck
2017-12-05 6:24 ` Greg Kroah-Hartman
2017-12-05 7:01 ` Naresh Kamboju
2017-12-05 7:50 ` Greg Kroah-Hartman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171204160046.564227420@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=dan.j.williams@intel.com \
--cc=dledford@redhat.com \
--cc=hal.rosenstock@gmail.com \
--cc=hch@lst.de \
--cc=inki.dae@samsung.com \
--cc=jack@suse.cz \
--cc=jgg@mellanox.com \
--cc=jmoyer@redhat.com \
--cc=jy0922.shim@samsung.com \
--cc=kyungmin.park@samsung.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab@kernel.org \
--cc=mgorman@suse.de \
--cc=ross.zwisler@linux.intel.com \
--cc=sean.hefty@intel.com \
--cc=stable@vger.kernel.org \
--cc=sw0312.kim@samsung.com \
--cc=torvalds@linux-foundation.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.