* [PATCH v8 1/2] iommu: up-level sg_num_pages() from amd-iommu
[not found] <150732937720.22363.1037155753760964267.stgit@dwillia2-desk3.amr.corp.intel.com>
@ 2017-10-08 4:02 ` Dan Williams
2017-10-08 4:04 ` [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings Dan Williams
1 sibling, 0 replies; 4+ messages in thread
From: Dan Williams @ 2017-10-08 4:02 UTC (permalink / raw)
To: linux-nvdimm
Cc: linux-rdma, linux-api, Joerg Roedel, linux-xfs, linux-mm, iommu,
linux-fsdevel
iommu_sg_num_pages() is a helper that walks a scatterlist and counts
pages, taking segment boundaries and iommu_num_pages() into account.
Up-level it for determining the IOVA range that dma_map_ops established
at dma_map_sg() time. The intent is to iommu_unmap() an IOVA range in
advance of freeing the IOVA range itself.
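For illustration, a minimal sketch of the caller pattern this enables,
modeled on the amd-iommu unmap_sg() path in the diff below; the
example_revoke_sg() name is hypothetical and error handling is elided:

/*
 * iommu_sg_num_pages() pads the page count so that no segment crosses
 * the device's dma segment boundary; the same count sizes the IOVA
 * allocation at map time and the unmap at revoke time.
 */
static void example_revoke_sg(struct iommu_domain *domain,
			      struct device *dev,
			      struct scatterlist *sglist, int nelems)
{
	unsigned long iova = sg_dma_address(sglist) & PAGE_MASK;
	int npages = iommu_sg_num_pages(dev, sglist, nelems);

	/* translations are torn down, but the IOVA range stays allocated */
	iommu_unmap(domain, iova, (size_t)npages << PAGE_SHIFT);
}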
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
New patch in v8.
drivers/iommu/amd_iommu.c | 30 ++----------------------------
drivers/iommu/iommu.c | 27 +++++++++++++++++++++++++++
include/linux/iommu.h | 2 ++
3 files changed, 31 insertions(+), 28 deletions(-)
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c8e1a45af182..4795b0823469 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2459,32 +2459,6 @@ static void unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
__unmap_single(dma_dom, dma_addr, size, dir);
}
-static int sg_num_pages(struct device *dev,
- struct scatterlist *sglist,
- int nelems)
-{
- unsigned long mask, boundary_size;
- struct scatterlist *s;
- int i, npages = 0;
-
- mask = dma_get_seg_boundary(dev);
- boundary_size = mask + 1 ? ALIGN(mask + 1, PAGE_SIZE) >> PAGE_SHIFT :
- 1UL << (BITS_PER_LONG - PAGE_SHIFT);
-
- for_each_sg(sglist, s, nelems, i) {
- int p, n;
-
- s->dma_address = npages << PAGE_SHIFT;
- p = npages % boundary_size;
- n = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
- if (p + n > boundary_size)
- npages += boundary_size - p;
- npages += n;
- }
-
- return npages;
-}
-
/*
* The exported map_sg function for dma_ops (handles scatter-gather
* lists).
@@ -2507,7 +2481,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
dma_dom = to_dma_ops_domain(domain);
dma_mask = *dev->dma_mask;
- npages = sg_num_pages(dev, sglist, nelems);
+ npages = iommu_sg_num_pages(dev, sglist, nelems);
address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
if (address == AMD_IOMMU_MAPPING_ERROR)
@@ -2585,7 +2559,7 @@ static void unmap_sg(struct device *dev, struct scatterlist *sglist,
startaddr = sg_dma_address(sglist) & PAGE_MASK;
dma_dom = to_dma_ops_domain(domain);
- npages = sg_num_pages(dev, sglist, nelems);
+ npages = iommu_sg_num_pages(dev, sglist, nelems);
__unmap_single(dma_dom, startaddr, npages << PAGE_SHIFT, dir);
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3de5c0bcb5cc..cfe6eeea3578 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -33,6 +33,7 @@
#include <linux/bitops.h>
#include <linux/property.h>
#include <trace/events/iommu.h>
+#include <linux/iommu-helper.h>
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -1631,6 +1632,32 @@ size_t iommu_unmap_fast(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_unmap_fast);
+int iommu_sg_num_pages(struct device *dev, struct scatterlist *sglist,
+ int nelems)
+{
+ unsigned long mask, boundary_size;
+ struct scatterlist *s;
+ int i, npages = 0;
+
+ mask = dma_get_seg_boundary(dev);
+ boundary_size = mask + 1 ? ALIGN(mask + 1, PAGE_SIZE) >> PAGE_SHIFT
+ : 1UL << (BITS_PER_LONG - PAGE_SHIFT);
+
+ for_each_sg(sglist, s, nelems, i) {
+ int p, n;
+
+ s->dma_address = npages << PAGE_SHIFT;
+ p = npages % boundary_size;
+ n = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
+ if (p + n > boundary_size)
+ npages += boundary_size - p;
+ npages += n;
+ }
+
+ return npages;
+}
+EXPORT_SYMBOL_GPL(iommu_sg_num_pages);
+
size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot)
{
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a7f2ac689d29..5b2d20e1475a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -303,6 +303,8 @@ extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
size_t size);
extern size_t iommu_unmap_fast(struct iommu_domain *domain,
unsigned long iova, size_t size);
+extern int iommu_sg_num_pages(struct device *dev, struct scatterlist *sglist,
+ int nelems);
extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg,unsigned int nents,
int prot);
* [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
[not found] <150732937720.22363.1037155753760964267.stgit@dwillia2-desk3.amr.corp.intel.com>
2017-10-08 4:02 ` [PATCH v8 1/2] iommu: up-level sg_num_pages() from amd-iommu Dan Williams
@ 2017-10-08 4:04 ` Dan Williams
[not found] ` <150743537023.13602.3520782942682280917.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
1 sibling, 1 reply; 4+ messages in thread
From: Dan Williams @ 2017-10-08 4:04 UTC (permalink / raw)
To: linux-nvdimm
Cc: Sean Hefty, linux-xfs, Jan Kara, Ashok Raj, Darrick J. Wong,
linux-rdma, linux-api, Joerg Roedel, Dave Chinner, Jeff Moyer,
iommu, Christoph Hellwig, J. Bruce Fields, linux-mm, Doug Ledford,
Ross Zwisler, linux-fsdevel, Jeff Layton, David Woodhouse,
Hal Rosenstock
Currently the ibverbs core in the kernel is completely unaware of the
dangers of filesystem-DAX mappings. Specifically, the filesystem is free
to move file blocks at will. In the case of DAX, this means that RDMA to
a given file offset can dynamically switch to another file offset,
another file, or free space, with no notification to the RDMA device to
cease operations. Historically, this lack of communication between the
ibverbs core and the filesystem was not a problem because RDMA always
targeted dynamically allocated page cache, so the RDMA device would at
least have valid memory to target even while the file was being
modified. With DAX we need to add coordination, since RDMA bypasses the
page cache and goes direct to the on-media pages of the file. RDMA to
DAX can cause damage if filesystem blocks move or change state.
Use the new ->lease_direct() operation to get a notification when the
filesystem is invalidating the block map of the file and needs RDMA
operations to stop. Given that the kernel cannot be in a position where
it waits indefinitely for userspace to stop a device, we need a
mechanism by which the kernel can force-revoke access. Towards that end,
use dma_get_iommu_domain() both to check whether the device has domain
mappings that can be invalidated and to retrieve the iommu_domain for
use with iommu_unmap().

Once we have the assurance that we can block in-flight I/O when the
file's block map changes, we can safely allow RDMA to DAX.
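As a hedged illustration, the per-vma registration flow this implies
looks roughly like the following, condensed from the ib_umem_get()
changes below (error unwinding elided):

	if (vma_is_dax(vma)) {
		struct lease_direct *ld;

		/* refuse RDMA to DAX unless the vma supports ->lease_direct() */
		if (!vma->vm_ops->lease_direct)
			return -EOPNOTSUPP;

		/* require an iommu so that access can be force-revoked later */
		umem->iommu = dma_get_iommu_domain(dma_dev);
		if (!umem->iommu)
			return -EOPNOTSUPP;

		/* arrange for ib_umem_lease_break() on block-map invalidation */
		ld = vma->vm_ops->lease_direct(vma, ib_umem_lease_break, umem);
		if (IS_ERR(ld))
			return PTR_ERR(ld);
		list_add(&ld->list, &umem->leases);
	}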
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes since v7:
* Switch from dma_has_iommu() to dma_get_iommu_domain().
* Switch from dma_unmap_sg() at lease break time, to iommu_unmap() so
that the IOVA remains allocated while the device might still be
sending transactions.
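To make the second change concrete, a sketch of the force-revoke step;
the example_lease_break() wrapper name is hypothetical, the body mirrors
ib_umem_lease_break() in the patch below:

static void example_lease_break(struct ib_umem *umem, struct device *dev)
{
	struct scatterlist *sgl = umem->sg_head.sgl;

	/*
	 * v7 used dma_unmap_sg(), which returns the IOVA range to the
	 * allocator while the device may still be issuing transactions
	 * against it. iommu_unmap() instead tears down only the
	 * translations: in-flight DMA now faults, and the IOVA range
	 * stays reserved until the final ib_dma_unmap_sg() at release.
	 */
	iommu_unmap(umem->iommu, sg_dma_address(sgl) & PAGE_MASK,
		    (size_t)iommu_sg_num_pages(dev, sgl, umem->npages)
				<< PAGE_SHIFT);
}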
drivers/infiniband/core/umem.c | 90 +++++++++++++++++++++++++++++++++++-----
include/rdma/ib_umem.h | 8 ++++
2 files changed, 86 insertions(+), 12 deletions(-)
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 21e60b1e2ff4..5e4598982359 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -36,6 +36,7 @@
#include <linux/dma-mapping.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
+#include <linux/mapdirect.h>
#include <linux/export.h>
#include <linux/hugetlb.h>
#include <linux/slab.h>
@@ -46,10 +47,16 @@
static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
{
+ struct lease_direct *ld, *_ld;
struct scatterlist *sg;
struct page *page;
int i;
+ list_for_each_entry_safe(ld, _ld, &umem->leases, list) {
+ list_del_init(&ld->list);
+ map_direct_lease_destroy(ld);
+ }
+
if (umem->nmap > 0)
ib_dma_unmap_sg(dev, umem->sg_head.sgl,
umem->npages,
@@ -64,10 +71,20 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
}
sg_free_table(&umem->sg_head);
- return;
}
+static void ib_umem_lease_break(void *__umem)
+{
+ struct ib_umem *umem = __umem;
+ struct ib_device *idev = umem->context->device;
+ struct device *dev = idev->dma_device;
+ struct scatterlist *sgl = umem->sg_head.sgl;
+
+ iommu_unmap(umem->iommu, sg_dma_address(sgl) & PAGE_MASK,
+ iommu_sg_num_pages(dev, sgl, umem->npages) << PAGE_SHIFT);
+}
+
/**
* ib_umem_get - Pin and DMA map userspace memory.
*
@@ -96,7 +113,10 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
struct scatterlist *sg, *sg_list_start;
int need_release = 0;
unsigned int gup_flags = FOLL_WRITE;
+ struct vm_area_struct *vma_prev = NULL;
+ struct device *dma_dev;
+ dma_dev = context->device->dma_device;
if (dmasync)
dma_attrs |= DMA_ATTR_WRITE_BARRIER;
@@ -120,6 +140,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
umem->address = addr;
umem->page_shift = PAGE_SHIFT;
umem->pid = get_task_pid(current, PIDTYPE_PID);
+ INIT_LIST_HEAD(&umem->leases);
/*
* We ask for writable memory if any of the following
* access flags are set. "Local write" and "remote write"
@@ -147,19 +168,21 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
umem->hugetlb = 1;
page_list = (struct page **) __get_free_page(GFP_KERNEL);
- if (!page_list) {
- put_pid(umem->pid);
- kfree(umem);
- return ERR_PTR(-ENOMEM);
- }
+ if (!page_list)
+ goto err_pagelist;
/*
- * if we can't alloc the vma_list, it's not so bad;
- * just assume the memory is not hugetlb memory
+ * If DAX is enabled we need the vma to setup a ->lease_direct()
+ * lease to protect against file modifications, otherwise we can
+ * tolerate a failure to allocate the vma_list and just assume
+ * that all vmas are not hugetlb-vmas.
*/
vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
- if (!vma_list)
+ if (!vma_list) {
+ if (IS_ENABLED(CONFIG_DAX_MAP_DIRECT))
+ goto err_vmalist;
umem->hugetlb = 0;
+ }
npages = ib_umem_num_pages(umem);
@@ -199,15 +222,52 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
if (ret < 0)
goto out;
- umem->npages += ret;
cur_base += ret * PAGE_SIZE;
npages -= ret;
for_each_sg(sg_list_start, sg, ret, i) {
- if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
- umem->hugetlb = 0;
+ const struct vm_operations_struct *vm_ops;
+ struct vm_area_struct *vma;
+ struct lease_direct *ld;
sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
+ umem->npages++;
+
+ if (!vma_list)
+ continue;
+ vma = vma_list[i];
+
+ if (vma == vma_prev)
+ continue;
+ vma_prev = vma;
+
+ if (!is_vm_hugetlb_page(vma))
+ umem->hugetlb = 0;
+
+ if (!vma_is_dax(vma))
+ continue;
+
+ vm_ops = vma->vm_ops;
+ if (!vm_ops->lease_direct) {
+ dev_info(dma_dev, "DAX-RDMA requires a MAP_DIRECT mapping\n");
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (!umem->iommu)
+ umem->iommu = dma_get_iommu_domain(dma_dev);
+ if (!umem->iommu) {
+ dev_info(dma_dev, "DAX-RDMA requires an iommu protected device\n");
+ ret = -EOPNOTSUPP;
+ goto out;
+ }
+ ld = vm_ops->lease_direct(vma, ib_umem_lease_break,
+ umem);
+ if (IS_ERR(ld)) {
+ ret = PTR_ERR(ld);
+ goto out;
+ }
+ list_add(&ld->list, &umem->leases);
}
/* preparing for next loop */
@@ -242,6 +302,12 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
free_page((unsigned long) page_list);
return ret < 0 ? ERR_PTR(ret) : umem;
+err_vmalist:
+ free_page((unsigned long) page_list);
+err_pagelist:
+ put_pid(umem->pid);
+ kfree(umem);
+ return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL(ib_umem_get);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 23159dd5be18..5048be012f96 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -34,6 +34,7 @@
#define IB_UMEM_H
#include <linux/list.h>
+#include <linux/iommu.h>
#include <linux/scatterlist.h>
#include <linux/workqueue.h>
@@ -55,6 +56,13 @@ struct ib_umem {
struct sg_table sg_head;
int nmap;
int npages;
+ /*
+ * Note: no lock protects this list since we assume memory
+ * registration never races unregistration for a given ib_umem
+ * instance.
+ */
+ struct list_head leases;
+ struct iommu_domain *iommu;
};
/* Returns the offset of the umem start relative to the first page. */
--
* Re: [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
[not found] ` <150743537023.13602.3520782942682280917.stgit-p8uTFz9XbKj2zm6wflaqv1nYeNYlB/vhral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-10-08 6:45 ` kbuild test robot
[not found] ` <201710081447.sQSonloO%fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 4+ messages in thread
From: kbuild test robot @ 2017-10-08 6:45 UTC (permalink / raw)
To: Dan Williams
Cc: Jan Kara, Dave Chinner, J. Bruce Fields,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Jeff Layton, Sean Hefty,
Ashok Raj, linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Joerg Roedel,
Christoph Hellwig, Doug Ledford, Hal Rosenstock,
linux-api-u79uwXL29TY76Z2rM5mHXA, Darrick J. Wong,
linux-xfs-u79uwXL29TY76Z2rM5mHXA,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
kbuild-all-JC7UmRfGjtg, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
David Woodhouse
Hi Dan,
[auto build test ERROR on rdma/master]
[also build test ERROR on v4.14-rc3 next-20170929]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Dan-Williams/iommu-up-level-sg_num_pages-from-amd-iommu/20171008-133505
base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
config: i386-randconfig-n0-201741 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
>> drivers/infiniband/core/umem.c:39:29: fatal error: linux/mapdirect.h: No such file or directory
#include <linux/mapdirect.h>
^
compilation terminated.
vim +39 drivers/infiniband/core/umem.c
> 39 #include <linux/mapdirect.h>
40 #include <linux/export.h>
41 #include <linux/hugetlb.h>
42 #include <linux/slab.h>
43 #include <rdma/ib_umem_odp.h>
44
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
* Re: [PATCH v8 2/2] IB/core: use MAP_DIRECT to fix / enable RDMA to DAX mappings
[not found] ` <201710081447.sQSonloO%fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
@ 2017-10-08 15:49 ` Dan Williams
0 siblings, 0 replies; 4+ messages in thread
From: Dan Williams @ 2017-10-08 15:49 UTC (permalink / raw)
To: kbuild test robot
Cc: kbuild-all-JC7UmRfGjtg,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org, Sean Hefty,
linux-xfs-u79uwXL29TY76Z2rM5mHXA, Jan Kara, Ashok Raj,
Darrick J. Wong, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Linux API,
Joerg Roedel, Dave Chinner, Jeff Moyer,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Christoph Hellwig, J. Bruce Fields, Linux MM, Doug Ledford,
Ross Zwisler, linux-fsdevel, Jeff Layton, David Woodhouse, Hal
On Sat, Oct 7, 2017 at 11:45 PM, kbuild test robot <lkp-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org> wrote:
> Hi Dan,
>
> [auto build test ERROR on rdma/master]
> [also build test ERROR on v4.14-rc3 next-20170929]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
This was a fixed-up resend of patch [v7 11/12]. It's not clear how to
teach the kbuild robot to be aware of patch replies to individual
patches in a series, i.e. reworked patches sent without resending the
complete series?
> url: https://github.com/0day-ci/linux/commits/Dan-Williams/iommu-up-level-sg_num_pages-from-amd-iommu/20171008-133505
> base: https://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git master
> config: i386-randconfig-n0-201741 (attached as .config)
> compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=i386
>
> All errors (new ones prefixed by >>):
>
>>> drivers/infiniband/core/umem.c:39:29: fatal error: linux/mapdirect.h: No such file or directory
> #include <linux/mapdirect.h>
mapdirect.h indeed does not exist when the earlier patches in the
series are missing. It would be slick if the 0day robot read the
"In-Reply-To" header and auto-replaced a patch in a series, but that
would be a feature approaching magic.
> compilation terminated.
>
> vim +39 drivers/infiniband/core/umem.c
>
> > 39 #include <linux/mapdirect.h>
> 40 #include <linux/export.h>
> 41 #include <linux/hugetlb.h>
> 42 #include <linux/slab.h>
> 43 #include <rdma/ib_umem_odp.h>
> 44
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation