* [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface
@ 2017-08-17 12:56 Joerg Roedel
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu@lists.linux-foundation.org
Cc: linux-kernel@vger.kernel.org
Hi,
here is a patch-set to introduce an explicit interface to
the IOMMU-API for flushing IOMMU and device IO/TLBs.
Currently the iommu_map(), iommu_map_sg(), and iommu_unmap()
functions have to make sure that all IO/TLBs in the system
are synchronized with the page-table updates they make.
This is very inefficient in some scenarios, for example when
a large address space is unmapped and an IO/TLB flush has to
be done in every call of iommu_unmap(), or when it would
make sense to queue up several page-table changes and flush
them together.
To optimize these scenarios, the requirement to synchronize
with the IOMMU and device TLBs has been removed from the
map/unmap functions of the IOMMU-API, and an interface for
doing the flushes explicitly has been introduced.
To make the conversion of existing users of the IOMMU-API
easier, new functions - iommu_map_sync(), iommu_map_sg_sync(),
and iommu_unmap_sync() - have been introduced. These
functions guarantee that the IO/TLBs are synchronized with
any page-table update when they return. The optimizations
possible with the new interface are left to separate
patch-sets.
Patch 1 just renames a few functions in the AMD-Vi driver
that would otherwise collide with the new TLB-flush
functions from the IOMMU-API.
Patch 2 introduces the new IO/TLB Flush-Interface.
Patches 3-13 convert existing users of the IOMMU-API to use
the *_sync functions for now.
Please review.
Thanks,
Joerg
Joerg Roedel (13):
iommu/amd: Rename a few flush functions
iommu: Introduce Interface for IOMMU TLB Flushing
vfio/type1: Use synchronized interface of the IOMMU-API
iommu/dma: Use synchronized interface of the IOMMU-API
arm: dma-mapping: Use synchronized interface of the IOMMU-API
drm/etnaviv: Use synchronized interface of the IOMMU-API
drm/msm: Use synchronized interface of the IOMMU-API
drm/nouveau/imem/gk20a: Use synchronized interface of the IOMMU-API
drm/rockchip: Use synchronized interface of the IOMMU-API
drm/tegra: Use synchronized interface of the IOMMU-API
gpu: host1x: Use synchronized interface of the IOMMU-API
IB/usnic: Use synchronized interface of the IOMMU-API
remoteproc: Use synchronized interface of the IOMMU-API
arch/arm/mm/dma-mapping.c | 21 +++---
drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +--
drivers/gpu/drm/msm/msm_iommu.c | 5 +-
.../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 12 ++--
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 6 +-
drivers/gpu/drm/tegra/drm.c | 6 +-
drivers/gpu/drm/tegra/gem.c | 6 +-
drivers/gpu/host1x/cdma.c | 6 +-
drivers/gpu/host1x/job.c | 6 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 10 +--
drivers/iommu/amd_iommu.c | 16 ++---
drivers/iommu/dma-iommu.c | 8 +--
drivers/iommu/iommu.c | 26 +++++++
drivers/remoteproc/remoteproc_core.c | 10 +--
drivers/vfio/vfio_iommu_type1.c | 38 +++++-----
include/linux/iommu.h | 80 +++++++++++++++++++++-
16 files changed, 189 insertions(+), 77 deletions(-)
--
2.7.4
* [PATCH 07/13] drm/msm: Use synchronized interface of the IOMMU-API
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu
Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Rob Clark,
    David Airlie, linux-arm-msm, dri-devel, freedreno

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Rob Clark <robdclark@gmail.com>
Cc: David Airlie <airlied@linux.ie>
Cc: linux-arm-msm@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: freedreno@lists.freedesktop.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/msm/msm_iommu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index b23d336..b3525b7 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -64,7 +64,8 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	size_t ret;
 
 //	pm_runtime_get_sync(mmu->dev);
-	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
+	ret = iommu_map_sg_sync(iommu->domain, iova, sgt->sgl,
+				sgt->nents, prot);
 //	pm_runtime_put_sync(mmu->dev);
 	WARN_ON(ret < 0);
 
@@ -77,7 +78,7 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 
 	pm_runtime_get_sync(mmu->dev);
-	iommu_unmap(iommu->domain, iova, len);
+	iommu_unmap_sync(iommu->domain, iova, len);
 	pm_runtime_put_sync(mmu->dev);
 
 	return 0;
--
2.7.4
* Re: [PATCH 07/13] drm/msm: Use synchronized interface of the IOMMU-API
From: Rob Clark @ 2017-08-19 15:39 UTC
To: Joerg Roedel
Cc: iommu@lists.linux-foundation.org, Linux Kernel Mailing List,
    Suravee Suthikulpanit, Joerg Roedel, David Airlie, linux-arm-msm,
    dri-devel@lists.freedesktop.org, freedreno

On Thu, Aug 17, 2017 at 8:56 AM, Joerg Roedel <joro@8bytes.org> wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> The map and unmap functions of the IOMMU-API changed their
> semantics: They no longer guarantee that the hardware TLBs
> are synchronized with the page-table updates they made.
>
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
>
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.

I like the idea of decoupling tlb invalidation from map (well, that
doesn't really affect any recent hw) and unmap (which does need a tlb
flush), and for a gpu driver it should be pretty easy to batch up
unmaps.

I did slightly prefer adding new iommu_{map,unmap}_async() functions
rather than changing the semantics of the existing API, but either
way:

Acked-by: Rob Clark <robdclark@gmail.com>

[snip]
* [PATCH 09/13] drm/rockchip: Use synchronized interface of the IOMMU-API
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu
Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Mark Yao,
    David Airlie, Heiko Stuebner, dri-devel, linux-arm-kernel,
    linux-rockchip

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Mark Yao <mark.yao@rock-chips.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-rockchip@lists.infradead.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index b74ac71..6d28224 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -41,8 +41,8 @@ static int rockchip_gem_iommu_map(struct rockchip_gem_object *rk_obj)
 
 	rk_obj->dma_addr = rk_obj->mm.start;
 
-	ret = iommu_map_sg(private->domain, rk_obj->dma_addr, rk_obj->sgt->sgl,
-			   rk_obj->sgt->nents, prot);
+	ret = iommu_map_sg_sync(private->domain, rk_obj->dma_addr,
+				rk_obj->sgt->sgl, rk_obj->sgt->nents, prot);
 	if (ret < rk_obj->base.size) {
 		DRM_ERROR("failed to map buffer: size=%zd request_size=%zd\n",
 			  ret, rk_obj->base.size);
@@ -67,7 +67,7 @@ static int rockchip_gem_iommu_unmap(struct rockchip_gem_object *rk_obj)
 	struct drm_device *drm = rk_obj->base.dev;
 	struct rockchip_drm_private *private = drm->dev_private;
 
-	iommu_unmap(private->domain, rk_obj->dma_addr, rk_obj->size);
+	iommu_unmap_sync(private->domain, rk_obj->dma_addr, rk_obj->size);
 
 	mutex_lock(&private->mm_lock);
--
2.7.4
* [PATCH 12/13] IB/usnic: Use synchronized interface of the IOMMU-API
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu
Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
    Christian Benvenuti, Dave Goodell, Doug Ledford, Sean Hefty,
    Hal Rosenstock, linux-rdma

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Christian Benvenuti <benve@cisco.com>
Cc: Dave Goodell <dgoodell@cisco.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/infiniband/hw/usnic/usnic_uiom.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index c49db7c..e53dc38 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -205,7 +205,7 @@ static void usnic_uiom_unmap_sorted_intervals(struct list_head *intervals,
 		while (size > 0) {
 			/* Workaround for RH 970401 */
 			usnic_dbg("va 0x%lx size 0x%lx", va, PAGE_SIZE);
-			iommu_unmap(pd->domain, va, PAGE_SIZE);
+			iommu_unmap_sync(pd->domain, va, PAGE_SIZE);
 			va += PAGE_SIZE;
 			size -= PAGE_SIZE;
 		}
@@ -282,8 +282,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
 			size = pa_end - pa_start + PAGE_SIZE;
 			usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x",
 				va_start, &pa_start, size, flags);
-			err = iommu_map(pd->domain, va_start, pa_start,
-					size, flags);
+			err = iommu_map_sync(pd->domain, va_start,
+					     pa_start, size, flags);
 			if (err) {
 				usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
 					va_start, &pa_start, size, err);
@@ -299,8 +299,8 @@ static int usnic_uiom_map_sorted_intervals(struct list_head *intervals,
 			size = pa - pa_start + PAGE_SIZE;
 			usnic_dbg("va 0x%lx pa %pa size 0x%zx flags 0x%x\n",
 				va_start, &pa_start, size, flags);
-			err = iommu_map(pd->domain, va_start, pa_start,
-					size, flags);
+			err = iommu_map_sync(pd->domain, va_start,
+					     pa_start, size, flags);
 			if (err) {
 				usnic_err("Failed to map va 0x%lx pa %pa size 0x%zx with err %d\n",
 					va_start, &pa_start, size, err);
--
2.7.4
* [PATCH 13/13] remoteproc: Use synchronized interface of the IOMMU-API
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu
Cc: linux-kernel, Suravee Suthikulpanit, Joerg Roedel, Ohad Ben-Cohen,
    Bjorn Andersson, linux-remoteproc

From: Joerg Roedel <jroedel@suse.de>

The map and unmap functions of the IOMMU-API changed their
semantics: They no longer guarantee that the hardware TLBs
are synchronized with the page-table updates they made.

To make conversion easier, new synchronized functions have
been introduced which give these guarantees again until the
code is converted to use the new TLB-flush interface of the
IOMMU-API, which allows certain optimizations.

But for now, just convert this code to use the synchronized
functions so that it will behave as before.

Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: linux-remoteproc@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/remoteproc/remoteproc_core.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 564061d..16db19c 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -553,7 +553,8 @@ static int rproc_handle_devmem(struct rproc *rproc, struct fw_rsc_devmem *rsc,
 	if (!mapping)
 		return -ENOMEM;
 
-	ret = iommu_map(rproc->domain, rsc->da, rsc->pa, rsc->len, rsc->flags);
+	ret = iommu_map_sync(rproc->domain, rsc->da,
+			     rsc->pa, rsc->len, rsc->flags);
 	if (ret) {
 		dev_err(dev, "failed to map devmem: %d\n", ret);
 		goto out;
@@ -661,8 +662,8 @@ static int rproc_handle_carveout(struct rproc *rproc,
 		goto dma_free;
 	}
 
-	ret = iommu_map(rproc->domain, rsc->da, dma, rsc->len,
-			rsc->flags);
+	ret = iommu_map_sync(rproc->domain, rsc->da, dma, rsc->len,
+			     rsc->flags);
 	if (ret) {
 		dev_err(dev, "iommu_map failed: %d\n", ret);
 		goto free_mapping;
@@ -823,7 +824,8 @@ static void rproc_resource_cleanup(struct rproc *rproc)
 	list_for_each_entry_safe(entry, tmp, &rproc->mappings, node) {
 		size_t unmapped;
 
-		unmapped = iommu_unmap(rproc->domain, entry->da, entry->len);
+		unmapped = iommu_unmap_sync(rproc->domain, entry->da,
+					    entry->len);
 		if (unmapped != entry->len) {
 			/* nothing much to do besides complaining */
 			dev_err(dev, "failed to unmap %u/%zu\n", entry->len,
--
2.7.4
* Re: [PATCH 13/13] remoteproc: Use synchronized interface of the IOMMU-API
From: Bjorn Andersson @ 2017-08-24 19:06 UTC
To: Joerg Roedel
Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel,
    Ohad Ben-Cohen, linux-remoteproc

On Thu 17 Aug 05:56 PDT 2017, Joerg Roedel wrote:

> From: Joerg Roedel <jroedel@suse.de>
>
> The map and unmap functions of the IOMMU-API changed their
> semantics: They no longer guarantee that the hardware TLBs
> are synchronized with the page-table updates they made.
>
> To make conversion easier, new synchronized functions have
> been introduced which give these guarantees again until the
> code is converted to use the new TLB-flush interface of the
> IOMMU-API, which allows certain optimizations.
>
> But for now, just convert this code to use the synchronized
> functions so that it will behave as before.
>
> Cc: Ohad Ben-Cohen <ohad@wizery.com>
> Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
> Cc: linux-remoteproc@vger.kernel.org
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

[snip]
* [PATCH 01/13] iommu/amd: Rename a few flush functions
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu@lists.linux-foundation.org
Cc: Joerg Roedel, linux-kernel@vger.kernel.org

From: Joerg Roedel <jroedel@suse.de>

Rename a few iommu cache-flush functions that start with
iommu_ to start with amd_iommu now. This is to prevent name
collisions with generic iommu code later on.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/iommu/amd_iommu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 354cbd6..5469457 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1165,7 +1165,7 @@ static int iommu_flush_dte(struct amd_iommu *iommu, u16 devid)
 	return iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_dte_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_dte_all(struct amd_iommu *iommu)
 {
 	u32 devid;
 
@@ -1179,7 +1179,7 @@ static void iommu_flush_dte_all(struct amd_iommu *iommu)
  * This function uses heavy locking and may disable irqs for some time. But
  * this is no issue because it is only called during resume.
  */
-static void iommu_flush_tlb_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_tlb_all(struct amd_iommu *iommu)
 {
 	u32 dom_id;
 
@@ -1193,7 +1193,7 @@ static void iommu_flush_tlb_all(struct amd_iommu *iommu)
 	iommu_completion_wait(iommu);
 }
 
-static void iommu_flush_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_all(struct amd_iommu *iommu)
 {
 	struct iommu_cmd cmd;
 
@@ -1212,7 +1212,7 @@ static void iommu_flush_irt(struct amd_iommu *iommu, u16 devid)
 	iommu_queue_command(iommu, &cmd);
 }
 
-static void iommu_flush_irt_all(struct amd_iommu *iommu)
+static void amd_iommu_flush_irt_all(struct amd_iommu *iommu)
 {
 	u32 devid;
 
@@ -1225,11 +1225,11 @@ static void iommu_flush_irt_all(struct amd_iommu *iommu)
 void iommu_flush_all_caches(struct amd_iommu *iommu)
 {
 	if (iommu_feature(iommu, FEATURE_IA)) {
-		iommu_flush_all(iommu);
+		amd_iommu_flush_all(iommu);
 	} else {
-		iommu_flush_dte_all(iommu);
-		iommu_flush_irt_all(iommu);
-		iommu_flush_tlb_all(iommu);
+		amd_iommu_flush_dte_all(iommu);
+		amd_iommu_flush_irt_all(iommu);
+		amd_iommu_flush_tlb_all(iommu);
 	}
 }
--
2.7.4
* [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Joerg Roedel @ 2017-08-17 12:56 UTC
To: iommu@lists.linux-foundation.org
Cc: Joerg Roedel, Will Deacon, linux-kernel@vger.kernel.org

From: Joerg Roedel <jroedel@suse.de>

With the current IOMMU-API the hardware TLBs have to be
flushed in every iommu_map(), iommu_map_sg(), and
iommu_unmap() call.

For unmapping large amounts of address space, as happens
when a KVM domain with assigned devices is destroyed, this
causes thousands of unnecessary TLB flushes in the IOMMU
hardware because the unmap call-back runs for every
unmapped physical page.

With the TLB Flush Interface introduced here the need to
clean the hardware TLBs is removed from the iommu_map/unmap
functions. Users now have to explicitly call these functions
to sync the page-table changes to the hardware.

Three functions are introduced:

	* iommu_flush_tlb_all() - Flushes all TLB entries
				  associated with that
				  domain. TLB entries are
				  flushed when this function
				  returns.

	* iommu_tlb_range_add() - This will add a given
				  range to the flush queue
				  for this domain.

	* iommu_tlb_sync() - Flushes all queued ranges from
			     the hardware TLBs. Returns when
			     the flush is finished.

The semantic of this interface is intentionally similar to
the iommu_gather_ops from the io-pgtable code.

Additionally, this patch introduces synchronized versions of
the iommu_map(), iommu_map_sg(), and iommu_unmap()
functions. They will be used by current users of the
IOMMU-API, before they are optimized to the unsynchronized
versions.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 drivers/iommu/iommu.c | 26 +++++++++++++++++
 include/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f6ea16..816e248 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
 
 	}
 
+	iommu_flush_tlb_all(domain);
+
 out:
 	iommu_put_resv_regions(dev, &mappings);
 
@@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 }
 EXPORT_SYMBOL_GPL(iommu_map);
 
+int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+		   phys_addr_t paddr, size_t size, int prot)
+{
+	int ret = iommu_map(domain, iova, paddr, size, prot);
+
+	iommu_tlb_range_add(domain, iova, size);
+	iommu_tlb_sync(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_map_sync);
+
 size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 {
 	size_t unmapped_page, unmapped = 0;
@@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 }
 EXPORT_SYMBOL_GPL(iommu_unmap);
 
+size_t iommu_unmap_sync(struct iommu_domain *domain,
+			unsigned long iova, size_t size)
+{
+	size_t ret = iommu_unmap(domain, iova, size);
+
+	iommu_tlb_range_add(domain, iova, size);
+	iommu_tlb_sync(domain);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_unmap_sync);
+
 size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 			    struct scatterlist *sg, unsigned int nents, int prot)
 {
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2cb54ad..7f9c114 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -167,6 +167,10 @@ struct iommu_resv_region {
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
  * @map_sg: map a scatter-gather list of physically contiguous memory chunks
+ * @flush_tlb_all: Synchronously flush all hardware TLBs for this domain
+ * @tlb_range_add: Add a given iova range to the flush queue for this domain
+ * @tlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
+ *            queue
  * to an iommu domain
  * @iova_to_phys: translate iova to physical address
  * @add_device: add device to iommu grouping
@@ -199,6 +203,10 @@ struct iommu_ops {
 		     size_t size);
 	size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 			 struct scatterlist *sg, unsigned int nents, int prot);
+	void (*flush_iotlb_all)(struct iommu_domain *domain);
+	void (*iotlb_range_add)(struct iommu_domain *domain,
+				unsigned long iova, size_t size);
+	void (*iotlb_sync)(struct iommu_domain *domain);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
@@ -285,8 +293,12 @@ extern void iommu_detach_device(struct iommu_domain *domain,
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot);
+extern int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+			  phys_addr_t paddr, size_t size, int prot);
 extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova,
-		       size_t size);
+			  size_t size);
+extern size_t iommu_unmap_sync(struct iommu_domain *domain,
+			       unsigned long iova, size_t size);
 extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 				struct scatterlist *sg,unsigned int nents, int prot);
@@ -343,6 +355,25 @@ extern void iommu_domain_window_disable(struct iommu_domain *domain, u32 wnd_nr)
 extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev,
 			      unsigned long iova, int flags);
 
+static inline void iommu_flush_tlb_all(struct iommu_domain *domain)
+{
+	if (domain->ops->flush_iotlb_all)
+		domain->ops->flush_iotlb_all(domain);
+}
+
+static inline void iommu_tlb_range_add(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+	if (domain->ops->iotlb_range_add)
+		domain->ops->iotlb_range_add(domain, iova, size);
+}
+
+static inline void iommu_tlb_sync(struct iommu_domain *domain)
+{
+	if (domain->ops->iotlb_sync)
+		domain->ops->iotlb_sync(domain);
+}
+
 static inline size_t iommu_map_sg(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot)
@@ -350,6 +381,20 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain,
 	return domain->ops->map_sg(domain, iova, sg, nents, prot);
 }
 
+static inline size_t iommu_map_sg_sync(struct iommu_domain *domain,
+				       unsigned long iova,
+				       struct scatterlist *sg,
+				       unsigned int nents, int prot)
+{
+	size_t size = domain->ops->map_sg(domain, iova, sg, nents, prot);
+	if (size > 0) {
+		iommu_tlb_range_add(domain, iova, size);
+		iommu_tlb_sync(domain);
+	}
+
+	return size;
+}
+
 /* PCI device grouping function */
 extern struct iommu_group *pci_device_group(struct device *dev);
 /* Generic device grouping function */
@@ -430,12 +475,24 @@ static inline int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	return -ENODEV;
 }
 
+static inline int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
+				 phys_addr_t paddr, int gfp_order, int prot)
+{
+	return -ENODEV;
+}
+
 static inline int iommu_unmap(struct iommu_domain *domain, unsigned long iova,
 			      int gfp_order)
 {
 	return -ENODEV;
 }
 
+static inline int iommu_unmap_sync(struct iommu_domain *domain, unsigned long iova,
+				   int gfp_order)
+{
+	return -ENODEV;
+}
+
 static inline size_t iommu_map_sg(struct iommu_domain *domain,
 				  unsigned long iova, struct scatterlist *sg,
 				  unsigned int nents, int prot)
@@ -443,6 +500,27 @@ static inline size_t iommu_map_sg(struct iommu_domain *domain,
 	return -ENODEV;
 }
 
+static inline size_t iommu_map_sg_sync(struct iommu_domain *domain,
+				       unsigned long iova,
+				       struct scatterlist *sg,
+				       unsigned int nents, int prot)
+{
+	return -ENODEV;
+}
+
+static inline void iommu_flush_tlb_all(struct iommu_domain *domain)
+{
+}
+
+static inline void iommu_tlb_range_add(struct iommu_domain *domain,
+				       unsigned long iova, size_t size)
+{
+}
+
+static inline void iommu_tlb_sync(struct iommu_domain *domain)
+{
+}
+
 static inline int iommu_domain_window_enable(struct iommu_domain *domain,
 					     u32 wnd_nr, phys_addr_t paddr,
 					     u64 size, int prot)
--
2.7.4
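To make the driver side of the new interface concrete, here is a
minimal sketch of how an IOMMU driver could populate the new
callbacks. The my_hw_*() helpers are invented and stand in for
whatever invalidation-queue or register interface the hardware
provides; none of this is from the patch itself:

/* Queue an invalidation for a range, but do not wait for it */
static void my_iotlb_range_add(struct iommu_domain *domain,
			       unsigned long iova, size_t size)
{
	my_hw_queue_inv_range(domain, iova, size);
}

/* Wait until all previously queued invalidations have completed */
static void my_iotlb_sync(struct iommu_domain *domain)
{
	my_hw_wait_for_completion(domain);
}

/* Invalidate everything for the domain and wait for completion */
static void my_flush_iotlb_all(struct iommu_domain *domain)
{
	my_hw_queue_inv_all(domain);
	my_hw_wait_for_completion(domain);
}

static const struct iommu_ops my_iommu_ops = {
	/* .map, .unmap, .map_sg, etc. omitted from this sketch */
	.flush_iotlb_all = my_flush_iotlb_all,
	.iotlb_range_add = my_iotlb_range_add,
	.iotlb_sync	 = my_iotlb_sync,
};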
* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Will Deacon @ 2017-08-17 16:32 UTC
To: Joerg Roedel
Cc: Joerg Roedel, linux-kernel@vger.kernel.org,
    iommu@lists.linux-foundation.org

Hi Joerg,

I really like the idea of this, but I have a couple of questions and
comments below.

On Thu, Aug 17, 2017 at 02:56:25PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> With the current IOMMU-API the hardware TLBs have to be
> flushed in every iommu_map(), iommu_map_sg(), and
> iommu_unmap() call.
>
> For unmapping large amounts of address space, as happens
> when a KVM domain with assigned devices is destroyed, this
> causes thousands of unnecessary TLB flushes in the IOMMU
> hardware because the unmap call-back runs for every
> unmapped physical page.
>
> With the TLB Flush Interface introduced here the need to
> clean the hardware TLBs is removed from the iommu_map/unmap
> functions. Users now have to explicitly call these functions
> to sync the page-table changes to the hardware.
>
> Three functions are introduced:
>
> 	* iommu_flush_tlb_all() - Flushes all TLB entries
> 				  associated with that
> 				  domain. TLB entries are
> 				  flushed when this function
> 				  returns.
>
> 	* iommu_tlb_range_add() - This will add a given
> 				  range to the flush queue
> 				  for this domain.
>
> 	* iommu_tlb_sync() - Flushes all queued ranges from
> 			     the hardware TLBs. Returns when
> 			     the flush is finished.
>
> The semantic of this interface is intentionally similar to
> the iommu_gather_ops from the io-pgtable code.
>
> Additionally, this patch introduces synchronized versions of
> the iommu_map(), iommu_map_sg(), and iommu_unmap()
> functions. They will be used by current users of the
> IOMMU-API, before they are optimized to the unsynchronized
> versions.
>
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Robin Murphy <robin.murphy@arm.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  drivers/iommu/iommu.c | 26 +++++++++++++++++
>  include/linux/iommu.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 105 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f6ea16..816e248 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -527,6 +527,8 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
>
>  	}
>
> +	iommu_flush_tlb_all(domain);
> +
>  out:
>  	iommu_put_resv_regions(dev, &mappings);
>
> @@ -1556,6 +1558,18 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
>  }
>  EXPORT_SYMBOL_GPL(iommu_map);
>
> +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> +		   phys_addr_t paddr, size_t size, int prot)
> +{
> +	int ret = iommu_map(domain, iova, paddr, size, prot);
> +
> +	iommu_tlb_range_add(domain, iova, size);
> +	iommu_tlb_sync(domain);

Many IOMMUs don't need these callbacks on ->map operations, but they won't
be able to distinguish them easily with this API. Could you add a flags
parameter or something to the iommu_tlb_* functions, please?

> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_map_sync);
> +
>  size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  {
>  	size_t unmapped_page, unmapped = 0;
> @@ -1608,6 +1622,18 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  }
>  EXPORT_SYMBOL_GPL(iommu_unmap);
>
> +size_t iommu_unmap_sync(struct iommu_domain *domain,
> +			unsigned long iova, size_t size)
> +{
> +	size_t ret = iommu_unmap(domain, iova, size);
> +
> +	iommu_tlb_range_add(domain, iova, size);
> +	iommu_tlb_sync(domain);

I think we will struggle to implement this efficiently on ARM SMMUv3. The
way invalidation works there is that there is a single in-memory command
queue into which we can put TLB invalidation commands (they are inserted
under a lock). These are then processed asynchronously by the hardware, and
you can complete them by inserting a SYNC command and waiting for that to
be consumed by the SMMU. Sounds like a perfect fit, right?

The problem is that we want to add those invalidation commands as early
as possible, so that they can be processed by the hardware concurrently
with us unmapping other pages.

That means adding the invalidation commands in the ->unmap callback
and not bothering to implement the ->iotlb_range_add callback at all.
Then, we will do the sync in ->iotlb_sync. This falls apart if
somebody decides to use iommu_flush_tlb_all(), where we would prefer
not to insert all of the invalidation commands in unmap and instead
insert a single invalidate-all command, followed up with a SYNC.

In other words, we really need the information about the invalidation
as part of the unmap call.

Any ideas?

Will
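For readers unfamiliar with that hardware model, here is a rough
sketch of the command-queue flow described above; the cmd_queue
structure and the cmdq_* helpers are invented for illustration and
are not the real SMMUv3 driver interface:

struct cmd_queue {
	spinlock_t lock;
	/* ring of commands, consumed asynchronously by the SMMU */
};

/* At unmap time: queue the invalidation, but do not wait for it */
static void queue_tlb_invalidation(struct cmd_queue *q,
				   unsigned long iova, size_t size)
{
	spin_lock(&q->lock);
	cmdq_insert_tlbi_range(q, iova, size);
	spin_unlock(&q->lock);
}

/* At sync time: insert a SYNC command and wait until the hardware
 * has consumed it, which completes all previously queued commands.
 */
static void sync_tlb_invalidations(struct cmd_queue *q)
{
	u64 token;

	spin_lock(&q->lock);
	token = cmdq_insert_sync(q);
	spin_unlock(&q->lock);

	cmdq_wait_for_consumption(q, token);
}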
* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Joerg Roedel @ 2017-08-17 16:50 UTC
To: Will Deacon
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org

Hi Will,

On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> I really like the idea of this, but I have a couple of questions and
> comments below.

Great, together with the common iova-flush code this should make it
possible to solve the performance problems of the dma-iommu code.

> > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > +		   phys_addr_t paddr, size_t size, int prot)
> > +{
> > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > +
> > +	iommu_tlb_range_add(domain, iova, size);
> > +	iommu_tlb_sync(domain);
>
> Many IOMMUs don't need these callbacks on ->map operations, but they won't
> be able to distinguish them easily with this API. Could you add a flags
> parameter or something to the iommu_tlb_* functions, please?

Yeah, this is only needed for virtualized IOMMUs that have a
non-present cache. My idea was to let the iommu-drivers tell the
common code whether the iommu needs it, so that the code above just
checks a flag and omits the calls to the flush-functions.

Problem currently is how to get this information from
'struct iommu_device' to 'struct iommu_domain'. As a workaround I
consider a per-domain flag in the iommu drivers which tracks whether
any unmap has happened, and just do nothing in the flush call-back if
there was none.

> I think we will struggle to implement this efficiently on ARM SMMUv3. The
> way invalidation works there is that there is a single in-memory command
> queue into which we can put TLB invalidation commands (they are inserted
> under a lock). These are then processed asynchronously by the hardware, and
> you can complete them by inserting a SYNC command and waiting for that to
> be consumed by the SMMU. Sounds like a perfect fit, right?

Yes, it's basically the same way as it works on AMD-Vi and Intel VT-d.

> The problem is that we want to add those invalidation commands as early
> as possible, so that they can be processed by the hardware concurrently
> with us unmapping other pages.

I think that's a bad idea, because then you re-introduce the
performance problems, as everyone will spin on the cmd-queue lock in
the unmap path of the dma-api.

> That means adding the invalidation commands in the ->unmap callback
> and not bothering to implement the ->iotlb_range_add callback at all.
> Then, we will do the sync in ->iotlb_sync. This falls apart if
> somebody decides to use iommu_flush_tlb_all(), where we would prefer
> not to insert all of the invalidation commands in unmap and instead
> insert a single invalidate-all command, followed up with a SYNC.

This problem can be solved with the deferred iova flushing code I
posted to the list. When a queue fills up, iommu_flush_tlb_all() is
called and every entry that was unmapped before can be released. This
works well on x86, are there reasons it wouldn't on ARM?

Regards,

	Joerg
* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Will Deacon @ 2017-08-17 17:17 UTC
To: Joerg Roedel
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org

Hi Joerg,

On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 05:32:35PM +0100, Will Deacon wrote:
> > I really like the idea of this, but I have a couple of questions and
> > comments below.
>
> Great, together with the common iova-flush code this should make it
> possible to solve the performance problems of the dma-iommu code.
>
> > > +int iommu_map_sync(struct iommu_domain *domain, unsigned long iova,
> > > +		   phys_addr_t paddr, size_t size, int prot)
> > > +{
> > > +	int ret = iommu_map(domain, iova, paddr, size, prot);
> > > +
> > > +	iommu_tlb_range_add(domain, iova, size);
> > > +	iommu_tlb_sync(domain);
> >
> > Many IOMMUs don't need these callbacks on ->map operations, but they won't
> > be able to distinguish them easily with this API. Could you add a flags
> > parameter or something to the iommu_tlb_* functions, please?
>
> Yeah, this is only needed for virtualized IOMMUs that have a
> non-present cache. My idea was to let the iommu-drivers tell the
> common code whether the iommu needs it, so that the code above just
> checks a flag and omits the calls to the flush-functions.
>
> Problem currently is how to get this information from
> 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> consider a per-domain flag in the iommu drivers which tracks whether
> any unmap has happened, and just do nothing in the flush call-back if
> there was none.

Given that this can all happen concurrently, I really don't like the idea of
having to track things with a flag. We'd end up introducing atomics and/or
over-invalidating the TLBs.

> > I think we will struggle to implement this efficiently on ARM SMMUv3. The
> > way invalidation works there is that there is a single in-memory command
> > queue into which we can put TLB invalidation commands (they are inserted
> > under a lock). These are then processed asynchronously by the hardware, and
> > you can complete them by inserting a SYNC command and waiting for that to
> > be consumed by the SMMU. Sounds like a perfect fit, right?
>
> Yes, it's basically the same way as it works on AMD-Vi and Intel VT-d.
>
> > The problem is that we want to add those invalidation commands as early
> > as possible, so that they can be processed by the hardware concurrently
> > with us unmapping other pages.
>
> I think that's a bad idea, because then you re-introduce the
> performance problems, as everyone will spin on the cmd-queue lock in
> the unmap path of the dma-api.

We don't actually tend to see issues adding the TLB invalidation commands
under the lock -- the vast majority of the overhead comes from the SYNC.
Besides, I don't see how adding the commands in the ->iotlb_range_add
callback is any better: it still happens on unmap and it still needs to
take the lock. If we had something like an ->unmap_all_sync callback, we
could incorporate the TLB invalidation into that.

> > That means adding the invalidation commands in the ->unmap callback
> > and not bothering to implement the ->iotlb_range_add callback at all.
> > Then, we will do the sync in ->iotlb_sync. This falls apart if
> > somebody decides to use iommu_flush_tlb_all(), where we would prefer
> > not to insert all of the invalidation commands in unmap and instead
> > insert a single invalidate-all command, followed up with a SYNC.
>
> This problem can be solved with the deferred iova flushing code I
> posted to the list. When a queue fills up, iommu_flush_tlb_all() is
> called and every entry that was unmapped before can be released. This
> works well on x86, are there reasons it wouldn't on ARM?

There are a few reasons I'm not rushing to move to the deferred flushing
code for ARM:

  1. The performance numbers we have suggest that we can achieve near-native
     performance without needing to do that.

  2. We can free page-table pages in unmap, but that's not safe if we defer
     flushing.

  3. Flushing the whole TLB is undesirable and not something we currently
     need to do.

  4. There are security implications of deferring the unmap and I'm aware
     of a security research group that uses this to gain root privileges.

  5. *If* performance figures end up showing that deferring the flush is
     worthwhile, I would rather use an RCU-based approach for protecting
     the page tables, like we do on the CPU.

Will
* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Joerg Roedel @ 2017-08-17 21:20 UTC
To: Will Deacon
Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit,
    Alex Williamson, Robin Murphy

Hi Will,

On Thu, Aug 17, 2017 at 06:17:05PM +0100, Will Deacon wrote:
> On Thu, Aug 17, 2017 at 06:50:40PM +0200, Joerg Roedel wrote:
> > Problem currently is how to get this information from
> > 'struct iommu_device' to 'struct iommu_domain'. As a workaround I
> > consider a per-domain flag in the iommu drivers which tracks whether
> > any unmap has happened, and just do nothing in the flush call-back if
> > there was none.
>
> Given that this can all happen concurrently, I really don't like the idea of
> having to track things with a flag. We'd end up introducing atomics and/or
> over-invalidating the TLBs.

Okay, I'll look into a better solution for that.

> We don't actually tend to see issues adding the TLB invalidation commands
> under the lock -- the vast majority of the overhead comes from the SYNC.
> Besides, I don't see how adding the commands in the ->iotlb_range_add
> callback is any better: it still happens on unmap and it still needs to
> take the lock.

With the deferred flushing you don't flush anything in the unmap path
in most cases. All you do there is add the unmapped iova-range to a
per-cpu list (it's actually a ring-buffer). Only when that buffer is
full do you do a flush_tlb_all() on the domain and then free all the
iova ranges.

With the flush-counters you can also see which entries in your buffer
have already been flushed from the IO/TLB by another CPU, so that you
can release them right away without any further flush. This way it's
less likely that the buffer fills up.

In my tests on x86 I got the flush-rate down to ~1800 flushes/sec at a
network packet rate of 1.45 million pps.

> There are a few reasons I'm not rushing to move to the deferred flushing
> code for ARM:
>
>   1. The performance numbers we have suggest that we can achieve near-native
>      performance without needing to do that.

Hard to believe when all CPUs fight for the cmd-buffer lock, especially
when you have around 96 CPUs :) Can you share the performance numbers
you have and what you measured?

>   2. We can free page-table pages in unmap, but that's not safe if we defer
>      flushing.

Right, VT-d has the same problem and solved it with a free-list of
pages that is passed to the deferred flushing code. When the IO/TLB is
flushed it calls back into the driver which then frees the pages.

>   3. Flushing the whole TLB is undesirable and not something we currently
>      need to do.

Is the TLB-refill cost higher than the time needed to add a
flush-command for every unmapped range?

>   4. There are security implications of deferring the unmap and I'm aware
>      of a security research group that uses this to gain root privileges.

Interesting, can you share more about that?

>   5. *If* performance figures end up showing that deferring the flush is
>      worthwhile, I would rather use an RCU-based approach for protecting
>      the page tables, like we do on the CPU.

Yeah, I don't like the entry_dtor_cb() I introduced for that very much,
so if there are better solutions I am all ears.

Regards,

	Joerg
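A simplified sketch of the deferred-flush scheme described above (the
per-cpu handling, locking, and the flush-counter optimization are
left out, and all names are invented; the real implementation was
posted separately):

#define FLUSH_QUEUE_SIZE	256

struct flush_entry {
	unsigned long	iova;
	size_t		size;
};

struct flush_queue {
	struct flush_entry	entries[FLUSH_QUEUE_SIZE];
	unsigned int		fill;
};

static void queue_unmapped_range(struct iommu_domain *domain,
				 struct flush_queue *fq,
				 unsigned long iova, size_t size)
{
	if (fq->fill == FLUSH_QUEUE_SIZE) {
		unsigned int i;

		/* Buffer full: one flush covers all queued ranges */
		iommu_flush_tlb_all(domain);

		/* Now the IOVA ranges can be released for reuse */
		for (i = 0; i < fq->fill; i++)
			release_iova_range(fq->entries[i].iova,
					   fq->entries[i].size);
		fq->fill = 0;
	}

	fq->entries[fq->fill].iova = iova;
	fq->entries[fq->fill].size = size;
	fq->fill++;
}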
* Re: [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing
From: Will Deacon @ 2017-08-18 15:16 UTC
To: Joerg Roedel
Cc: iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org

Hi Joerg,

On Thu, Aug 17, 2017 at 11:20:54PM +0200, Joerg Roedel wrote:
> On Thu, Aug 17, 2017 at 06:17:05PM +0100, Will Deacon wrote:
> > Given that this can all happen concurrently, I really don't like the idea of
> > having to track things with a flag. We'd end up introducing atomics and/or
> > over-invalidating the TLBs.
>
> Okay, I'll look into a better solution for that.

Thanks. One possibility is that IOMMU drivers requiring TLB invalidation
on ->map can set a flag on the domain when they allocate it, which the
IOMMU core can test for later on.

> With the deferred flushing you don't flush anything in the unmap path
> in most cases. All you do there is add the unmapped iova-range to a
> per-cpu list (it's actually a ring-buffer). Only when that buffer is
> full do you do a flush_tlb_all() on the domain and then free all the
> iova ranges.
>
> With the flush-counters you can also see which entries in your buffer
> have already been flushed from the IO/TLB by another CPU, so that you
> can release them right away without any further flush. This way it's
> less likely that the buffer fills up.
>
> In my tests on x86 I got the flush-rate down to ~1800 flushes/sec at a
> network packet rate of 1.45 million pps.
>
> > There are a few reasons I'm not rushing to move to the deferred flushing
> > code for ARM:
> >
> >   1. The performance numbers we have suggest that we can achieve near-native
> >      performance without needing to do that.
>
> Hard to believe when all CPUs fight for the cmd-buffer lock, especially
> when you have around 96 CPUs :) Can you share the performance numbers
> you have and what you measured?

I can only point to numbers that have already been posted to the list,
but the numbers in this thread here are promising:

https://marc.info/?i=c1d85f28-c57b-4414-3504-16afb3a19ce0%40codeaurora.org

Note that patch 1 in that series doesn't remove the locking, it just
removes the heavy barrier instruction after advancing the queue pointer
(and we can easily do that with your series, deferring it to the sync).

As I said, if the locking does turn out to be a problem and this is
backed up by profiling data, then I'll look into it, but at the moment
it doesn't appear to be the case.

Just a thought, but if we could return a token from unmap to pass to
the TLB invalidation functions, then it would be possible to do a
spin_trylock on the command queue lock in ->unmap. If you get the lock,
then you put the command in; if you don't, then you set something in
the token, and the add_range would see that flag and insert the
commands then.

> >   2. We can free page-table pages in unmap, but that's not safe if we defer
> >      flushing.
>
> Right, VT-d has the same problem and solved it with a free-list of
> pages that is passed to the deferred flushing code. When the IO/TLB is
> flushed it calls back into the driver which then frees the pages.

There are some situations where we cannot defer. For example, if we map
2MB at the PMD level in the page table, but 4k of that region is then
unmapped. In this case, we have to allocate a new level but we cannot
plumb it into the table without performing (and completing) TLB
maintenance.

> >   3. Flushing the whole TLB is undesirable and not something we currently
> >      need to do.
>
> Is the TLB-refill cost higher than the time needed to add a
> flush-command for every unmapped range?

The TLB-refill cost is likely to be significantly higher for an SMMU
than the CPU, because the TLBs tend to be distributed with a centralised
table walker. Furthermore, doing an invalidate-all on a domain may cause
other masters in the domain to miss a deadline, which is a common
complaint we've had from graphics folks in the past (who basically pin
their buffers and rely on never missing).

> >   4. There are security implications of deferring the unmap and I'm aware
> >      of a security research group that uses this to gain root privileges.
>
> Interesting, can you share more about that?

Unfortunately, the paper isn't public yet so I can't say more right now.
When it's published, I'll point you to it!

> >   5. *If* performance figures end up showing that deferring the flush is
> >      worthwhile, I would rather use an RCU-based approach for protecting
> >      the page tables, like we do on the CPU.
>
> Yeah, I don't like the entry_dtor_cb() I introduced for that very much,
> so if there are better solutions I am all ears.

I think it should be possible to protect the page tables with RCU and
then run the TLB invalidation, freeing and IOVA reclaim as part of the
callback. It's not simple to implement, though, because you run into
concurrency issues where the callback can run in parallel with
unmappers etc.

Will
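The trylock idea from the message above could look roughly like the
sketch below, reusing the invented cmd_queue/cmdq_* helpers from the
earlier command-queue sketch; the token handling is simplified to a
single deferred range:

struct inv_token {
	bool		deferred;
	unsigned long	iova;
	size_t		size;
};

/* In ->unmap: try to queue the invalidation right away */
static void unmap_queue_invalidation(struct cmd_queue *q,
				     struct inv_token *tok,
				     unsigned long iova, size_t size)
{
	if (spin_trylock(&q->lock)) {
		cmdq_insert_tlbi_range(q, iova, size);
		spin_unlock(&q->lock);
		tok->deferred = false;
	} else {
		/* Queue busy: let ->iotlb_range_add() issue it later */
		tok->deferred = true;
		tok->iova     = iova;
		tok->size     = size;
	}
}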
* [PATCH 03/13] vfio/type1: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel 2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel 2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel ` (6 subsequent siblings) 9 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, linux-kernel-u79uwXL29TY76Z2rM5mHXA, kvm-u79uwXL29TY76Z2rM5mHXA From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Alex Williamson <alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: kvm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/vfio/vfio_iommu_type1.c | 38 ++++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index 8549cb1..4ad83d4 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -672,7 +672,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, struct vfio_domain, next); list_for_each_entry_continue(d, &iommu->domain_list, next) { - iommu_unmap(d->domain, dma->iova, dma->size); + iommu_unmap_sync(d->domain, dma->iova, dma->size); cond_resched(); } @@ -687,9 +687,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, } /* - * To optimize for fewer iommu_unmap() calls, each of which - * may require hardware cache flushing, try to find the - * largest contiguous physical memory chunk to unmap. + * To optimize for fewer iommu_unmap_sync() calls, each of which + * may require hardware cache flushing, try to find the largest + * contiguous physical memory chunk to unmap. 
*/ for (len = PAGE_SIZE; !domain->fgsp && iova + len < end; len += PAGE_SIZE) { @@ -698,7 +698,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma, break; } - unmapped = iommu_unmap(domain->domain, iova, len); + unmapped = iommu_unmap_sync(domain->domain, iova, len); if (WARN_ON(!unmapped)) break; @@ -877,15 +877,15 @@ static int map_try_harder(struct vfio_domain *domain, dma_addr_t iova, int ret = 0; for (i = 0; i < npage; i++, pfn++, iova += PAGE_SIZE) { - ret = iommu_map(domain->domain, iova, - (phys_addr_t)pfn << PAGE_SHIFT, - PAGE_SIZE, prot | domain->prot); + ret = iommu_map_sync(domain->domain, iova, + (phys_addr_t)pfn << PAGE_SHIFT, + PAGE_SIZE, prot | domain->prot); if (ret) break; } for (; i < npage && i > 0; i--, iova -= PAGE_SIZE) - iommu_unmap(domain->domain, iova, PAGE_SIZE); + iommu_unmap_sync(domain->domain, iova, PAGE_SIZE); return ret; } @@ -897,8 +897,9 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova, int ret; list_for_each_entry(d, &iommu->domain_list, next) { - ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT, - npage << PAGE_SHIFT, prot | d->prot); + ret = iommu_map_sync(d->domain, iova, + (phys_addr_t)pfn << PAGE_SHIFT, + npage << PAGE_SHIFT, prot | d->prot); if (ret) { if (ret != -EBUSY || map_try_harder(d, iova, pfn, npage, prot)) @@ -912,7 +913,7 @@ static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova, unwind: list_for_each_entry_continue_reverse(d, &iommu->domain_list, next) - iommu_unmap(d->domain, iova, npage << PAGE_SHIFT); + iommu_unmap_sync(d->domain, iova, npage << PAGE_SHIFT); return ret; } @@ -1102,8 +1103,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu, size = npage << PAGE_SHIFT; } - ret = iommu_map(domain->domain, iova, phys, - size, dma->prot | domain->prot); + ret = iommu_map_sync(domain->domain, iova, phys, + size, dma->prot | domain->prot); if (ret) return ret; @@ -1133,13 +1134,14 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain) if (!pages) return; - ret = iommu_map(domain->domain, 0, page_to_phys(pages), PAGE_SIZE * 2, - IOMMU_READ | IOMMU_WRITE | domain->prot); + ret = iommu_map_sync(domain->domain, 0, page_to_phys(pages), + PAGE_SIZE * 2, + IOMMU_READ | IOMMU_WRITE | domain->prot); if (!ret) { - size_t unmapped = iommu_unmap(domain->domain, 0, PAGE_SIZE); + size_t unmapped = iommu_unmap_sync(domain->domain, 0, PAGE_SIZE); if (unmapped == PAGE_SIZE) - iommu_unmap(domain->domain, PAGE_SIZE, PAGE_SIZE); + iommu_unmap_sync(domain->domain, PAGE_SIZE, PAGE_SIZE); else domain->fgsp = true; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
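Once a caller like this moves to the explicit flush interface from patch 2, the unmap loop above could batch its TLB maintenance roughly as follows. This is only a sketch of the optimization the cover letter has in mind (it assumes the iommu_tlb_range_add()/iommu_tlb_sync() names from patch 2); the actual conversion is left to a later patch-set.

        while (iova < end) {
                size_t unmapped;

                /* len is chosen by the chunking loop shown above */
                unmapped = iommu_unmap(domain->domain, iova, len);
                if (WARN_ON(!unmapped))
                        break;

                iommu_tlb_range_add(domain->domain, iova, unmapped);
                iova += unmapped;
        }
        iommu_tlb_sync(domain->domain); /* one IO/TLB flush for everything */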
* [PATCH 04/13] iommu/dma: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (2 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 03/13] vfio/type1: Use sychronized interface of the IOMMU-API Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel 2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel ` (5 subsequent siblings) 9 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, Will Deacon, linux-kernel-u79uwXL29TY76Z2rM5mHXA From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org> Cc: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> Cc: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> Cc: Eric Auger <eric.auger-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: Mitchel Humpherys <mitchelh-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/iommu/dma-iommu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 9d1cebe..38c41a2 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -417,7 +417,7 @@ static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr, dma_addr -= iova_off; size = iova_align(iovad, size + iova_off); - WARN_ON(iommu_unmap(domain, dma_addr, size) != size); + WARN_ON(iommu_unmap_sync(domain, dma_addr, size) != size); iommu_dma_free_iova(cookie, dma_addr, size); } @@ -572,7 +572,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp, sg_miter_stop(&miter); } - if (iommu_map_sg(domain, iova, sgt.sgl, sgt.orig_nents, prot) + if (iommu_map_sg_sync(domain, iova, sgt.sgl, sgt.orig_nents, prot) < size) goto out_free_sg; @@ -631,7 +631,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys, if (!iova) return IOMMU_MAPPING_ERROR; - if (iommu_map(domain, iova, phys - iova_off, size, prot)) { + if (iommu_map_sync(domain, iova, phys - iova_off, size, prot)) { iommu_dma_free_iova(cookie, iova, size); return IOMMU_MAPPING_ERROR; } @@ -791,7 +791,7 @@ int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, * We'll leave any physical concatenation to the IOMMU driver's * implementation - it knows better than we do. */ - if (iommu_map_sg(domain, iova, sg, nents, prot) < iova_len) + if (iommu_map_sg_sync(domain, iova, sg, nents, prot) < iova_len) goto out_free_iova; return __finalise_sg(dev, sg, nents, iova); -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
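Note why the _sync variant matters for this caller in particular: __iommu_dma_unmap() returns the IOVA range to the allocator immediately after unmapping, so the IO/TLB has to be clean before the range can be handed out again. Illustratively (not the actual code):

        iommu_unmap(domain, dma_addr, size);         /* PTEs cleared ...    */
        iommu_dma_free_iova(cookie, dma_addr, size); /* ... but the IOVA    */
                                                     /* could be recycled   */
                                                     /* while a stale       */
                                                     /* IO/TLB entry still  */
                                                     /* maps it for the     */
                                                     /* device              */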
* [PATCH 05/13] arm: dma-mapping: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (3 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel 2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel ` (4 subsequent siblings) 9 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Laurent Pinchart, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Russell King, Joerg Roedel, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Russell King <linux-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org> Cc: Laurent Pinchart <laurent.pinchart+renesas-ryLnwIuWjnjg/C1BVhZhaw@public.gmane.org> Cc: Sricharan R <sricharan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- arch/arm/mm/dma-mapping.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index fcf1473..fae2398 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -1444,7 +1444,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size, break; len = (j - i) << PAGE_SHIFT; - ret = iommu_map(mapping->domain, iova, phys, len, + ret = iommu_map_sync(mapping->domain, iova, phys, len, __dma_info_to_prot(DMA_BIDIRECTIONAL, attrs)); if (ret < 0) goto fail; @@ -1453,7 +1453,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size, } return dma_addr; fail: - iommu_unmap(mapping->domain, dma_addr, iova-dma_addr); + iommu_unmap_sync(mapping->domain, dma_addr, iova-dma_addr); __free_iova(mapping, dma_addr, size); return ARM_MAPPING_ERROR; } @@ -1469,7 +1469,7 @@ static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t si size = PAGE_ALIGN((iova & ~PAGE_MASK) + size); iova &= PAGE_MASK; - iommu_unmap(mapping->domain, iova, size); + iommu_unmap_sync(mapping->domain, iova, size); __free_iova(mapping, iova, size); return 0; } @@ -1730,7 +1730,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, prot = __dma_info_to_prot(dir, attrs); - ret = iommu_map(mapping->domain, iova, phys, len, prot); + ret = iommu_map_sync(mapping->domain, iova, phys, len, prot); if (ret < 0) goto fail; count += len >> PAGE_SHIFT; @@ -1740,7 +1740,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, return 0; fail: - iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE); + iommu_unmap_sync(mapping->domain, iova_base, count * PAGE_SIZE); __free_iova(mapping, iova_base, size); return ret; } @@ -1938,7 +1938,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p prot = __dma_info_to_prot(dir, attrs); - ret = 
iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot); + ret = iommu_map_sync(mapping->domain, dma_addr, page_to_phys(page), + len, prot); if (ret < 0) goto fail; @@ -1988,7 +1989,7 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle, if (!iova) return; - iommu_unmap(mapping->domain, iova, len); + iommu_unmap_sync(mapping->domain, iova, len); __free_iova(mapping, iova, len); } @@ -2016,7 +2017,7 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, if ((attrs & DMA_ATTR_SKIP_CPU_SYNC) == 0) __dma_page_dev_to_cpu(page, offset, size, dir); - iommu_unmap(mapping->domain, iova, len); + iommu_unmap_sync(mapping->domain, iova, len); __free_iova(mapping, iova, len); } @@ -2044,7 +2045,7 @@ static dma_addr_t arm_iommu_map_resource(struct device *dev, prot = __dma_info_to_prot(dir, attrs) | IOMMU_MMIO; - ret = iommu_map(mapping->domain, dma_addr, addr, len, prot); + ret = iommu_map_sync(mapping->domain, dma_addr, addr, len, prot); if (ret < 0) goto fail; @@ -2073,7 +2074,7 @@ static void arm_iommu_unmap_resource(struct device *dev, dma_addr_t dma_handle, if (!iova) return; - iommu_unmap(mapping->domain, iova, len); + iommu_unmap_sync(mapping->domain, iova, len); __free_iova(mapping, iova, len); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (4 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel [not found] ` <1502974596-23835-7-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel ` (3 subsequent siblings) 9 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, David Airlie, linux-kernel-u79uwXL29TY76Z2rM5mHXA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Christian Gmeiner, Russell King, Lucas Stach From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Lucas Stach <l.stach-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> Cc: Russell King <linux+etnaviv-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org> Cc: Christian Gmeiner <christian.gmeiner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: David Airlie <airlied-cv59FeDIM0c@public.gmane.org> Cc: etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c index f103e78..ae0247c 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c @@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova, VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes); - ret = iommu_map(domain, da, pa, bytes, prot); + ret = iommu_map_sync(domain, da, pa, bytes, prot); if (ret) goto fail; @@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova, for_each_sg(sgt->sgl, sg, i, j) { size_t bytes = sg_dma_len(sg) + sg->offset; - iommu_unmap(domain, da, bytes); + iommu_unmap_sync(domain, da, bytes); da += bytes; } return ret; @@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova, size_t bytes = sg_dma_len(sg) + sg->offset; size_t unmapped; - unmapped = iommu_unmap(domain, da, bytes); + unmapped = iommu_unmap_sync(domain, da, bytes); if (unmapped < bytes) return unmapped; @@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, dma_addr_t paddr, mutex_unlock(&mmu->lock); return ret; } - ret = iommu_map(mmu->domain, vram_node->start, paddr, size, + ret = iommu_map_sync(mmu->domain, vram_node->start, paddr, size, IOMMU_READ); if (ret < 0) { drm_mm_remove_node(vram_node); @@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu, if (mmu->version == ETNAVIV_IOMMU_V2) { mutex_lock(&mmu->lock); - iommu_unmap(mmu->domain,iova, size); + iommu_unmap_sync(mmu->domain,iova, size); 
drm_mm_remove_node(vram_node); mutex_unlock(&mmu->lock); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
[parent not found: <1502974596-23835-7-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-7-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> @ 2017-08-17 13:32 ` Lucas Stach [not found] ` <1502976758.19806.25.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Lucas Stach @ 2017-08-17 13:32 UTC (permalink / raw) To: Joerg Roedel Cc: Joerg Roedel, David Airlie, Christian Gmeiner, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-kernel-u79uwXL29TY76Z2rM5mHXA, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King Hi Joerg, Am Donnerstag, den 17.08.2017, 14:56 +0200 schrieb Joerg Roedel: > From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> > > The map and unmap functions of the IOMMU-API changed their > semantics: They do no longer guarantee that the hardware > TLBs are synchronized with the page-table updates they made. > > To make conversion easier, new synchronized functions have > been introduced which give these guarantees again until the > code is converted to use the new TLB-flush interface of the > IOMMU-API, which allows certain optimizations. > > But for now, just convert this code to use the synchronized > functions so that it will behave as before. I don't think this is necessary. Etnaviv has managed and batched up TLB flushes from day 1, as they don't happen through the MMU MMIO interface, but through the GPU command stream. So if my understanding of this series is correct, Etnaviv is just fine with the changed semantics of the unsynchronized map/unmap calls. Regards, Lucas > > Cc: Lucas Stach <l.stach-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> > Cc: Russell King <linux+etnaviv-I+IVW8TIWO2tmTQ+vhA3Yw@public.gmane.org> > Cc: Christian Gmeiner <christian.gmeiner-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > Cc: David Airlie <airlied-cv59FeDIM0c@public.gmane.org> > Cc: etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> > --- > drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c > index f103e78..ae0247c 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_mmu.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_mmu.c > @@ -47,7 +47,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova, > > VERB("map[%d]: %08x %08x(%zx)", i, iova, pa, bytes); > > - ret = iommu_map(domain, da, pa, bytes, prot); > + ret = iommu_map_sync(domain, da, pa, bytes, prot); > if (ret) > goto fail; > > @@ -62,7 +62,7 @@ int etnaviv_iommu_map(struct etnaviv_iommu *iommu, u32 iova, > for_each_sg(sgt->sgl, sg, i, j) { > size_t bytes = sg_dma_len(sg) + sg->offset; > > - iommu_unmap(domain, da, bytes); > + iommu_unmap_sync(domain, da, bytes); > da += bytes; > } > return ret; > @@ -80,7 +80,7 @@ int etnaviv_iommu_unmap(struct etnaviv_iommu *iommu, u32 iova, > size_t bytes = sg_dma_len(sg) + sg->offset; > size_t unmapped; > > - unmapped = iommu_unmap(domain, da, bytes); > + unmapped = iommu_unmap_sync(domain, da, bytes); > if (unmapped < bytes) > return unmapped; > > @@ -338,7 +338,7 @@ int etnaviv_iommu_get_suballoc_va(struct etnaviv_gpu *gpu, dma_addr_t paddr, > mutex_unlock(&mmu->lock); > return ret; > } > - ret = iommu_map(mmu->domain, vram_node->start, paddr, size, > + ret = iommu_map_sync(mmu->domain, 
vram_node->start, paddr, size, > IOMMU_READ); > if (ret < 0) { > drm_mm_remove_node(vram_node); > @@ -362,7 +362,7 @@ void etnaviv_iommu_put_suballoc_va(struct etnaviv_gpu *gpu, > > if (mmu->version == ETNAVIV_IOMMU_V2) { > mutex_lock(&mmu->lock); > - iommu_unmap(mmu->domain,iova, size); > + iommu_unmap_sync(mmu->domain,iova, size); > drm_mm_remove_node(vram_node); > mutex_unlock(&mmu->lock); > } ^ permalink raw reply [flat|nested] 34+ messages in thread
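For context, the batching Lucas describes has roughly the following shape; all names here are hypothetical and the actual etnaviv code differs. Map and unmap only touch the page tables and mark the MMU dirty, and the TLB flush is emitted as a command in the GPU command stream before the next job, so it is naturally batched per submission.

struct gpu_mmu { bool need_flush; /* ... page-table state ... */ };

static int gpu_mmu_unmap(struct gpu_mmu *mmu, u32 iova, size_t len)
{
        clear_ptes(mmu, iova, len);     /* page-table update only */
        mmu->need_flush = true;         /* no MMIO flush here     */
        return 0;
}

static void gpu_queue_job(struct gpu *gpu, struct gpu_job *job)
{
        if (gpu->mmu->need_flush) {
                emit_mmu_flush(gpu->cmdbuf);    /* flush rides in-stream */
                gpu->mmu->need_flush = false;
        }
        emit_job(gpu->cmdbuf, job);
}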
[parent not found: <1502976758.19806.25.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>]
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API [not found] ` <1502976758.19806.25.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> @ 2017-08-17 13:45 ` Joerg Roedel 2017-08-17 14:03 ` Lucas Stach 0 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 13:45 UTC (permalink / raw) To: Lucas Stach Cc: David Airlie, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Christian Gmeiner, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King Hi Lucas, On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote: > I don't think this is necessary. Etnaviv has managed and batched up TLB > flushes from day 1, as they don't happen through the MMU MMIO interface, > but through the GPU command stream. > > So if my understanding of this series is correct, Etnaviv is just fine > with the changed semantics of the unsynchronized map/unmap calls. This is not about any TLB on the GPU that could be flushed through the GPU command stream, but about the TLB in the IOMMU device. Or is this actually the same on this hardware? Which IOMMU-driver is in use there? Regards, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API 2017-08-17 13:45 ` Joerg Roedel @ 2017-08-17 14:03 ` Lucas Stach [not found] ` <1502978634.19806.27.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Lucas Stach @ 2017-08-17 14:03 UTC (permalink / raw) To: Joerg Roedel Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit, Russell King, Christian Gmeiner, David Airlie, etnaviv, dri-devel Am Donnerstag, den 17.08.2017, 15:45 +0200 schrieb Joerg Roedel: > Hi Lucas, > > On Thu, Aug 17, 2017 at 03:32:38PM +0200, Lucas Stach wrote: > > I don't think this is necessary. Etnaviv has managed and batched up TLB > > flushes from day 1, as they don't happen through the MMU MMIO interface, > > but through the GPU command stream. > > > > So if my understanding of this series is correct, Etnaviv is just fine > > with the changed semantics of the unsynchronized map/unmap calls. > > This is not about any TLB on the GPU that could be flushed through the > GPU command stream, but about the TLB in the IOMMU device. Or is this > actually the same on this hardware? Which IOMMU-driver is in use there? There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API to manage the GPU internal MMU, see drivers/gpu/drm/etnaviv/etnaviv_iommu.c Regards, Lucas ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <1502978634.19806.27.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>]
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API [not found] ` <1502978634.19806.27.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org> @ 2017-08-17 14:18 ` Joerg Roedel [not found] ` <20170817141858.GG16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 14:18 UTC (permalink / raw) To: Lucas Stach Cc: Joerg Roedel, David Airlie, Christian Gmeiner, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-kernel-u79uwXL29TY76Z2rM5mHXA, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote: > There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API > to manage the GPU internal MMU, see > drivers/gpu/drm/etnaviv/etnaviv_iommu.c That looks like a very fragile construct, because it relies on internal behavior of the iommu code that can change in the future. I strongly suggest to either make etnaviv_iommu.c a real iommu driver and move it to drivers/iommu, or (preferred by me) just call directly into the map/unmap functions of this driver from the rest of etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to be used there as another layer of indirection. Regards, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
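In code, the suggestion amounts to dropping one layer of indirection; etnaviv_domain_map() below is a hypothetical name for the driver's own page-table code:

        /* today: double indirection through the IOMMU-API */
        ret = iommu_map(domain, da, pa, bytes, prot);
                /* -> iommu_ops->map() -> etnaviv page-table code */

        /* suggested: call the driver's page-table code directly */
        ret = etnaviv_domain_map(etnaviv_domain, da, pa, bytes, prot);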
[parent not found: <20170817141858.GG16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API [not found] ` <20170817141858.GG16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> @ 2017-08-17 14:30 ` Lucas Stach 2017-08-17 14:35 ` Joerg Roedel 0 siblings, 1 reply; 34+ messages in thread From: Lucas Stach @ 2017-08-17 14:30 UTC (permalink / raw) To: Joerg Roedel Cc: Joerg Roedel, David Airlie, Christian Gmeiner, etnaviv-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-kernel-u79uwXL29TY76Z2rM5mHXA, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Russell King Am Donnerstag, den 17.08.2017, 16:18 +0200 schrieb Joerg Roedel: > On Thu, Aug 17, 2017 at 04:03:54PM +0200, Lucas Stach wrote: > > There is no IOMMU driver in use. Etnaviv just uses part of the IOMMU API > > to manage the GPU internal MMU, see > > drivers/gpu/drm/etnaviv/etnaviv_iommu.c > > That looks like a very fragile construct, because it relies on internal > behavior of the iommu code that can change in the future. > > I strongly suggest to either make etnaviv_iommu.c a real iommu driver and > move it to drivers/iommu, or (preferred by me) just call directly into > the map/unmap functions of this driver from the rest of > etnaviv_iommu.c. I don't really see a reason why the IOMMU-API needs to > be used there as another layer of indirection. Yeah, the IOMMU API being used internally is a historical accident that we didn't get around to rectifying yet. It's on my things-we-need-to-do list to prune the usage of the IOMMU API in etnaviv. Regards, Lucas ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 06/13] drm/etnaviv: Use sychronized interface of the IOMMU-API 2017-08-17 14:30 ` Lucas Stach @ 2017-08-17 14:35 ` Joerg Roedel 0 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 14:35 UTC (permalink / raw) To: Lucas Stach Cc: Joerg Roedel, iommu, linux-kernel, Suravee Suthikulpanit, Russell King, Christian Gmeiner, David Airlie, etnaviv, dri-devel On Thu, Aug 17, 2017 at 04:30:48PM +0200, Lucas Stach wrote: > Yeah, the IOMMU API being used internally is a historical accident that > we didn't get around to rectifying yet. It's on my things-we-need-to-do > list to prune the usage of the IOMMU API in etnaviv. Okay, so for the time being, I'll drop the etnaviv patch from this series. Thanks, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 08/13] drm/nouveau/imem/gk20a: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (5 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel 2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel ` (2 subsequent siblings) 9 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, David Airlie, nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-kernel-u79uwXL29TY76Z2rM5mHXA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Ben Skeggs From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Ben Skeggs <bskeggs-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> Cc: David Airlie <airlied-cv59FeDIM0c@public.gmane.org> Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c index cd5adbe..3f0de47 100644 --- a/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c @@ -322,8 +322,9 @@ gk20a_instobj_dtor_iommu(struct nvkm_memory *memory) /* Unmap pages from GPU address space and free them */ for (i = 0; i < node->base.mem.size; i++) { - iommu_unmap(imem->domain, - (r->offset + i) << imem->iommu_pgshift, PAGE_SIZE); + iommu_unmap_sync(imem->domain, + (r->offset + i) << imem->iommu_pgshift, + PAGE_SIZE); dma_unmap_page(dev, node->dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL); __free_page(node->pages[i]); @@ -458,14 +459,15 @@ gk20a_instobj_ctor_iommu(struct gk20a_instmem *imem, u32 npages, u32 align, for (i = 0; i < npages; i++) { u32 offset = (r->offset + i) << imem->iommu_pgshift; - ret = iommu_map(imem->domain, offset, node->dma_addrs[i], - PAGE_SIZE, IOMMU_READ | IOMMU_WRITE); + ret = iommu_map_sync(imem->domain, offset, node->dma_addrs[i], + PAGE_SIZE, IOMMU_READ | IOMMU_WRITE); if (ret < 0) { nvkm_error(subdev, "IOMMU mapping failure: %d\n", ret); while (i-- > 0) { offset -= PAGE_SIZE; - iommu_unmap(imem->domain, offset, PAGE_SIZE); + iommu_unmap_sync(imem->domain, offset, + PAGE_SIZE); } goto release_area; } -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH 10/13] drm/tegra: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (6 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel 2017-08-17 13:28 ` Thierry Reding 2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel 2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson 9 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, David Airlie, linux-kernel-u79uwXL29TY76Z2rM5mHXA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Jonathan Hunter, Thierry Reding, linux-tegra-u79uwXL29TY76Z2rM5mHXA From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Thierry Reding <thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: David Airlie <airlied-cv59FeDIM0c@public.gmane.org> Cc: Jonathan Hunter <jonathanh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/gpu/drm/tegra/drm.c | 6 +++--- drivers/gpu/drm/tegra/gem.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c index 518f4b6..bc4528ee 100644 --- a/drivers/gpu/drm/tegra/drm.c +++ b/drivers/gpu/drm/tegra/drm.c @@ -1156,8 +1156,8 @@ void *tegra_drm_alloc(struct tegra_drm *tegra, size_t size, } *dma = iova_dma_addr(&tegra->carveout.domain, alloc); - err = iommu_map(tegra->domain, *dma, virt_to_phys(virt), - size, IOMMU_READ | IOMMU_WRITE); + err = iommu_map_sync(tegra->domain, *dma, virt_to_phys(virt), + size, IOMMU_READ | IOMMU_WRITE); if (err < 0) goto free_iova; @@ -1180,7 +1180,7 @@ void tegra_drm_free(struct tegra_drm *tegra, size_t size, void *virt, size = PAGE_ALIGN(size); if (tegra->domain) { - iommu_unmap(tegra->domain, dma, size); + iommu_unmap_sync(tegra->domain, dma, size); free_iova(&tegra->carveout.domain, iova_pfn(&tegra->carveout.domain, dma)); } diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c index 7a39a35..639bc75 100644 --- a/drivers/gpu/drm/tegra/gem.c +++ b/drivers/gpu/drm/tegra/gem.c @@ -135,8 +135,8 @@ static int tegra_bo_iommu_map(struct tegra_drm *tegra, struct tegra_bo *bo) bo->paddr = bo->mm->start; - err = iommu_map_sg(tegra->domain, bo->paddr, bo->sgt->sgl, - bo->sgt->nents, prot); + err = iommu_map_sg_sync(tegra->domain, bo->paddr, bo->sgt->sgl, + bo->sgt->nents, prot); if (err < 0) { dev_err(tegra->drm->dev, "failed to map buffer: %zd\n", err); goto remove; @@ -162,7 +162,7 @@ static int tegra_bo_iommu_unmap(struct tegra_drm *tegra, struct tegra_bo *bo) return 0; mutex_lock(&tegra->mm_lock); - iommu_unmap(tegra->domain, bo->paddr, bo->size); + iommu_unmap_sync(tegra->domain, bo->paddr, bo->size); 
drm_mm_remove_node(bo->mm); mutex_unlock(&tegra->mm_lock); -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH 10/13] drm/tegra: Use sychronized interface of the IOMMU-API 2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel @ 2017-08-17 13:28 ` Thierry Reding 0 siblings, 0 replies; 34+ messages in thread From: Thierry Reding @ 2017-08-17 13:28 UTC (permalink / raw) To: Joerg Roedel Cc: iommu, linux-kernel, Suravee Suthikulpanit, Joerg Roedel, David Airlie, Jonathan Hunter, dri-devel, linux-tegra [-- Attachment #1: Type: text/plain, Size: 1105 bytes --] On Thu, Aug 17, 2017 at 02:56:33PM +0200, Joerg Roedel wrote: > From: Joerg Roedel <jroedel@suse.de> > > The map and unmap functions of the IOMMU-API changed their > semantics: They do no longer guarantee that the hardware > TLBs are synchronized with the page-table updates they made. > > To make conversion easier, new synchronized functions have > been introduced which give these guarantees again until the > code is converted to use the new TLB-flush interface of the > IOMMU-API, which allows certain optimizations. > > But for now, just convert this code to use the synchronized > functions so that it will behave as before. > > Cc: Thierry Reding <thierry.reding@gmail.com> > Cc: David Airlie <airlied@linux.ie> > Cc: Jonathan Hunter <jonathanh@nvidia.com> > Cc: dri-devel@lists.freedesktop.org > Cc: linux-tegra@vger.kernel.org > Signed-off-by: Joerg Roedel <jroedel@suse.de> > --- > drivers/gpu/drm/tegra/drm.c | 6 +++--- > drivers/gpu/drm/tegra/gem.c | 6 +++--- > 2 files changed, 6 insertions(+), 6 deletions(-) Acked-by: Thierry Reding <treding@nvidia.com> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH 11/13] gpu: host1x: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (7 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel @ 2017-08-17 12:56 ` Joerg Roedel [not found] ` <1502974596-23835-12-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson 9 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 12:56 UTC (permalink / raw) To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA Cc: Joerg Roedel, linux-kernel-u79uwXL29TY76Z2rM5mHXA, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Thierry Reding, linux-tegra-u79uwXL29TY76Z2rM5mHXA From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> The map and unmap functions of the IOMMU-API changed their semantics: They do no longer guarantee that the hardware TLBs are synchronized with the page-table updates they made. To make conversion easier, new synchronized functions have been introduced which give these guarantees again until the code is converted to use the new TLB-flush interface of the IOMMU-API, which allows certain optimizations. But for now, just convert this code to use the synchronized functions so that it will behave as before. Cc: Thierry Reding <thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> --- drivers/gpu/host1x/cdma.c | 6 +++--- drivers/gpu/host1x/job.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c index 28541b2..bdf557e 100644 --- a/drivers/gpu/host1x/cdma.c +++ b/drivers/gpu/host1x/cdma.c @@ -55,7 +55,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb) return; if (host1x->domain) { - iommu_unmap(host1x->domain, pb->dma, pb->alloc_size); + iommu_unmap_sync(host1x->domain, pb->dma, pb->alloc_size); free_iova(&host1x->iova, iova_pfn(&host1x->iova, pb->dma)); } @@ -105,8 +105,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb) } pb->dma = iova_dma_addr(&host1x->iova, alloc); - err = iommu_map(host1x->domain, pb->dma, pb->phys, size, - IOMMU_READ); + err = iommu_map_sync(host1x->domain, pb->dma, pb->phys, size, + IOMMU_READ); if (err) goto iommu_free_iova; } else { diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c index bee5044..70a029c 100644 --- a/drivers/gpu/host1x/job.c +++ b/drivers/gpu/host1x/job.c @@ -243,7 +243,7 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job) goto unpin; } - err = iommu_map_sg(host->domain, + err = iommu_map_sg_sync(host->domain, iova_dma_addr(&host->iova, alloc), sgt->sgl, sgt->nents, IOMMU_READ); if (err == 0) { @@ -695,8 +695,8 @@ void host1x_job_unpin(struct host1x_job *job) struct host1x_job_unpin_data *unpin = &job->unpins[i]; if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) && host->domain) { - iommu_unmap(host->domain, job->addr_phys[i], - unpin->size); + iommu_unmap_sync(host->domain, job->addr_phys[i], + unpin->size); free_iova(&host->iova, iova_pfn(&host->iova, job->addr_phys[i])); } -- 2.7.4 ^ permalink raw reply related [flat|nested] 34+ messages in thread
[parent not found: <1502974596-23835-12-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 11/13] gpu: host1x: Use sychronized interface of the IOMMU-API [not found] ` <1502974596-23835-12-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> @ 2017-08-17 13:29 ` Thierry Reding 0 siblings, 0 replies; 34+ messages in thread From: Thierry Reding @ 2017-08-17 13:29 UTC (permalink / raw) To: Joerg Roedel Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA, Suravee Suthikulpanit, Joerg Roedel, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, linux-tegra-u79uwXL29TY76Z2rM5mHXA [-- Attachment #1: Type: text/plain, Size: 1169 bytes --] On Thu, Aug 17, 2017 at 02:56:34PM +0200, Joerg Roedel wrote: > From: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> > > The map and unmap functions of the IOMMU-API changed their > semantics: They do no longer guarantee that the hardware > TLBs are synchronized with the page-table updates they made. > > To make conversion easier, new synchronized functions have > been introduced which give these guarantees again until the > code is converted to use the new TLB-flush interface of the > IOMMU-API, which allows certain optimizations. > > But for now, just convert this code to use the synchronized > functions so that it will behave as before. > > Cc: Thierry Reding <thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> > Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > Signed-off-by: Joerg Roedel <jroedel-l3A5Bk7waGM@public.gmane.org> > --- > drivers/gpu/host1x/cdma.c | 6 +++--- > drivers/gpu/host1x/job.c | 6 +++--- > 2 files changed, 6 insertions(+), 6 deletions(-) Acked-by: Thierry Reding <treding-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface [not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> ` (8 preceding siblings ...) 2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel @ 2017-08-17 14:35 ` Alex Williamson 2017-08-17 14:43 ` Joerg Roedel 9 siblings, 1 reply; 34+ messages in thread From: Alex Williamson @ 2017-08-17 14:35 UTC (permalink / raw) To: Joerg Roedel Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, 17 Aug 2017 14:56:23 +0200 Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote: > Hi, > > here is a patch-set to introduce an explicit interface to > the IOMMU-API to flush IOMMU and device IO/TLBs. Currently > the iommu_map(), iommu_map_sg(), and iommu_unmap() functions > have to make sure all IO/TLBs in the system are synchronized > with the page-table updates they made. > > This is very inefficient in some scenarios, for example when > a large address space is unmapped and an IO/TLB flush has to > be done in every call of iommu_unmap(). Or in a scenario > where it makes sense to queue up some changes to the > page-tables and flush them together. > > To optimize these scenarios, the need to synchronize with > the IOMMU and device TLBs has been removed from the > map/unmap functions of the IOMMU-API and an interface to > explicitly do the flushes has been introduced. > > To make the conversion of existing users of the IOMMU-API > easier, new functions - iommu_map_sync(), iommu_map_sg_sync(), > and iommu_unmap_sync() - have been introduced. These > functions guarantee that the IO/TLBs are synchronized with > any page-table update when they return. The optimizations > possible with the new interface are subject to separate > patch-sets. Hi Joerg, Wouldn't it be much more friendly to downstreams and out-of-tree drivers to introduce new functions for the async semantics? ie. iommu_map_async(), etc. The API also seems a little cleaner in that iommu_map() stands alone: it's synchronous, iommu_map_async() is explicitly asynchronous, and a _flush() call is needed to finalize it. What do you see as the advantage to the approach here? Thanks, Alex > Patch 1 just renames a few functions in the AMD-Vi driver > that would otherwise collide with the new TLB-flush > functions from the IOMMU-API. > > Patch 2 introduces the new IO/TLB Flush-Interface. > > Patch 3-13 convert existing users of the IOMMU-API to use > the *_sync functions for now. > > Please review.
> > Thanks, > > Joerg > > Joerg Roedel (13): > iommu/amd: Rename a few flush functions > iommu: Introduce Interface for IOMMU TLB Flushing > vfio/type1: Use sychronized interface of the IOMMU-API > iommu/dma: Use sychronized interface of the IOMMU-API > arm: dma-mapping: Use sychronized interface of the IOMMU-API > drm/etnaviv: Use sychronized interface of the IOMMU-API > drm/msm: Use sychronized interface of the IOMMU-API > drm/nouveau/imem/gk20a: Use sychronized interface of the IOMMU-API > drm/rockchip: Use sychronized interface of the IOMMU-API > drm/tegra: Use sychronized interface of the IOMMU-API > gpu: host1x: Use sychronized interface of the IOMMU-API > IB/usnic: Use sychronized interface of the IOMMU-API > remoteproc: Use sychronized interface of the IOMMU-API > > arch/arm/mm/dma-mapping.c | 21 +++--- > drivers/gpu/drm/etnaviv/etnaviv_mmu.c | 10 +-- > drivers/gpu/drm/msm/msm_iommu.c | 5 +- > .../gpu/drm/nouveau/nvkm/subdev/instmem/gk20a.c | 12 ++-- > drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 6 +- > drivers/gpu/drm/tegra/drm.c | 6 +- > drivers/gpu/drm/tegra/gem.c | 6 +- > drivers/gpu/host1x/cdma.c | 6 +- > drivers/gpu/host1x/job.c | 6 +- > drivers/infiniband/hw/usnic/usnic_uiom.c | 10 +-- > drivers/iommu/amd_iommu.c | 16 ++--- > drivers/iommu/dma-iommu.c | 8 +-- > drivers/iommu/iommu.c | 26 +++++++ > drivers/remoteproc/remoteproc_core.c | 10 +-- > drivers/vfio/vfio_iommu_type1.c | 38 +++++----- > include/linux/iommu.h | 80 +++++++++++++++++++++- > 16 files changed, 189 insertions(+), 77 deletions(-) > ^ permalink raw reply [flat|nested] 34+ messages in thread
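The two API shapes under discussion, side by side (illustrative only; iommu_unmap_async() is Alex's hypothetical name, and iommu_tlb_sync() stands in for the flush interface from patch 2):

        /* This series: plain names are unsynchronized by default. */
        iommu_unmap(domain, iova, size);        /* fast, IO/TLBs may be stale */
        iommu_tlb_sync(domain);                 /* explicit flush             */
        iommu_unmap_sync(domain, iova, size);   /* old behaviour, spelled out */

        /* Alex's suggestion: asynchrony is opt-in. */
        iommu_unmap(domain, iova, size);        /* synchronous, as before     */
        iommu_unmap_async(domain, iova, size);  /* explicitly deferred        */
        iommu_tlb_sync(domain);                 /* finalizes the deferred op  */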
* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface 2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson @ 2017-08-17 14:43 ` Joerg Roedel [not found] ` <20170817144308.GI16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 14:43 UTC (permalink / raw) To: Alex Williamson; +Cc: iommu, linux-kernel, Suravee Suthikulpanit Hi Alex, On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote: > Wouldn't it be much more friendly to downstreams and out-of-tree > drivers to introduce new functions for the async semantics? ie. > iommu_map_async(), etc. The API also seems a little cleaner in that > iommu_map() stands alone: it's synchronous, iommu_map_async() is > explicitly asynchronous, and a _flush() call is needed to finalize it. > What do you see as the advantage to the approach here? Thanks, The reason I did it this way was that I want the iommu_map(), iommu_unmap(), and iommu_map_sg() functions to be considered the _default_ to choose when using the IOMMU-API, because their use is faster than using the _sync() variants. Or in other words, I want the _sync function names to imply that they are slower versions of the default ones. Regards, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20170817144308.GI16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface [not found] ` <20170817144308.GI16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> @ 2017-08-17 14:54 ` Alex Williamson [not found] ` <20170817085407.3de4e755-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Alex Williamson @ 2017-08-17 14:54 UTC (permalink / raw) To: Joerg Roedel Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, 17 Aug 2017 16:43:08 +0200 Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> wrote: > Hi Alex, > > On Thu, Aug 17, 2017 at 08:35:20AM -0600, Alex Williamson wrote: > > Wouldn't it be much more friendly to downstreams and out-of-tree > > drivers to introduce new functions for the async semantics? ie. > > iommu_map_async(), etc. The API also seems a little cleaner in that > > iommu_map() stands alone: it's synchronous, iommu_map_async() is > > explicitly asynchronous, and a _flush() call is needed to finalize it. > > What do you see as the advantage to the approach here? Thanks, > > The reason I did it this way was that I want the iommu_map(), > iommu_unmap(), and iommu_map_sg() functions to be considered the _default_ > to choose when using the IOMMU-API, because their use is faster than > using the _sync() variants. Or in other words, I want the _sync function > names to imply that they are slower versions of the default ones. So _sync() does imply that they're slower, but iommu_map() does not imply that a _flush() is required. One is a performance issue; the other is an API usability issue. If the sync version is used sub-optimally, it's a performance issue, not a correctness issue. If the async version is used without an explicit flush, it's a correctness issue. Therefore, I would lean towards making the asynchronous mode explicit and providing good documentation and comments to steer developers to the async version. I think it makes the API harder to use incorrectly. Thanks, Alex ^ permalink raw reply [flat|nested] 34+ messages in thread
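The correctness hazard Alex points out, spelled out as code (illustrative; reuse_iova() is a stand-in for whatever recycles the address):

        iommu_unmap(domain, iova, size);  /* caller forgets the explicit sync */
        reuse_iova(iova);                 /* device may still reach the old   */
                                          /* pages through a stale IO/TLB     */
                                          /* entry: a correctness bug, not    */
                                          /* just a slowdown                  */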
[parent not found: <20170817085407.3de4e755-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>]
* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface [not found] ` <20170817085407.3de4e755-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org> @ 2017-08-17 15:22 ` Joerg Roedel [not found] ` <20170817152220.GK16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> 0 siblings, 1 reply; 34+ messages in thread From: Joerg Roedel @ 2017-08-17 15:22 UTC (permalink / raw) To: Alex Williamson Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, Aug 17, 2017 at 08:54:07AM -0600, Alex Williamson wrote: > So _sync() does imply that they're slower, but iommu_map() does not > imply that a _flush() is required. One is a performance issue; the > other is an API usability issue. If the sync version is used > sub-optimally, it's a performance issue, not a correctness issue. If > the async version is used without an explicit flush, it's a correctness > issue. Therefore, I would lean towards making the asynchronous mode > explicit and providing good documentation and comments to steer > developers to the async version. I think it makes the API harder to > use incorrectly. I agree that it makes the API a bit more complicated to use. But that can be solved by documenting it and by converting the main users (for me that is VFIO and dma-iommu.c) of it to the unsynchronized interface, because people will look at existing users and just do what they do. Introducing _sync functions instead of _async ones also has the side-effect that it puts pressure on the maintainers of the code to convert it to the async interface, because they now see explicitly that they use the slow version and start looking for ways to make things faster. What I absolutely don't want is that the whole explicit TLB flushing of the IOMMU-API (as introduced in this patch-set) is considered some optional part of the API, as it would be if I just introduced _async versions of map/unmap/map_sg. Regards, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
[parent not found: <20170817152220.GK16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface [not found] ` <20170817152220.GK16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org> @ 2017-08-23 12:09 ` Joerg Roedel 0 siblings, 0 replies; 34+ messages in thread From: Joerg Roedel @ 2017-08-23 12:09 UTC (permalink / raw) To: Alex Williamson Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, linux-kernel-u79uwXL29TY76Z2rM5mHXA On Thu, Aug 17, 2017 at 05:22:20PM +0200, Joerg Roedel wrote: > What I absolutely don't want is that the whole explicit TLB flushing > of the IOMMU-API (as introduced in this patch-set) is considered some > optional part of the API, as it would be if I just introduced _async > versions of map/unmap/map_sg. Okay, forget that :) The discussions I had around this interface made me change it a little bit in version 2 of the patch-set, which I will post soon. I thought a bit more about the iommu_map() code-path. It really doesn't make any sense to remove the tlb-sync requirement from it, because in almost all cases the hardware doesn't require any flushes after a map operation anyway. And in the rare cases where it does - because the hardware is emulated and slow - the iommu-driver can handle that by doing a flush in its iommu_ops->map() call-back. So I removed the iommu_map_sync() and iommu_map_sg_sync() functions from this series. With those changes it also doesn't make sense anymore to have different tlb-sync semantics between iommu_map() and iommu_unmap(). So I ended up introducing a new iommu_unmap_fast() function which can unmap ranges and return with dirty io-tlbs. This makes the extension of course look somewhat optional, which I tried to avoid, but I hope the '_fast' part of the name is enough motivation for iommu-api users to look into ways to use it in their code. Regards, Joerg ^ permalink raw reply [flat|nested] 34+ messages in thread
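For reference, a batching caller of the v2 shape described here would look roughly like this (a sketch, assuming the iommu_tlb_range_add()/iommu_tlb_sync() names from patch 2 carry over): iommu_unmap() keeps its synchronous guarantee, iommu_unmap_fast() may return with dirty IO/TLBs, and the caller flushes explicitly.

        while (size) {
                size_t unmapped = iommu_unmap_fast(domain, iova, size);

                if (!unmapped)
                        break;

                iommu_tlb_range_add(domain, iova, unmapped);
                iova += unmapped;
                size -= unmapped;
        }
        iommu_tlb_sync(domain); /* one flush for the whole range */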
Thread overview: 34+ messages
2017-08-17 12:56 [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Joerg Roedel
2017-08-17 12:56 ` [PATCH 07/13] drm/msm: Use sychronized interface of the IOMMU-API Joerg Roedel
2017-08-19 15:39 ` Rob Clark
2017-08-17 12:56 ` [PATCH 09/13] drm/rockchip: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 12/13] IB/usnic: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 13/13] remoteproc: " Joerg Roedel
2017-08-24 19:06 ` Bjorn Andersson
[not found] ` <1502974596-23835-1-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 12:56 ` [PATCH 01/13] iommu/amd: Rename a few flush functions Joerg Roedel
2017-08-17 12:56 ` [PATCH 02/13] iommu: Introduce Interface for IOMMU TLB Flushing Joerg Roedel
[not found] ` <1502974596-23835-3-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 16:32 ` Will Deacon
[not found] ` <20170817163234.GB30719-5wv7dgnIgG8@public.gmane.org>
2017-08-17 16:50 ` Joerg Roedel
[not found] ` <20170817165040.GK2853-l3A5Bk7waGM@public.gmane.org>
2017-08-17 17:17 ` Will Deacon
2017-08-17 21:20 ` Joerg Roedel
[not found] ` <20170817212054.GL2853-l3A5Bk7waGM@public.gmane.org>
2017-08-18 15:16 ` Will Deacon
2017-08-17 12:56 ` [PATCH 03/13] vfio/type1: Use sychronized interface of the IOMMU-API Joerg Roedel
2017-08-17 12:56 ` [PATCH 04/13] iommu/dma: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 05/13] arm: dma-mapping: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 06/13] drm/etnaviv: " Joerg Roedel
[not found] ` <1502974596-23835-7-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 13:32 ` Lucas Stach
[not found] ` <1502976758.19806.25.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2017-08-17 13:45 ` Joerg Roedel
2017-08-17 14:03 ` Lucas Stach
[not found] ` <1502978634.19806.27.camel-bIcnvbaLZ9MEGnE8C9+IrQ@public.gmane.org>
2017-08-17 14:18 ` Joerg Roedel
[not found] ` <20170817141858.GG16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 14:30 ` Lucas Stach
2017-08-17 14:35 ` Joerg Roedel
2017-08-17 12:56 ` [PATCH 08/13] drm/nouveau/imem/gk20a: " Joerg Roedel
2017-08-17 12:56 ` [PATCH 10/13] drm/tegra: " Joerg Roedel
2017-08-17 13:28 ` Thierry Reding
2017-08-17 12:56 ` [PATCH 11/13] gpu: host1x: " Joerg Roedel
[not found] ` <1502974596-23835-12-git-send-email-joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 13:29 ` Thierry Reding
2017-08-17 14:35 ` [PATCH 00/13] Introduce IOMMU-API TLB Flushing Interface Alex Williamson
2017-08-17 14:43 ` Joerg Roedel
[not found] ` <20170817144308.GI16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-17 14:54 ` Alex Williamson
[not found] ` <20170817085407.3de4e755-DGNDKt5SQtizQB+pC5nmwQ@public.gmane.org>
2017-08-17 15:22 ` Joerg Roedel
[not found] ` <20170817152220.GK16908-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2017-08-23 12:09 ` Joerg Roedel