* [PATCH v3 0/3] Initial Panfrost driver @ 2019-04-09 20:54 Rob Herring 2019-04-09 20:54 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring 0 siblings, 1 reply; 10+ messages in thread From: Rob Herring @ 2019-04-09 20:54 UTC (permalink / raw) To: dri-devel Cc: Sean Paul, Lyude Paul, Eric Anholt, Maxime Ripard, Maarten Lankhorst, Joerg Roedel, Neil Armstrong, Will Deacon, linux-kernel, Steven Price, David Airlie, iommu, Boris Brezillon, Alyssa Rosenzweig, Daniel Vetter, Robin Murphy, linux-arm-kernel Here's v3 of the panfrost driver. Lot's of changes from review comments and further testing. Details are in each patch. Of note, a problem with MMU page faults has been addressed improving the stability. In the process, the TLB invalidate has been optimized which Tomeu says has improved the performance some. Several dependencies have been applied already, but the first 2 patches are the remaining dependencies. We need to take the iommu change via drm-misc or we need a stable branch. I'm hoping this is the last version. I'm hoping to apply this to drm-misc this week before -rc5 cutoff. A git branch is here[1]. Rob [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git panfrost-rebase-v3 Rob Herring (3): iommu: io-pgtable: Add ARM Mali midgard MMU page table format drm: Add a drm_gem_objects_lookup helper drm/panfrost: Add initial panfrost driver MAINTAINERS | 9 + drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/drm_gem.c | 93 ++- drivers/gpu/drm/panfrost/Kconfig | 14 + drivers/gpu/drm/panfrost/Makefile | 12 + drivers/gpu/drm/panfrost/TODO | 27 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 218 ++++++++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 14 + drivers/gpu/drm/panfrost/panfrost_device.c | 252 +++++++++ drivers/gpu/drm/panfrost/panfrost_device.h | 124 ++++ drivers/gpu/drm/panfrost/panfrost_drv.c | 460 +++++++++++++++ drivers/gpu/drm/panfrost/panfrost_features.h | 309 ++++++++++ drivers/gpu/drm/panfrost/panfrost_gem.c | 95 ++++ drivers/gpu/drm/panfrost/panfrost_gem.h | 29 + drivers/gpu/drm/panfrost/panfrost_gpu.c | 362 ++++++++++++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 19 + drivers/gpu/drm/panfrost/panfrost_issues.h | 176 ++++++ drivers/gpu/drm/panfrost/panfrost_job.c | 560 +++++++++++++++++++ drivers/gpu/drm/panfrost/panfrost_job.h | 51 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 369 ++++++++++++ drivers/gpu/drm/panfrost/panfrost_mmu.h | 17 + drivers/gpu/drm/panfrost/panfrost_regs.h | 298 ++++++++++ drivers/iommu/io-pgtable-arm.c | 91 ++- drivers/iommu/io-pgtable.c | 1 + include/drm/drm_gem.h | 2 + include/linux/io-pgtable.h | 7 + include/uapi/drm/panfrost_drm.h | 142 +++++ 28 files changed, 3722 insertions(+), 32 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/Kconfig create mode 100644 drivers/gpu/drm/panfrost/Makefile create mode 100644 drivers/gpu/drm/panfrost/TODO create mode 100644 drivers/gpu/drm/panfrost/panfrost_devfreq.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_devfreq.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_drv.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_features.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_issues.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_regs.h create mode 100644 include/uapi/drm/panfrost_drm.h -- 2.19.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-09 20:54 [PATCH v3 0/3] Initial Panfrost driver Rob Herring @ 2019-04-09 20:54 ` Rob Herring 0 siblings, 0 replies; 10+ messages in thread From: Rob Herring @ 2019-04-09 20:54 UTC (permalink / raw) To: dri-devel Cc: Joerg Roedel, Will Deacon, linux-kernel, iommu, Alyssa Rosenzweig, Robin Murphy, linux-arm-kernel ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but have a few differences. Add a new format type to represent the format. The input address size is 48-bits and the output address size is 40-bits (and possibly less?). Note that the later bifrost GPUs follow the standard 64-bit stage 1 format. The differences in the format compared to 64-bit stage 1 format are: The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. The access flags are not read-only and unprivileged, but read and write. This is similar to stage 2 entries, but the memory attributes field matches stage 1 being an index. The nG bit is not set by the vendor driver. This one didn't seem to matter, but we'll keep it aligned to the vendor driver. Cc: Will Deacon <will.deacon@arm.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: linux-arm-kernel@lists.infradead.org Cc: iommu@lists.linux-foundation.org Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Rob Herring <robh@kernel.org> --- This is really v4 of the patch. v3 is the series version. Joerg, please ack so we can take this via the drm tree. v3: - Incorporated refactoring from Robin drivers/iommu/io-pgtable-arm.c | 91 ++++++++++++++++++++++++++-------- drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 7 +++ 3 files changed, 77 insertions(+), 22 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index d3700ec15cbd..4e21efbc4459 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -172,6 +172,10 @@ #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 +#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0) +#define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2) +#define ARM_MALI_LPAE_TTBR_SHARE_OUTER BIT(4) + /* IOPTE accessors */ #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) @@ -180,11 +184,6 @@ #define iopte_prot(pte) ((pte) & ARM_LPAE_PTE_ATTR_MASK) -#define iopte_leaf(pte,l) \ - (l == (ARM_LPAE_MAX_LEVELS - 1) ? \ - (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) : \ - (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK)) - struct arm_lpae_io_pgtable { struct io_pgtable iop; @@ -198,6 +197,15 @@ struct arm_lpae_io_pgtable { typedef u64 arm_lpae_iopte; +static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl, + enum io_pgtable_fmt fmt) +{ + if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE) + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_PAGE; + + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_BLOCK; +} + static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, struct arm_lpae_io_pgtable *data) { @@ -303,12 +311,14 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS) pte |= ARM_LPAE_PTE_NS; - if (lvl == ARM_LPAE_MAX_LEVELS - 1) + if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1) pte |= ARM_LPAE_PTE_TYPE_PAGE; else pte |= ARM_LPAE_PTE_TYPE_BLOCK; - pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS; + if (data->iop.fmt != ARM_MALI_LPAE) + pte |= ARM_LPAE_PTE_AF; + pte |= ARM_LPAE_PTE_SH_IS; pte |= paddr_to_iopte(paddr, data); __arm_lpae_set_pte(ptep, pte, &data->iop.cfg); @@ -321,7 +331,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, { arm_lpae_iopte pte = *ptep; - if (iopte_leaf(pte, lvl)) { + if (iopte_leaf(pte, lvl, data->iop.fmt)) { /* We require an unmap first */ WARN_ON(!selftest_running); return -EEXIST; @@ -409,7 +419,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, __arm_lpae_sync_pte(ptep, cfg); } - if (pte && !iopte_leaf(pte, lvl)) { + if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) { cptep = iopte_deref(pte, data); } else if (pte) { /* We require an unmap first */ @@ -429,31 +439,37 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, if (data->iop.fmt == ARM_64_LPAE_S1 || data->iop.fmt == ARM_32_LPAE_S1) { pte = ARM_LPAE_PTE_nG; - if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) pte |= ARM_LPAE_PTE_AP_RDONLY; - if (!(prot & IOMMU_PRIV)) pte |= ARM_LPAE_PTE_AP_UNPRIV; - - if (prot & IOMMU_MMIO) - pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV - << ARM_LPAE_PTE_ATTRINDX_SHIFT); - else if (prot & IOMMU_CACHE) - pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE - << ARM_LPAE_PTE_ATTRINDX_SHIFT); } else { pte = ARM_LPAE_PTE_HAP_FAULT; if (prot & IOMMU_READ) pte |= ARM_LPAE_PTE_HAP_READ; if (prot & IOMMU_WRITE) pte |= ARM_LPAE_PTE_HAP_WRITE; + } + + /* + * Note that this logic is structured to accommodate Mali LPAE + * having stage-1-like attributes but stage-2-like permissions. + */ + if (data->iop.fmt == ARM_64_LPAE_S2 || + data->iop.fmt == ARM_32_LPAE_S2) { if (prot & IOMMU_MMIO) pte |= ARM_LPAE_PTE_MEMATTR_DEV; else if (prot & IOMMU_CACHE) pte |= ARM_LPAE_PTE_MEMATTR_OIWB; else pte |= ARM_LPAE_PTE_MEMATTR_NC; + } else { + if (prot & IOMMU_MMIO) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV + << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (prot & IOMMU_CACHE) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } if (prot & IOMMU_NOEXEC) @@ -511,7 +527,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, while (ptep != end) { arm_lpae_iopte pte = *ptep++; - if (!pte || iopte_leaf(pte, lvl)) + if (!pte || iopte_leaf(pte, lvl, data->iop.fmt)) continue; __arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data)); @@ -602,7 +618,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { __arm_lpae_set_pte(ptep, 0, &iop->cfg); - if (!iopte_leaf(pte, lvl)) { + if (!iopte_leaf(pte, lvl, iop->fmt)) { /* Also flush any partial walks */ io_pgtable_tlb_add_flush(iop, iova, size, ARM_LPAE_GRANULE(data), false); @@ -621,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, } return size; - } else if (iopte_leaf(pte, lvl)) { + } else if (iopte_leaf(pte, lvl, iop->fmt)) { /* * Insert a table at the next level to map the old region, * minus the part we want to unmap @@ -669,7 +685,7 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, return 0; /* Leaf entry? */ - if (iopte_leaf(pte,lvl)) + if (iopte_leaf(pte, lvl, data->iop.fmt)) goto found_translation; /* Take it to the next level */ @@ -995,6 +1011,32 @@ arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) return iop; } +static struct io_pgtable * +arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) +{ + struct io_pgtable *iop; + + if (cfg->ias != 48 || cfg->oas > 40) + return NULL; + + cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G); + iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie); + if (iop) { + u64 mair, ttbr; + + /* Copy values as union fields overlap */ + mair = cfg->arm_lpae_s1_cfg.mair[0]; + ttbr = cfg->arm_lpae_s1_cfg.ttbr[0]; + + cfg->arm_mali_lpae_cfg.memattr = mair; + cfg->arm_mali_lpae_cfg.transtab = ttbr | + ARM_MALI_LPAE_TTBR_READ_INNER | + ARM_MALI_LPAE_TTBR_ADRMODE_TABLE; + } + + return iop; +} + struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { .alloc = arm_64_lpae_alloc_pgtable_s1, .free = arm_lpae_free_pgtable, @@ -1015,6 +1057,11 @@ struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { .free = arm_lpae_free_pgtable, }; +struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, +}; + #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST static struct io_pgtable_cfg *cfg_cookie; diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c index 93f2880be6c6..5227cfdbb65b 100644 --- a/drivers/iommu/io-pgtable.c +++ b/drivers/iommu/io-pgtable.c @@ -30,6 +30,7 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = { [ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns, [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns, [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns, + [ARM_MALI_LPAE] = &io_pgtable_arm_mali_lpae_init_fns, #endif #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S [ARM_V7S] = &io_pgtable_arm_v7s_init_fns, diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 47d5ae559329..76969a564831 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -12,6 +12,7 @@ enum io_pgtable_fmt { ARM_64_LPAE_S1, ARM_64_LPAE_S2, ARM_V7S, + ARM_MALI_LPAE, IO_PGTABLE_NUM_FMTS, }; @@ -108,6 +109,11 @@ struct io_pgtable_cfg { u32 nmrr; u32 prrr; } arm_v7s_cfg; + + struct { + u64 transtab; + u64 memattr; + } arm_mali_lpae_cfg; }; }; @@ -209,5 +215,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns; +extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns; #endif /* __IO_PGTABLE_H */ -- 2.19.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v2 0/3] Initial Panfrost driver @ 2019-04-01 7:47 Rob Herring 2019-04-01 7:47 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring 0 siblings, 1 reply; 10+ messages in thread From: Rob Herring @ 2019-04-01 7:47 UTC (permalink / raw) To: dri-devel Cc: Sean Paul, Lyude Paul, Eric Anholt, Maxime Ripard, Maarten Lankhorst, Joerg Roedel, Neil Armstrong, Will Deacon, linux-kernel, David Airlie, iommu, Alyssa Rosenzweig, Daniel Vetter, Robin Murphy, linux-arm-kernel Here's v2 of the panfrost driver. Lots of improvements from the RFC primarily with support for job hangs resetting the GPU and runtime-pm thanks to Tomeu. Several dependencies have been applied already, but the first 2 patches are the remaining dependencies. We need to take the iommu change via drm-misc or we need a stable branch. A git branch is here[1]. Rob [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git panfrost-rebase-v2 Rob Herring (3): iommu: io-pgtable: Add ARM Mali midgard MMU page table format drm: Add a drm_gem_objects_lookup helper drm/panfrost: Add initial panfrost driver drivers/gpu/drm/Kconfig | 2 + drivers/gpu/drm/Makefile | 1 + drivers/gpu/drm/drm_gem.c | 46 +- drivers/gpu/drm/panfrost/Kconfig | 14 + drivers/gpu/drm/panfrost/Makefile | 12 + drivers/gpu/drm/panfrost/TODO | 27 + drivers/gpu/drm/panfrost/panfrost_devfreq.c | 191 +++++++ drivers/gpu/drm/panfrost/panfrost_devfreq.h | 14 + drivers/gpu/drm/panfrost/panfrost_device.c | 227 ++++++++ drivers/gpu/drm/panfrost/panfrost_device.h | 118 ++++ drivers/gpu/drm/panfrost/panfrost_drv.c | 484 ++++++++++++++++ drivers/gpu/drm/panfrost/panfrost_features.h | 309 +++++++++++ drivers/gpu/drm/panfrost/panfrost_gem.c | 92 +++ drivers/gpu/drm/panfrost/panfrost_gem.h | 29 + drivers/gpu/drm/panfrost/panfrost_gpu.c | 374 +++++++++++++ drivers/gpu/drm/panfrost/panfrost_gpu.h | 19 + drivers/gpu/drm/panfrost/panfrost_issues.h | 176 ++++++ drivers/gpu/drm/panfrost/panfrost_job.c | 556 +++++++++++++++++++ drivers/gpu/drm/panfrost/panfrost_job.h | 51 ++ drivers/gpu/drm/panfrost/panfrost_mmu.c | 366 ++++++++++++ drivers/gpu/drm/panfrost/panfrost_mmu.h | 17 + drivers/gpu/drm/panfrost/panfrost_regs.h | 298 ++++++++++ drivers/iommu/io-pgtable-arm.c | 93 +++- drivers/iommu/io-pgtable.c | 1 + include/drm/drm_gem.h | 2 + include/linux/io-pgtable.h | 7 + include/uapi/drm/panfrost_drm.h | 140 +++++ 27 files changed, 3633 insertions(+), 33 deletions(-) create mode 100644 drivers/gpu/drm/panfrost/Kconfig create mode 100644 drivers/gpu/drm/panfrost/Makefile create mode 100644 drivers/gpu/drm/panfrost/TODO create mode 100644 drivers/gpu/drm/panfrost/panfrost_devfreq.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_devfreq.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_device.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_drv.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_features.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_gem.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_gpu.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_issues.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_job.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.c create mode 100644 drivers/gpu/drm/panfrost/panfrost_mmu.h create mode 100644 drivers/gpu/drm/panfrost/panfrost_regs.h create mode 100644 include/uapi/drm/panfrost_drm.h -- 2.19.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-01 7:47 [PATCH v2 0/3] Initial Panfrost driver Rob Herring @ 2019-04-01 7:47 ` Rob Herring 2019-04-01 19:11 ` Robin Murphy 2019-04-05 9:42 ` Steven Price 0 siblings, 2 replies; 10+ messages in thread From: Rob Herring @ 2019-04-01 7:47 UTC (permalink / raw) To: dri-devel Cc: Sean Paul, Lyude Paul, Eric Anholt, Maxime Ripard, Maarten Lankhorst, Joerg Roedel, Neil Armstrong, Will Deacon, linux-kernel, David Airlie, iommu, Alyssa Rosenzweig, Daniel Vetter, Robin Murphy, linux-arm-kernel ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but have a few differences. Add a new format type to represent the format. The input address size is 48-bits and the output address size is 40-bits (and possibly less?). Note that the later bifrost GPUs follow the standard 64-bit stage 1 format. The differences in the format compared to 64-bit stage 1 format are: The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. The access flags are not read-only and unprivileged, but read and write. This is similar to stage 2 entries, but the memory attributes field matches stage 1 being an index. The nG bit is not set by the vendor driver. This one didn't seem to matter, but we'll keep it aligned to the vendor driver. Cc: Will Deacon <will.deacon@arm.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: linux-arm-kernel@lists.infradead.org Cc: iommu@lists.linux-foundation.org Signed-off-by: Rob Herring <robh@kernel.org> --- Please ack this as I need to apply it to the drm-misc tree. Or we need a stable branch with this patch. v3: - Rework arm_lpae_prot_to_pte as written by Robin - Fill in complete register values for TRANSTAB and MEMATTR registers drivers/iommu/io-pgtable-arm.c | 93 +++++++++++++++++++++++++--------- drivers/iommu/io-pgtable.c | 1 + include/linux/io-pgtable.h | 7 +++ 3 files changed, 77 insertions(+), 24 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index d3700ec15cbd..98551d0cff59 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -172,6 +172,10 @@ #define ARM_LPAE_MAIR_ATTR_IDX_CACHE 1 #define ARM_LPAE_MAIR_ATTR_IDX_DEV 2 +#define ARM_MALI_LPAE_TTBR_ADRMODE_TABLE (3u << 0) +#define ARM_MALI_LPAE_TTBR_READ_INNER BIT(2) +#define ARM_MALI_LPAE_TTBR_SHARE_OUTER BIT(4) + /* IOPTE accessors */ #define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d)) @@ -180,11 +184,6 @@ #define iopte_prot(pte) ((pte) & ARM_LPAE_PTE_ATTR_MASK) -#define iopte_leaf(pte,l) \ - (l == (ARM_LPAE_MAX_LEVELS - 1) ? \ - (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) : \ - (iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK)) - struct arm_lpae_io_pgtable { struct io_pgtable iop; @@ -198,6 +197,14 @@ struct arm_lpae_io_pgtable { typedef u64 arm_lpae_iopte; +static inline bool iopte_leaf(arm_lpae_iopte pte, int l, enum io_pgtable_fmt fmt) +{ + if ((l == (ARM_LPAE_MAX_LEVELS - 1)) && (fmt != ARM_MALI_LPAE)) + return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE; + + return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK; +} + static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, struct arm_lpae_io_pgtable *data) { @@ -303,12 +310,17 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS) pte |= ARM_LPAE_PTE_NS; - if (lvl == ARM_LPAE_MAX_LEVELS - 1) - pte |= ARM_LPAE_PTE_TYPE_PAGE; - else + if (lvl == ARM_LPAE_MAX_LEVELS - 1) { + if (data->iop.fmt == ARM_MALI_LPAE) + pte |= ARM_LPAE_PTE_TYPE_BLOCK; + else + pte |= ARM_LPAE_PTE_TYPE_PAGE; + } else pte |= ARM_LPAE_PTE_TYPE_BLOCK; - pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS; + if (data->iop.fmt != ARM_MALI_LPAE) + pte |= ARM_LPAE_PTE_AF; + pte |= ARM_LPAE_PTE_SH_IS; pte |= paddr_to_iopte(paddr, data); __arm_lpae_set_pte(ptep, pte, &data->iop.cfg); @@ -321,7 +333,7 @@ static int arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, { arm_lpae_iopte pte = *ptep; - if (iopte_leaf(pte, lvl)) { + if (iopte_leaf(pte, lvl, data->iop.fmt)) { /* We require an unmap first */ WARN_ON(!selftest_running); return -EEXIST; @@ -409,7 +421,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, __arm_lpae_sync_pte(ptep, cfg); } - if (pte && !iopte_leaf(pte, lvl)) { + if (pte && !iopte_leaf(pte, lvl, data->iop.fmt)) { cptep = iopte_deref(pte, data); } else if (pte) { /* We require an unmap first */ @@ -429,31 +441,33 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, if (data->iop.fmt == ARM_64_LPAE_S1 || data->iop.fmt == ARM_32_LPAE_S1) { pte = ARM_LPAE_PTE_nG; - if (!(prot & IOMMU_WRITE) && (prot & IOMMU_READ)) pte |= ARM_LPAE_PTE_AP_RDONLY; - if (!(prot & IOMMU_PRIV)) pte |= ARM_LPAE_PTE_AP_UNPRIV; - - if (prot & IOMMU_MMIO) - pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV - << ARM_LPAE_PTE_ATTRINDX_SHIFT); - else if (prot & IOMMU_CACHE) - pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE - << ARM_LPAE_PTE_ATTRINDX_SHIFT); } else { pte = ARM_LPAE_PTE_HAP_FAULT; if (prot & IOMMU_READ) pte |= ARM_LPAE_PTE_HAP_READ; if (prot & IOMMU_WRITE) pte |= ARM_LPAE_PTE_HAP_WRITE; + } + + if (data->iop.fmt == ARM_64_LPAE_S2 || + data->iop.fmt == ARM_32_LPAE_S2) { if (prot & IOMMU_MMIO) pte |= ARM_LPAE_PTE_MEMATTR_DEV; else if (prot & IOMMU_CACHE) pte |= ARM_LPAE_PTE_MEMATTR_OIWB; else pte |= ARM_LPAE_PTE_MEMATTR_NC; + } else { + if (prot & IOMMU_MMIO) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV + << ARM_LPAE_PTE_ATTRINDX_SHIFT); + else if (prot & IOMMU_CACHE) + pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE + << ARM_LPAE_PTE_ATTRINDX_SHIFT); } if (prot & IOMMU_NOEXEC) @@ -511,7 +525,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, while (ptep != end) { arm_lpae_iopte pte = *ptep++; - if (!pte || iopte_leaf(pte, lvl)) + if (!pte || iopte_leaf(pte, lvl, data->iop.fmt)) continue; __arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data)); @@ -602,7 +616,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) { __arm_lpae_set_pte(ptep, 0, &iop->cfg); - if (!iopte_leaf(pte, lvl)) { + if (!iopte_leaf(pte, lvl, iop->fmt)) { /* Also flush any partial walks */ io_pgtable_tlb_add_flush(iop, iova, size, ARM_LPAE_GRANULE(data), false); @@ -621,7 +635,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data, } return size; - } else if (iopte_leaf(pte, lvl)) { + } else if (iopte_leaf(pte, lvl, iop->fmt)) { /* * Insert a table at the next level to map the old region, * minus the part we want to unmap @@ -669,7 +683,7 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops, return 0; /* Leaf entry? */ - if (iopte_leaf(pte,lvl)) + if (iopte_leaf(pte, lvl, data->iop.fmt)) goto found_translation; /* Take it to the next level */ @@ -995,6 +1009,32 @@ arm_32_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) return iop; } +static struct io_pgtable * +arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) +{ + struct io_pgtable *iop; + + if (cfg->ias != 48 || cfg->oas > 40) + return NULL; + + cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G); + iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie); + if (iop) { + u64 mair, ttbr; + + /* Copy values as union fields overlap */ + mair = cfg->arm_lpae_s1_cfg.mair[0]; + ttbr = cfg->arm_lpae_s1_cfg.ttbr[0]; + + cfg->arm_mali_lpae_cfg.memattr = mair; + cfg->arm_mali_lpae_cfg.transtab = ttbr | + ARM_MALI_LPAE_TTBR_READ_INNER | + ARM_MALI_LPAE_TTBR_ADRMODE_TABLE; + } + + return iop; +} + struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { .alloc = arm_64_lpae_alloc_pgtable_s1, .free = arm_lpae_free_pgtable, @@ -1015,6 +1055,11 @@ struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { .free = arm_lpae_free_pgtable, }; +struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { + .alloc = arm_mali_lpae_alloc_pgtable, + .free = arm_lpae_free_pgtable, +}; + #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST static struct io_pgtable_cfg *cfg_cookie; diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c index 93f2880be6c6..5227cfdbb65b 100644 --- a/drivers/iommu/io-pgtable.c +++ b/drivers/iommu/io-pgtable.c @@ -30,6 +30,7 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = { [ARM_32_LPAE_S2] = &io_pgtable_arm_32_lpae_s2_init_fns, [ARM_64_LPAE_S1] = &io_pgtable_arm_64_lpae_s1_init_fns, [ARM_64_LPAE_S2] = &io_pgtable_arm_64_lpae_s2_init_fns, + [ARM_MALI_LPAE] = &io_pgtable_arm_mali_lpae_init_fns, #endif #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S [ARM_V7S] = &io_pgtable_arm_v7s_init_fns, diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 47d5ae559329..76969a564831 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -12,6 +12,7 @@ enum io_pgtable_fmt { ARM_64_LPAE_S1, ARM_64_LPAE_S2, ARM_V7S, + ARM_MALI_LPAE, IO_PGTABLE_NUM_FMTS, }; @@ -108,6 +109,11 @@ struct io_pgtable_cfg { u32 nmrr; u32 prrr; } arm_v7s_cfg; + + struct { + u64 transtab; + u64 memattr; + } arm_mali_lpae_cfg; }; }; @@ -209,5 +215,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns; +extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns; #endif /* __IO_PGTABLE_H */ -- 2.19.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-01 7:47 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring @ 2019-04-01 19:11 ` Robin Murphy 2019-04-05 10:02 ` Robin Murphy 2019-04-11 13:15 ` Joerg Roedel 2019-04-05 9:42 ` Steven Price 1 sibling, 2 replies; 10+ messages in thread From: Robin Murphy @ 2019-04-01 19:11 UTC (permalink / raw) To: Rob Herring, dri-devel Cc: Lyude Paul, Eric Anholt, Maxime Ripard, Will Deacon, Joerg Roedel, Neil Armstrong, Maarten Lankhorst, linux-kernel, David Airlie, iommu, Alyssa Rosenzweig, Daniel Vetter, Sean Paul, linux-arm-kernel On 01/04/2019 08:47, Rob Herring wrote: > ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but > have a few differences. Add a new format type to represent the format. The > input address size is 48-bits and the output address size is 40-bits (and > possibly less?). Note that the later bifrost GPUs follow the standard > 64-bit stage 1 format. > > The differences in the format compared to 64-bit stage 1 format are: > > The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. > > The access flags are not read-only and unprivileged, but read and write. > This is similar to stage 2 entries, but the memory attributes field matches > stage 1 being an index. > > The nG bit is not set by the vendor driver. This one didn't seem to matter, > but we'll keep it aligned to the vendor driver. > > Cc: Will Deacon <will.deacon@arm.com> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: linux-arm-kernel@lists.infradead.org > Cc: iommu@lists.linux-foundation.org > Signed-off-by: Rob Herring <robh@kernel.org> > --- > Please ack this as I need to apply it to the drm-misc tree. Or we need a > stable branch with this patch. With the diff below squashed in to address my outstanding style nits, Acked-by: Robin Murphy <robin.murphy@arm.com> I don't foresee any conflicting io-pgtable changes to prevent this going via DRM, but I'll leave the final say up to Joerg. Thanks, Robin. ----->8----- diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 98551d0cff59..55ed039da166 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -197,12 +197,13 @@ struct arm_lpae_io_pgtable { typedef u64 arm_lpae_iopte; -static inline bool iopte_leaf(arm_lpae_iopte pte, int l, enum io_pgtable_fmt fmt) +static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl, + enum io_pgtable_fmt fmt) { - if ((l == (ARM_LPAE_MAX_LEVELS - 1)) && (fmt != ARM_MALI_LPAE)) - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE; + if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE) + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_PAGE; - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK; + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_BLOCK; } static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, @@ -310,13 +311,10 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS) pte |= ARM_LPAE_PTE_NS; - if (lvl == ARM_LPAE_MAX_LEVELS - 1) { - if (data->iop.fmt == ARM_MALI_LPAE) - pte |= ARM_LPAE_PTE_TYPE_BLOCK; - else - pte |= ARM_LPAE_PTE_TYPE_PAGE; - } else + if (data->iop.fmt != ARM_MALI_LPAE && lvl != ARM_LPAE_MAX_LEVELS - 1) pte |= ARM_LPAE_PTE_TYPE_BLOCK; + else + pte |= ARM_LPAE_PTE_TYPE_PAGE; if (data->iop.fmt != ARM_MALI_LPAE) pte |= ARM_LPAE_PTE_AF; _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-01 19:11 ` Robin Murphy @ 2019-04-05 10:02 ` Robin Murphy 2019-04-11 13:15 ` Joerg Roedel 1 sibling, 0 replies; 10+ messages in thread From: Robin Murphy @ 2019-04-05 10:02 UTC (permalink / raw) To: Rob Herring, dri-devel Cc: Lyude Paul, Neil Armstrong, Maxime Ripard, Will Deacon, David Airlie, Maarten Lankhorst, linux-kernel, Eric Anholt, iommu, linux-arm-kernel, Daniel Vetter, Sean Paul, Alyssa Rosenzweig On 01/04/2019 20:11, Robin Murphy wrote: > On 01/04/2019 08:47, Rob Herring wrote: >> ARM Mali midgard GPU is similar to standard 64-bit stage 1 page >> tables, but >> have a few differences. Add a new format type to represent the format. >> The >> input address size is 48-bits and the output address size is 40-bits (and >> possibly less?). Note that the later bifrost GPUs follow the standard >> 64-bit stage 1 format. >> >> The differences in the format compared to 64-bit stage 1 format are: >> >> The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. >> >> The access flags are not read-only and unprivileged, but read and write. >> This is similar to stage 2 entries, but the memory attributes field >> matches >> stage 1 being an index. >> >> The nG bit is not set by the vendor driver. This one didn't seem to >> matter, >> but we'll keep it aligned to the vendor driver. >> >> Cc: Will Deacon <will.deacon@arm.com> >> Cc: Robin Murphy <robin.murphy@arm.com> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: linux-arm-kernel@lists.infradead.org >> Cc: iommu@lists.linux-foundation.org >> Signed-off-by: Rob Herring <robh@kernel.org> >> --- >> Please ack this as I need to apply it to the drm-misc tree. Or we need a >> stable branch with this patch. > > With the diff below squashed in to address my outstanding style nits, > > Acked-by: Robin Murphy <robin.murphy@arm.com> > > I don't foresee any conflicting io-pgtable changes to prevent this going > via DRM, but I'll leave the final say up to Joerg. Urgh, sorry, turns out I flipped one condition too many there. On reflection I may also forget my clever trick in future and inadvertently break it, so it probably warrants a comment. Please supersede my previous request with the (actually tested) diff below :) Thanks, Robin. ----->8----- diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 98551d0cff59..4f7be5a3e19b 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -197,12 +197,13 @@ struct arm_lpae_io_pgtable { typedef u64 arm_lpae_iopte; -static inline bool iopte_leaf(arm_lpae_iopte pte, int l, enum io_pgtable_fmt fmt) +static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl, + enum io_pgtable_fmt fmt) { - if ((l == (ARM_LPAE_MAX_LEVELS - 1)) && (fmt != ARM_MALI_LPAE)) - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE; + if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE) + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_PAGE; - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK; + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_BLOCK; } static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, @@ -310,12 +311,9 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data, if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS) pte |= ARM_LPAE_PTE_NS; - if (lvl == ARM_LPAE_MAX_LEVELS - 1) { - if (data->iop.fmt == ARM_MALI_LPAE) - pte |= ARM_LPAE_PTE_TYPE_BLOCK; - else - pte |= ARM_LPAE_PTE_TYPE_PAGE; - } else + if (data->iop.fmt != ARM_MALI_LPAE && lvl == ARM_LPAE_MAX_LEVELS - 1) + pte |= ARM_LPAE_PTE_TYPE_PAGE; + else pte |= ARM_LPAE_PTE_TYPE_BLOCK; if (data->iop.fmt != ARM_MALI_LPAE) @@ -452,7 +450,10 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data, if (prot & IOMMU_WRITE) pte |= ARM_LPAE_PTE_HAP_WRITE; } - + /* + * Note that this logic is structured to accommodate Mali LPAE + * having stage-1-like attributes but stage-2-like permissions. + */ if (data->iop.fmt == ARM_64_LPAE_S2 || data->iop.fmt == ARM_32_LPAE_S2) { if (prot & IOMMU_MMIO) _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-01 19:11 ` Robin Murphy 2019-04-05 10:02 ` Robin Murphy @ 2019-04-11 13:15 ` Joerg Roedel 1 sibling, 0 replies; 10+ messages in thread From: Joerg Roedel @ 2019-04-11 13:15 UTC (permalink / raw) To: Robin Murphy Cc: Rob Herring, Lyude Paul, Eric Anholt, Maxime Ripard, Maarten Lankhorst, Neil Armstrong, Will Deacon, linux-kernel, dri-devel, David Airlie, iommu, Alyssa Rosenzweig, Daniel Vetter, Sean Paul, linux-arm-kernel On Mon, Apr 01, 2019 at 08:11:00PM +0100, Robin Murphy wrote: > With the diff below squashed in to address my outstanding style nits, > > Acked-by: Robin Murphy <robin.murphy@arm.com> > > I don't foresee any conflicting io-pgtable changes to prevent this going via > DRM, but I'll leave the final say up to Joerg. No objection from my side with merging this via DRM. With Robin's concerns addressed: Acked-by: Joerg Roedel <jroedel@suse.de> > > Thanks, > Robin. > > ----->8----- > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 98551d0cff59..55ed039da166 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -197,12 +197,13 @@ struct arm_lpae_io_pgtable { > > typedef u64 arm_lpae_iopte; > > -static inline bool iopte_leaf(arm_lpae_iopte pte, int l, enum > io_pgtable_fmt fmt) > +static inline bool iopte_leaf(arm_lpae_iopte pte, int lvl, > + enum io_pgtable_fmt fmt) > { > - if ((l == (ARM_LPAE_MAX_LEVELS - 1)) && (fmt != ARM_MALI_LPAE)) > - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE; > + if (lvl == (ARM_LPAE_MAX_LEVELS - 1) && fmt != ARM_MALI_LPAE) > + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_PAGE; > > - return iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK; > + return iopte_type(pte, lvl) == ARM_LPAE_PTE_TYPE_BLOCK; > } > > static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr, > @@ -310,13 +311,10 @@ static void __arm_lpae_init_pte(struct > arm_lpae_io_pgtable *data, > if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_NS) > pte |= ARM_LPAE_PTE_NS; > > - if (lvl == ARM_LPAE_MAX_LEVELS - 1) { > - if (data->iop.fmt == ARM_MALI_LPAE) > - pte |= ARM_LPAE_PTE_TYPE_BLOCK; > - else > - pte |= ARM_LPAE_PTE_TYPE_PAGE; > - } else > + if (data->iop.fmt != ARM_MALI_LPAE && lvl != ARM_LPAE_MAX_LEVELS - 1) > pte |= ARM_LPAE_PTE_TYPE_BLOCK; > + else > + pte |= ARM_LPAE_PTE_TYPE_PAGE; > > if (data->iop.fmt != ARM_MALI_LPAE) > pte |= ARM_LPAE_PTE_AF; _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-01 7:47 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring 2019-04-01 19:11 ` Robin Murphy @ 2019-04-05 9:42 ` Steven Price 2019-04-05 9:51 ` Robin Murphy 1 sibling, 1 reply; 10+ messages in thread From: Steven Price @ 2019-04-05 9:42 UTC (permalink / raw) To: Rob Herring, dri-devel Cc: Neil Armstrong, Maxime Ripard, Robin Murphy, Will Deacon, linux-kernel, David Airlie, iommu, linux-arm-kernel, Sean Paul, Alyssa Rosenzweig First let me say congratulations to everyone working on Panfrost - it's an impressive achievement! Full disclosure: I used to work on the Mali kbase driver. And have been playing around with running the Mali user-space blob with the Panfrost kernel driver. On 01/04/2019 08:47, Rob Herring wrote: > ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but > have a few differences. Add a new format type to represent the format. The > input address size is 48-bits and the output address size is 40-bits (and > possibly less?). Note that the later bifrost GPUs follow the standard > 64-bit stage 1 format. > > The differences in the format compared to 64-bit stage 1 format are: > > The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. > > The access flags are not read-only and unprivileged, but read and write. > This is similar to stage 2 entries, but the memory attributes field matches > stage 1 being an index. > > The nG bit is not set by the vendor driver. This one didn't seem to matter, > but we'll keep it aligned to the vendor driver. The nG bit should be ignored by the hardware. The MMU in Midgard/Bifrost has a quirk similar to IO_PGTABLE_QUIRK_TLBI_ON_MAP - you must perform a cache flush for the GPU to (reliably) pick up new page table mappings. You may not have seen this because of the use of the JS_CONFIG_START_MMU bit - this effectively performs a cache flush and TLB invalidate before starting a job, however when using a GPU like T760 (e.g. on the Firefly RK3288) this bit isn't being set. In my testing on the Firefly board I saw GPU page faults because of this. There's two options for fixing this - a patch like below adds the quirk mode to the MMU. Or alternatively always set JS_CONFIG_START_MMU on jobs. In my testing both options solve the page faults. To be honest I don't know the reasoning behind kbase making the JS_CONFIG_START_MMU bit conditional - I'm not aware of any reason why it can't always be set. My guess is performance, but I haven't benchmarked the difference between this and JS_CONFIG_START_MMU. -----8<---------- From e3f75c7f04e43238dfc579029b8c11fb6b4a0c18 Mon Sep 17 00:00:00 2001 From: Steven Price <steven.price@arm.com> Date: Thu, 4 Apr 2019 15:53:17 +0100 Subject: [PATCH] iommu: io-pgtable: IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE Midgard/Bifrost GPUs require a TLB invalidation when mapping pages, implement IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE iommu page table formats and add the quirk bit to Panfrost. Signed-off-by: Steven Price <steven.price@arm.com> --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/iommu/io-pgtable-arm.c | 11 +++++++++-- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index f3aad8591cf4..094312074d66 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -343,6 +343,7 @@ int panfrost_mmu_init(struct panfrost_device *pfdev) mmu_write(pfdev, MMU_INT_MASK, ~0); pfdev->mmu->pgtbl_cfg = (struct io_pgtable_cfg) { + .quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP, .pgsize_bitmap = SZ_4K, // | SZ_2M | SZ_1G), .ias = 48, .oas = 40, /* Should come from dma mask? */ diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 84beea1f47a7..45fd7bbdf9aa 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -505,7 +505,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova, * Synchronise all PTE updates for the new mapping before there's * a chance for anything to kick off a table walk for the new iova. */ - wmb(); + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { + io_pgtable_tlb_add_flush(&data->iop, iova, size, + ARM_LPAE_BLOCK_SIZE(2, data), false); + io_pgtable_tlb_sync(&data->iop); + } else { + wmb(); + } return ret; } @@ -800,7 +806,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) struct arm_lpae_io_pgtable *data; if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA | - IO_PGTABLE_QUIRK_NON_STRICT)) + IO_PGTABLE_QUIRK_NON_STRICT | + IO_PGTABLE_QUIRK_TLBI_ON_MAP)) return NULL; data = arm_lpae_alloc_pgtable(cfg); -- 2.20.1 _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-05 9:42 ` Steven Price @ 2019-04-05 9:51 ` Robin Murphy 2019-04-05 10:36 ` Steven Price 0 siblings, 1 reply; 10+ messages in thread From: Robin Murphy @ 2019-04-05 9:51 UTC (permalink / raw) To: Steven Price, Rob Herring, dri-devel Cc: Neil Armstrong, Maxime Ripard, Will Deacon, linux-kernel, David Airlie, iommu, linux-arm-kernel, Sean Paul, Alyssa Rosenzweig Hi Steve, On 05/04/2019 10:42, Steven Price wrote: > First let me say congratulations to everyone working on Panfrost - it's > an impressive achievement! > > Full disclosure: I used to work on the Mali kbase driver. And have been > playing around with running the Mali user-space blob with the Panfrost > kernel driver. > > On 01/04/2019 08:47, Rob Herring wrote: >> ARM Mali midgard GPU is similar to standard 64-bit stage 1 page tables, but >> have a few differences. Add a new format type to represent the format. The >> input address size is 48-bits and the output address size is 40-bits (and >> possibly less?). Note that the later bifrost GPUs follow the standard >> 64-bit stage 1 format. >> >> The differences in the format compared to 64-bit stage 1 format are: >> >> The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. >> >> The access flags are not read-only and unprivileged, but read and write. >> This is similar to stage 2 entries, but the memory attributes field matches >> stage 1 being an index. >> >> The nG bit is not set by the vendor driver. This one didn't seem to matter, >> but we'll keep it aligned to the vendor driver. > > The nG bit should be ignored by the hardware. > > The MMU in Midgard/Bifrost has a quirk similar to > IO_PGTABLE_QUIRK_TLBI_ON_MAP - you must perform a cache flush for the > GPU to (reliably) pick up new page table mappings. > > You may not have seen this because of the use of the JS_CONFIG_START_MMU > bit - this effectively performs a cache flush and TLB invalidate before > starting a job, however when using a GPU like T760 (e.g. on the Firefly > RK3288) this bit isn't being set. In my testing on the Firefly board I > saw GPU page faults because of this. > > There's two options for fixing this - a patch like below adds the quirk > mode to the MMU. Or alternatively always set JS_CONFIG_START_MMU on > jobs. In my testing both options solve the page faults. > > To be honest I don't know the reasoning behind kbase making the > JS_CONFIG_START_MMU bit conditional - I'm not aware of any reason why it > can't always be set. My guess is performance, but I haven't benchmarked > the difference between this and JS_CONFIG_START_MMU. > > -----8<---------- > From e3f75c7f04e43238dfc579029b8c11fb6b4a0c18 Mon Sep 17 00:00:00 2001 > From: Steven Price <steven.price@arm.com> > Date: Thu, 4 Apr 2019 15:53:17 +0100 > Subject: [PATCH] iommu: io-pgtable: IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE > > Midgard/Bifrost GPUs require a TLB invalidation when mapping pages, > implement IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE iommu page table > formats and add the quirk bit to Panfrost. > > Signed-off-by: Steven Price <steven.price@arm.com> > --- > drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + > drivers/iommu/io-pgtable-arm.c | 11 +++++++++-- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c > b/drivers/gpu/drm/panfrost/panfrost_mmu.c > index f3aad8591cf4..094312074d66 100644 > --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c > +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c > @@ -343,6 +343,7 @@ int panfrost_mmu_init(struct panfrost_device *pfdev) > mmu_write(pfdev, MMU_INT_MASK, ~0); > > pfdev->mmu->pgtbl_cfg = (struct io_pgtable_cfg) { > + .quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP, > .pgsize_bitmap = SZ_4K, // | SZ_2M | SZ_1G), > .ias = 48, > .oas = 40, /* Should come from dma mask? */ > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 84beea1f47a7..45fd7bbdf9aa 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -505,7 +505,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, > unsigned long iova, > * Synchronise all PTE updates for the new mapping before there's > * a chance for anything to kick off a table walk for the new iova. > */ > - wmb(); > + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { > + io_pgtable_tlb_add_flush(&data->iop, iova, size, > + ARM_LPAE_BLOCK_SIZE(2, data), false); For correctness (in case this ever ends up used for something with VMSA-like invalidation behaviour), the granule would need to be "size" here, rather than effectively hard-coded. However, since Mali's invalidations appear to operate on arbitrary ranges, it would probably be a lot more efficient for the driver to handle this case directly, by just issuing a single big invalidation after the for_each_sg() loop in panfrost_mmu_map(). Robin. > + io_pgtable_tlb_sync(&data->iop); > + } else { > + wmb(); > + } > > return ret; > } > @@ -800,7 +806,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg > *cfg, void *cookie) > struct arm_lpae_io_pgtable *data; > > if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA | > - IO_PGTABLE_QUIRK_NON_STRICT)) > + IO_PGTABLE_QUIRK_NON_STRICT | > + IO_PGTABLE_QUIRK_TLBI_ON_MAP)) > return NULL; > > data = arm_lpae_alloc_pgtable(cfg); > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-05 9:51 ` Robin Murphy @ 2019-04-05 10:36 ` Steven Price 2019-04-08 8:56 ` Steven Price 0 siblings, 1 reply; 10+ messages in thread From: Steven Price @ 2019-04-05 10:36 UTC (permalink / raw) To: Robin Murphy, Rob Herring, dri-devel Cc: Neil Armstrong, Maxime Ripard, Will Deacon, linux-kernel, David Airlie, iommu, Alyssa Rosenzweig, Sean Paul, linux-arm-kernel On 05/04/2019 10:51, Robin Murphy wrote: > Hi Steve, > > On 05/04/2019 10:42, Steven Price wrote: >> First let me say congratulations to everyone working on Panfrost - it's >> an impressive achievement! >> >> Full disclosure: I used to work on the Mali kbase driver. And have been >> playing around with running the Mali user-space blob with the Panfrost >> kernel driver. >> >> On 01/04/2019 08:47, Rob Herring wrote: >>> ARM Mali midgard GPU is similar to standard 64-bit stage 1 page >>> tables, but >>> have a few differences. Add a new format type to represent the >>> format. The >>> input address size is 48-bits and the output address size is 40-bits >>> (and >>> possibly less?). Note that the later bifrost GPUs follow the standard >>> 64-bit stage 1 format. >>> >>> The differences in the format compared to 64-bit stage 1 format are: >>> >>> The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. >>> >>> The access flags are not read-only and unprivileged, but read and write. >>> This is similar to stage 2 entries, but the memory attributes field >>> matches >>> stage 1 being an index. >>> >>> The nG bit is not set by the vendor driver. This one didn't seem to >>> matter, >>> but we'll keep it aligned to the vendor driver. >> >> The nG bit should be ignored by the hardware. >> >> The MMU in Midgard/Bifrost has a quirk similar to >> IO_PGTABLE_QUIRK_TLBI_ON_MAP - you must perform a cache flush for the >> GPU to (reliably) pick up new page table mappings. >> >> You may not have seen this because of the use of the JS_CONFIG_START_MMU >> bit - this effectively performs a cache flush and TLB invalidate before >> starting a job, however when using a GPU like T760 (e.g. on the Firefly >> RK3288) this bit isn't being set. In my testing on the Firefly board I >> saw GPU page faults because of this. >> >> There's two options for fixing this - a patch like below adds the quirk >> mode to the MMU. Or alternatively always set JS_CONFIG_START_MMU on >> jobs. In my testing both options solve the page faults. >> >> To be honest I don't know the reasoning behind kbase making the >> JS_CONFIG_START_MMU bit conditional - I'm not aware of any reason why it >> can't always be set. My guess is performance, but I haven't benchmarked >> the difference between this and JS_CONFIG_START_MMU. >> >> -----8<---------- >> From e3f75c7f04e43238dfc579029b8c11fb6b4a0c18 Mon Sep 17 00:00:00 2001 >> From: Steven Price <steven.price@arm.com> >> Date: Thu, 4 Apr 2019 15:53:17 +0100 >> Subject: [PATCH] iommu: io-pgtable: IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE >> >> Midgard/Bifrost GPUs require a TLB invalidation when mapping pages, >> implement IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE iommu page table >> formats and add the quirk bit to Panfrost. >> >> Signed-off-by: Steven Price <steven.price@arm.com> >> --- >> drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + >> drivers/iommu/io-pgtable-arm.c | 11 +++++++++-- >> 2 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c >> b/drivers/gpu/drm/panfrost/panfrost_mmu.c >> index f3aad8591cf4..094312074d66 100644 >> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c >> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c >> @@ -343,6 +343,7 @@ int panfrost_mmu_init(struct panfrost_device *pfdev) >> mmu_write(pfdev, MMU_INT_MASK, ~0); >> >> pfdev->mmu->pgtbl_cfg = (struct io_pgtable_cfg) { >> + .quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP, >> .pgsize_bitmap = SZ_4K, // | SZ_2M | SZ_1G), >> .ias = 48, >> .oas = 40, /* Should come from dma mask? */ >> diff --git a/drivers/iommu/io-pgtable-arm.c >> b/drivers/iommu/io-pgtable-arm.c >> index 84beea1f47a7..45fd7bbdf9aa 100644 >> --- a/drivers/iommu/io-pgtable-arm.c >> +++ b/drivers/iommu/io-pgtable-arm.c >> @@ -505,7 +505,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, >> unsigned long iova, >> * Synchronise all PTE updates for the new mapping before there's >> * a chance for anything to kick off a table walk for the new iova. >> */ >> - wmb(); >> + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { >> + io_pgtable_tlb_add_flush(&data->iop, iova, size, >> + ARM_LPAE_BLOCK_SIZE(2, data), false); > > For correctness (in case this ever ends up used for something with > VMSA-like invalidation behaviour), the granule would need to be "size" > here, rather than effectively hard-coded. Ah yes - I did rather just copy/paste this from io-pgtable-arm-v7s with minor fix-ups. > However, since Mali's invalidations appear to operate on arbitrary > ranges, it would probably be a lot more efficient for the driver to > handle this case directly, by just issuing a single big invalidation > after the for_each_sg() loop in panfrost_mmu_map(). Yes - that would probably be a better option. Although I think personally I'd lean towards just using JS_CONFIG_START_MMU for most cases. The only thing that won't handle is modifying the MMU while the job is running (e.g. faulting in pages). But that can be handled internally in Panfrost by invalidating the exact region which is being populated. Steve > Robin. > >> + io_pgtable_tlb_sync(&data->iop); >> + } else { >> + wmb(); >> + } >> >> return ret; >> } >> @@ -800,7 +806,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg >> *cfg, void *cookie) >> struct arm_lpae_io_pgtable *data; >> >> if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | >> IO_PGTABLE_QUIRK_NO_DMA | >> - IO_PGTABLE_QUIRK_NON_STRICT)) >> + IO_PGTABLE_QUIRK_NON_STRICT | >> + IO_PGTABLE_QUIRK_TLBI_ON_MAP)) >> return NULL; >> >> data = arm_lpae_alloc_pgtable(cfg); >> > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format 2019-04-05 10:36 ` Steven Price @ 2019-04-08 8:56 ` Steven Price 0 siblings, 0 replies; 10+ messages in thread From: Steven Price @ 2019-04-08 8:56 UTC (permalink / raw) To: Robin Murphy, Rob Herring, dri-devel Cc: Neil Armstrong, Maxime Ripard, Will Deacon, linux-kernel, David Airlie, iommu, linux-arm-kernel, Sean Paul, Alyssa Rosenzweig On 05/04/2019 11:36, Steven Price wrote: > On 05/04/2019 10:51, Robin Murphy wrote: >> Hi Steve, >> >> On 05/04/2019 10:42, Steven Price wrote: >>> First let me say congratulations to everyone working on Panfrost - it's >>> an impressive achievement! >>> >>> Full disclosure: I used to work on the Mali kbase driver. And have been >>> playing around with running the Mali user-space blob with the Panfrost >>> kernel driver. >>> >>> On 01/04/2019 08:47, Rob Herring wrote: >>>> ARM Mali midgard GPU is similar to standard 64-bit stage 1 page >>>> tables, but >>>> have a few differences. Add a new format type to represent the >>>> format. The >>>> input address size is 48-bits and the output address size is 40-bits >>>> (and >>>> possibly less?). Note that the later bifrost GPUs follow the standard >>>> 64-bit stage 1 format. >>>> >>>> The differences in the format compared to 64-bit stage 1 format are: >>>> >>>> The 3rd level page entry bits are 0x1 instead of 0x3 for page entries. >>>> >>>> The access flags are not read-only and unprivileged, but read and write. >>>> This is similar to stage 2 entries, but the memory attributes field >>>> matches >>>> stage 1 being an index. >>>> >>>> The nG bit is not set by the vendor driver. This one didn't seem to >>>> matter, >>>> but we'll keep it aligned to the vendor driver. >>> >>> The nG bit should be ignored by the hardware. >>> >>> The MMU in Midgard/Bifrost has a quirk similar to >>> IO_PGTABLE_QUIRK_TLBI_ON_MAP - you must perform a cache flush for the >>> GPU to (reliably) pick up new page table mappings. >>> >>> You may not have seen this because of the use of the JS_CONFIG_START_MMU >>> bit - this effectively performs a cache flush and TLB invalidate before >>> starting a job, however when using a GPU like T760 (e.g. on the Firefly >>> RK3288) this bit isn't being set. In my testing on the Firefly board I >>> saw GPU page faults because of this. >>> >>> There's two options for fixing this - a patch like below adds the quirk >>> mode to the MMU. Or alternatively always set JS_CONFIG_START_MMU on >>> jobs. In my testing both options solve the page faults. >>> >>> To be honest I don't know the reasoning behind kbase making the >>> JS_CONFIG_START_MMU bit conditional - I'm not aware of any reason why it >>> can't always be set. My guess is performance, but I haven't benchmarked >>> the difference between this and JS_CONFIG_START_MMU. >>> >>> -----8<---------- >>> From e3f75c7f04e43238dfc579029b8c11fb6b4a0c18 Mon Sep 17 00:00:00 2001 >>> From: Steven Price <steven.price@arm.com> >>> Date: Thu, 4 Apr 2019 15:53:17 +0100 >>> Subject: [PATCH] iommu: io-pgtable: IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE >>> >>> Midgard/Bifrost GPUs require a TLB invalidation when mapping pages, >>> implement IO_PGTABLE_QUIRK_TLBI_ON_MAP for LPAE iommu page table >>> formats and add the quirk bit to Panfrost. >>> >>> Signed-off-by: Steven Price <steven.price@arm.com> >>> --- >>> drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + >>> drivers/iommu/io-pgtable-arm.c | 11 +++++++++-- >>> 2 files changed, 10 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c >>> b/drivers/gpu/drm/panfrost/panfrost_mmu.c >>> index f3aad8591cf4..094312074d66 100644 >>> --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c >>> +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c >>> @@ -343,6 +343,7 @@ int panfrost_mmu_init(struct panfrost_device *pfdev) >>> mmu_write(pfdev, MMU_INT_MASK, ~0); >>> >>> pfdev->mmu->pgtbl_cfg = (struct io_pgtable_cfg) { >>> + .quirks = IO_PGTABLE_QUIRK_TLBI_ON_MAP, >>> .pgsize_bitmap = SZ_4K, // | SZ_2M | SZ_1G), >>> .ias = 48, >>> .oas = 40, /* Should come from dma mask? */ >>> diff --git a/drivers/iommu/io-pgtable-arm.c >>> b/drivers/iommu/io-pgtable-arm.c >>> index 84beea1f47a7..45fd7bbdf9aa 100644 >>> --- a/drivers/iommu/io-pgtable-arm.c >>> +++ b/drivers/iommu/io-pgtable-arm.c >>> @@ -505,7 +505,13 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, >>> unsigned long iova, >>> * Synchronise all PTE updates for the new mapping before there's >>> * a chance for anything to kick off a table walk for the new iova. >>> */ >>> - wmb(); >>> + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_TLBI_ON_MAP) { >>> + io_pgtable_tlb_add_flush(&data->iop, iova, size, >>> + ARM_LPAE_BLOCK_SIZE(2, data), false); >> >> For correctness (in case this ever ends up used for something with >> VMSA-like invalidation behaviour), the granule would need to be "size" >> here, rather than effectively hard-coded. > > Ah yes - I did rather just copy/paste this from io-pgtable-arm-v7s with > minor fix-ups. > >> However, since Mali's invalidations appear to operate on arbitrary >> ranges, it would probably be a lot more efficient for the driver to >> handle this case directly, by just issuing a single big invalidation >> after the for_each_sg() loop in panfrost_mmu_map(). > > Yes - that would probably be a better option. Although I think > personally I'd lean towards just using JS_CONFIG_START_MMU for most > cases. The only thing that won't handle is modifying the MMU while the > job is running (e.g. faulting in pages). But that can be handled > internally in Panfrost by invalidating the exact region which is being > populated. I asked around. Apparently there are some interesting issues with START_MMU on some hardware revisions. So best to follow mali_kbase here and only use START_MMU on those hardware revisions that mali_kbase does (what Panfrost is already doing). Which means we'll definitely need this quirk in some form. Steve > > Steve > >> Robin. >> >>> + io_pgtable_tlb_sync(&data->iop); >>> + } else { >>> + wmb(); >>> + } >>> >>> return ret; >>> } >>> @@ -800,7 +806,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg >>> *cfg, void *cookie) >>> struct arm_lpae_io_pgtable *data; >>> >>> if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | >>> IO_PGTABLE_QUIRK_NO_DMA | >>> - IO_PGTABLE_QUIRK_NON_STRICT)) >>> + IO_PGTABLE_QUIRK_NON_STRICT | >>> + IO_PGTABLE_QUIRK_TLBI_ON_MAP)) >>> return NULL; >>> >>> data = arm_lpae_alloc_pgtable(cfg); >>> >> _______________________________________________ >> dri-devel mailing list >> dri-devel@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2019-04-11 13:16 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2019-04-09 20:54 [PATCH v3 0/3] Initial Panfrost driver Rob Herring 2019-04-09 20:54 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring -- strict thread matches above, loose matches on Subject: below -- 2019-04-01 7:47 [PATCH v2 0/3] Initial Panfrost driver Rob Herring 2019-04-01 7:47 ` [PATCH v3 1/3] iommu: io-pgtable: Add ARM Mali midgard MMU page table format Rob Herring 2019-04-01 19:11 ` Robin Murphy 2019-04-05 10:02 ` Robin Murphy 2019-04-11 13:15 ` Joerg Roedel 2019-04-05 9:42 ` Steven Price 2019-04-05 9:51 ` Robin Murphy 2019-04-05 10:36 ` Steven Price 2019-04-08 8:56 ` Steven Price
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).