linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v7 0/2]  iommu/arm-smmu: hard iova_to_phys
@ 2014-10-29 21:13 Mitchel Humpherys
  2014-10-29 21:13 ` [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
  2014-10-29 21:13 ` [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
  0 siblings, 2 replies; 9+ messages in thread
From: Mitchel Humpherys @ 2014-10-29 21:13 UTC
  To: linux-arm-kernel

This series introduces support for performing iova-to-phys translations via
the ARM SMMU hardware on implementations that support it. It also adds a
set of generic macros for polling hardware registers, which the SMMU change
uses to wait for the hardware translation to complete.

v6..v7:

  - iopoll: no changes; resending series due to arm-smmu change.
  - arm-smmu: added missing lock and fixed physical address mask

v5..v6:

  - iopoll: use shift instead of divide
  - arm-smmu: no changes, resending series due to iopoll change.

v4..v5:

  - iopoll: Added support for other accessor functions
  - iopoll: Unified atomic and non-atomic interfaces
  - iopoll: Fixed erroneous `might_sleep'
  - arm-smmu: Lowered timeout and moved to new iopoll atomic interface

v3..v4:

  - Updated the iopoll commit message to reflect the patch better
  - Added locking around address translation op
  - Return 0 on iova_to_phys failure

v2..v3:

  - Removed unnecessary `dev_name's

v1..v2:

  - Renamed one of the iopoll macros to use the more standard `_atomic'
    suffix
  - Removed some convenience iopoll wrappers to encourage explicitness


Matt Wagantall (1):
  iopoll: Introduce memory-mapped IO polling macros

Mitchel Humpherys (1):
  iommu/arm-smmu: add support for iova_to_phys through ATS1PR

 drivers/iommu/arm-smmu.c |  80 +++++++++++++++++-
 include/linux/iopoll.h   | 213 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 292 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/iopoll.h

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-29 21:13 [PATCH v7 0/2] iommu/arm-smmu: hard iova_to_phys Mitchel Humpherys
@ 2014-10-29 21:13 ` Mitchel Humpherys
  2014-10-30 11:41   ` Will Deacon
  2014-10-29 21:13 ` [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
  1 sibling, 1 reply; 9+ messages in thread
From: Mitchel Humpherys @ 2014-10-29 21:13 UTC
  To: linux-arm-kernel

From: Matt Wagantall <mattw@codeaurora.org>

It is sometimes necessary to poll a memory-mapped register until its value
satisfies some condition. Introduce a family of convenience macros that do
this. Tight-looping, sleeping, and timing out can all be accomplished using
these macros.
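
A minimal userspace sketch of the read/test/timeout loop these macros implement may help when reviewing them. It assumes a hypothetical status register (`fake_readl`, `FAKE_BUSY`) and substitutes clock_gettime()/nanosleep() for ktime_get()/usleep_range(); `model_poll_timeout` is an illustrative name, not part of the proposed header, and it models only the control flow of the sleeping variant:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds; stands in for ktime_get(). */
static int64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Sleep for roughly @us microseconds; stands in for usleep_range(). */
static void pause_us(unsigned int us)
{
	struct timespec d = { us / 1000000, (long)(us % 1000000) * 1000 };

	nanosleep(&d, NULL);
}

/*
 * Userspace model of readx_poll_timeout(): read via @op, stop when @cond
 * holds, and on timeout take one final read so @val is fresh. Uses a GNU
 * statement expression, as the kernel macro does.
 */
#define model_poll_timeout(op, addr, val, cond, delay_us, timeout_us)	\
({									\
	int64_t poll_deadline_ = now_us() + (timeout_us);		\
	for (;;) {							\
		(val) = op(addr);					\
		if (cond)						\
			break;						\
		if ((timeout_us) && now_us() > poll_deadline_) {	\
			(val) = op(addr);				\
			break;						\
		}							\
		if (delay_us)						\
			pause_us(delay_us);				\
	}								\
	(cond) ? 0 : -ETIMEDOUT;					\
})

/* Hypothetical status register: bit 0 (busy) reads as 1 for N reads. */
#define FAKE_BUSY 0x1u
static int fake_busy_reads;

static uint32_t fake_readl(const void *addr)
{
	(void)addr;
	return fake_busy_reads-- > 0 ? FAKE_BUSY : 0;
}
```

As in the header, the condition is re-evaluated after the final read, so the return value and the value left in @val always agree, and a @timeout_us of 0 polls forever.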

Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
Changes since v6:
  - No changes. Resending due to changes in the next patch in the series.
---
 include/linux/iopoll.h | 213 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 213 insertions(+)
 create mode 100644 include/linux/iopoll.h

diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
new file mode 100644
index 0000000000..21dd41942b
--- /dev/null
+++ b/include/linux/iopoll.h
@@ -0,0 +1,213 @@
+/*
+ * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_IOPOLL_H
+#define _LINUX_IOPOLL_H
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/hrtimer.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+
+/**
+ * readx_poll_timeout - Periodically poll an address until a condition is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val. Must not
+ * be called from atomic context if sleep_us or timeout_us are used.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)	\
+({ \
+	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+	might_sleep_if(sleep_us); \
+	for (;;) { \
+		(val) = op(addr); \
+		if (cond) \
+			break; \
+		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
+			(val) = op(addr); \
+			break; \
+		} \
+		if (sleep_us) \
+			usleep_range((sleep_us >> 2) + 1, sleep_us); \
+	} \
+	(cond) ? 0 : -ETIMEDOUT; \
+})
+
+/**
+ * readx_poll_timeout_atomic - Periodically poll an address until a condition is met or a timeout occurs
+ * @op: accessor function (takes @addr as its only argument)
+ * @addr: Address to poll
+ * @val: Variable to read the value into
+ * @cond: Break condition (usually involving @val)
+ * @delay_us: Time to udelay between reads in us (0 tight-loops)
+ * @timeout_us: Timeout in us, 0 means never timeout
+ *
+ * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
+ * case, the last read value at @addr is stored in @val.
+ *
+ * Generally you'll want to use one of the specialized macros defined below
+ * rather than this macro directly.
+ */
+#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
+({ \
+	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
+	for (;;) { \
+		(val) = op(addr); \
+		if (cond) \
+			break; \
+		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
+			(val) = op(addr); \
+			break; \
+		} \
+		if (delay_us) \
+			udelay(delay_us);	\
+	} \
+	(cond) ? 0 : -ETIMEDOUT; \
+})
+
+
+#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
+
+#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
+
+#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readw, addr, val, cond, delay_us, timeout_us)
+
+#define readw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readw, addr, val, cond, delay_us, timeout_us)
+
+#define readq_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readq, addr, val, cond, delay_us, timeout_us)
+
+#define readq_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readq, addr, val, cond, delay_us, timeout_us)
+
+#define readb_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readb_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readb_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readb_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readw_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readw_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readw_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readw_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readl_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readl_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readq_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(readq_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define readq_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(readq_relaxed, addr, val, cond, delay_us, timeout_us)
+
+#define ioread8_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(ioread8, addr, val, cond, delay_us, timeout_us)
+
+#define ioread8_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(ioread8, addr, val, cond, delay_us, timeout_us)
+
+#define ioread16_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(ioread16, addr, val, cond, delay_us, timeout_us)
+
+#define ioread16_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(ioread16, addr, val, cond, delay_us, timeout_us)
+
+#define ioread16be_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(ioread16be, addr, val, cond, delay_us, timeout_us)
+
+#define ioread16be_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(ioread16be, addr, val, cond, delay_us, timeout_us)
+
+#define ioread32_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(ioread32, addr, val, cond, delay_us, timeout_us)
+
+#define ioread32_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(ioread32, addr, val, cond, delay_us, timeout_us)
+
+#define ioread32be_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(ioread32be, addr, val, cond, delay_us, timeout_us)
+
+#define ioread32be_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(ioread32be, addr, val, cond, delay_us, timeout_us)
+
+#define inb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inb, addr, val, cond, delay_us, timeout_us)
+
+#define inb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inb, addr, val, cond, delay_us, timeout_us)
+
+#define inb_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inb_p, addr, val, cond, delay_us, timeout_us)
+
+#define inb_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inb_p, addr, val, cond, delay_us, timeout_us)
+
+#define inw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inw, addr, val, cond, delay_us, timeout_us)
+
+#define inw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inw, addr, val, cond, delay_us, timeout_us)
+
+#define inw_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inw_p, addr, val, cond, delay_us, timeout_us)
+
+#define inw_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inw_p, addr, val, cond, delay_us, timeout_us)
+
+#define inw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inw, addr, val, cond, delay_us, timeout_us)
+
+#define inw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inw, addr, val, cond, delay_us, timeout_us)
+
+#define inl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inl, addr, val, cond, delay_us, timeout_us)
+
+#define inl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inl, addr, val, cond, delay_us, timeout_us)
+
+#define inl_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout(inl_p, addr, val, cond, delay_us, timeout_us)
+
+#define inl_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
+	readx_poll_timeout_atomic(inl_p, addr, val, cond, delay_us, timeout_us)
+
+
+#endif /* _LINUX_IOPOLL_H */


* [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-29 21:13 [PATCH v7 0/2] iommu/arm-smmu: hard iova_to_phys Mitchel Humpherys
  2014-10-29 21:13 ` [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
@ 2014-10-29 21:13 ` Mitchel Humpherys
  2014-10-30 11:38   ` Will Deacon
  2015-01-20 14:16   ` Will Deacon
  1 sibling, 2 replies; 9+ messages in thread
From: Mitchel Humpherys @ 2014-10-29 21:13 UTC
  To: linux-arm-kernel

Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.
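
As a rough illustration of how the hardware answer is consumed: the patch writes the page-aligned IOVA to ATS1PR, waits for ATSR.ACTIVE to clear, and then decodes the 64-bit PAR, treating PAR.F as a translation fault and otherwise splicing the IOVA's page offset onto the returned page address. The userspace sketch below models only that decode step; the 40-bit physical-address width (`MODEL_PHYS_MASK`) is an assumption standing in for PHYS_MASK, and `par_to_phys` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

#define CB_PAR_F	(1u << 0)	/* PAR fault flag, as defined in the patch */

/* Assumption: 40-bit physical addresses, standing in for PHYS_MASK. */
#define MODEL_PHYS_MASK	((1ull << 40) - 1)

/*
 * Combine PAR_HI:PAR_LO with the page offset of @iova. Returns 0 on a
 * translation fault, matching the convention used in the patch. Bits
 * above the modeled physical-address width are discarded.
 */
static uint64_t par_to_phys(uint32_t par_lo, uint32_t par_hi, uint64_t iova)
{
	uint64_t par = ((uint64_t)par_hi << 32) | par_lo;

	if (par & CB_PAR_F)
		return 0;
	return (par & (MODEL_PHYS_MASK & ~0xfffull)) | (iova & 0xfff);
}
```

For example, a PAR of 0x12_80001000 for IOVA 0x4abc decodes to physical address 0x1280001abc.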

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
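
The enablement rule above reduces to a single predicate over the SMMU version and the ID0 register, mirroring the check added to arm_smmu_device_cfg_probe() in the patch. In this sketch `smmu_has_trans_ops` is a hypothetical name, ID0_ATOSNS is taken from the patch, and ID0_S1TS is assumed to be bit 30, consistent with the neighbouring ID0_S2TS at bit 29:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ID0_S1TS	(1u << 30)	/* assumed: stage 1 translation supported */
#define ID0_ATOSNS	(1u << 26)	/* from the patch: ATOS not supported */

/*
 * SMMUv1 always has the ATS1* registers; on SMMUv2 they are optional, so
 * require stage 1 support and ATOSNS clear.
 */
static bool smmu_has_trans_ops(int version, uint32_t id0)
{
	return version == 1 || (!(id0 & ID0_ATOSNS) && (id0 & ID0_S1TS));
}
```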

Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
Changes since v6:
  - added missing lock
  - fixed physical address mask
---
 drivers/iommu/arm-smmu.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 79 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 60558f7949..c6f96ba3b1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iommu.h>
+#include <linux/iopoll.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -140,6 +141,7 @@
 #define ID0_S2TS			(1 << 29)
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
+#define ID0_ATOSNS			(1 << 26)
 #define ID0_PTFS_SHIFT			24
 #define ID0_PTFS_MASK			0x2
 #define ID0_PTFS_V8_ONLY		0x2
@@ -233,11 +235,16 @@
 #define ARM_SMMU_CB_TTBR0_HI		0x24
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
+#define ARM_SMMU_CB_PAR_LO		0x50
+#define ARM_SMMU_CB_PAR_HI		0x54
 #define ARM_SMMU_CB_FSR			0x58
 #define ARM_SMMU_CB_FAR_LO		0x60
 #define ARM_SMMU_CB_FAR_HI		0x64
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
+#define ARM_SMMU_CB_ATS1PR_LO		0x800
+#define ARM_SMMU_CB_ATS1PR_HI		0x804
+#define ARM_SMMU_CB_ATSR		0x8f0
 
 #define SCTLR_S1_ASIDPNE		(1 << 12)
 #define SCTLR_CFCFG			(1 << 7)
@@ -249,6 +256,10 @@
 #define SCTLR_M				(1 << 0)
 #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F			(1 << 0)
+
+#define ATSR_ACTIVE			(1 << 0)
+
 #define RESUME_RETRY			(0 << 0)
 #define RESUME_TERMINATE		(1 << 0)
 
@@ -366,6 +377,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
 #define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+#define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 					 dma_addr_t iova)
 {
 	pgd_t *pgdp, pgd;
@@ -1557,6 +1569,67 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+					dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct device *dev = smmu->dev;
+	void __iomem *cb_base;
+	u32 tmp;
+	u64 phys;
+	unsigned long flags;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
+
+	spin_lock_irqsave(&smmu_domain->lock, flags);
+
+	if (smmu->version == 1) {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+	} else {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+		reg = ((u64)iova & ~0xfff) >> 32;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+	}
+
+	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+				!(tmp & ATSR_ACTIVE), 5, 50)) {
+		spin_unlock_irqrestore(&smmu_domain->lock, flags);
+		dev_err(dev,
+			"iova to phys timed out on %pad. Falling back to software table walk.\n",
+			&iova);
+		return arm_smmu_iova_to_phys_soft(domain, iova);
+	}
+
+	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
+
+	spin_unlock_irqrestore(&smmu_domain->lock, flags);
+
+	if (phys & CB_PAR_F) {
+		dev_err(dev, "translation fault!\n");
+		dev_err(dev, "PAR = 0x%llx\n", phys);
+		phys = 0;
+	} else {
+		phys = (phys & (PHYS_MASK & ~0xfffUL)) | (iova & 0xfff);
+	}
+
+	return phys;
+}
+
+static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS)
+		return arm_smmu_iova_to_phys_hard(domain, iova);
+	return arm_smmu_iova_to_phys_soft(domain, iova);
+}
+
 static bool arm_smmu_capable(enum iommu_cap cap)
 {
 	switch (cap) {
@@ -1776,6 +1849,11 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		return -ENODEV;
 	}
 
+	if (smmu->version == 1 || (!(id & ID0_ATOSNS) && (id & ID0_S1TS))) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_OPS;
+		dev_notice(smmu->dev, "\taddress translation ops\n");
+	}
+
 	if (id & ID0_CTTW) {
 		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
 		dev_notice(smmu->dev, "\tcoherent table walk\n");


* [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-29 21:13 ` [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
@ 2014-10-30 11:38   ` Will Deacon
  2015-01-20 14:16   ` Will Deacon
  1 sibling, 0 replies; 9+ messages in thread
From: Will Deacon @ 2014-10-30 11:38 UTC
  To: linux-arm-kernel

On Wed, Oct 29, 2014 at 09:13:40PM +0000, Mitchel Humpherys wrote:
> Currently, we provide the iommu_ops.iova_to_phys service by doing a
> table walk in software to translate IO virtual addresses to physical
> addresses. On SMMUs that support it, it can be useful to ask the SMMU
> itself to do the translation. This can be used to warm the TLBs for an
> SMMU. It can also be useful for testing and hardware validation.
> 
> Since the address translation registers are optional on SMMUv2, only
> enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
> and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
> 
> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
> ---
> Changes since v6:
>   - added missing lock
>   - fixed physical address mask
> ---
>  drivers/iommu/arm-smmu.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 79 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 60558f7949..c6f96ba3b1 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -36,6 +36,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/iommu.h>
> +#include <linux/iopoll.h>
>  #include <linux/mm.h>
>  #include <linux/module.h>
>  #include <linux/of.h>
> @@ -140,6 +141,7 @@
>  #define ID0_S2TS			(1 << 29)
>  #define ID0_NTS				(1 << 28)
>  #define ID0_SMS				(1 << 27)
> +#define ID0_ATOSNS			(1 << 26)
>  #define ID0_PTFS_SHIFT			24
>  #define ID0_PTFS_MASK			0x2
>  #define ID0_PTFS_V8_ONLY		0x2
> @@ -233,11 +235,16 @@
>  #define ARM_SMMU_CB_TTBR0_HI		0x24
>  #define ARM_SMMU_CB_TTBCR		0x30
>  #define ARM_SMMU_CB_S1_MAIR0		0x38
> +#define ARM_SMMU_CB_PAR_LO		0x50
> +#define ARM_SMMU_CB_PAR_HI		0x54
>  #define ARM_SMMU_CB_FSR			0x58
>  #define ARM_SMMU_CB_FAR_LO		0x60
>  #define ARM_SMMU_CB_FAR_HI		0x64
>  #define ARM_SMMU_CB_FSYNR0		0x68
>  #define ARM_SMMU_CB_S1_TLBIASID		0x610
> +#define ARM_SMMU_CB_ATS1PR_LO		0x800
> +#define ARM_SMMU_CB_ATS1PR_HI		0x804
> +#define ARM_SMMU_CB_ATSR		0x8f0
>  
>  #define SCTLR_S1_ASIDPNE		(1 << 12)
>  #define SCTLR_CFCFG			(1 << 7)
> @@ -249,6 +256,10 @@
>  #define SCTLR_M				(1 << 0)
>  #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
>  
> +#define CB_PAR_F			(1 << 0)
> +
> +#define ATSR_ACTIVE			(1 << 0)
> +
>  #define RESUME_RETRY			(0 << 0)
>  #define RESUME_TERMINATE		(1 << 0)
>  
> @@ -366,6 +377,7 @@ struct arm_smmu_device {
>  #define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
>  #define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
>  #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
> +#define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
>  	u32				features;
>  
>  #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> @@ -1524,7 +1536,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
>  	return ret ? 0 : size;
>  }
>  
> -static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
> +static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
>  					 dma_addr_t iova)
>  {
>  	pgd_t *pgdp, pgd;
> @@ -1557,6 +1569,67 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
>  	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
>  }
>  
> +static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
> +					dma_addr_t iova)
> +{
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct device *dev = smmu->dev;
> +	void __iomem *cb_base;
> +	u32 tmp;
> +	u64 phys;
> +	unsigned long flags;
> +
> +	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
> +
> +	spin_lock_irqsave(&smmu_domain->lock, flags);
> +
> +	if (smmu->version == 1) {
> +		u32 reg = iova & ~0xfff;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
> +	} else {
> +		u32 reg = iova & ~0xfff;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
> +		reg = ((u64)iova & ~0xfff) >> 32;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
> +	}
> +
> +	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
> +				!(tmp & ATSR_ACTIVE), 5, 50)) {
> +		spin_unlock_irqrestore(&smmu_domain->lock, flags);
> +		dev_err(dev,
> +			"iova to phys timed out on %pad. Falling back to software table walk.\n",
> +			&iova);
> +		return arm_smmu_iova_to_phys_soft(domain, iova);
> +	}
> +
> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
> +
> +	spin_unlock_irqrestore(&smmu_domain->lock, flags);
> +
> +	if (phys & CB_PAR_F) {
> +		dev_err(dev, "translation fault!\n");
> +		dev_err(dev, "PAR = 0x%llx\n", phys);
> +		phys = 0;
> +	} else {
> +		phys = (phys & (PHYS_MASK & ~0xfffUL)) | (iova & 0xfff);

That probably wants to be ~0xfffULL for LPAE kernels.

With that:

  Acked-by: Will Deacon <will.deacon@arm.com>

I'm not sure how we should merge this, given the dependency on patch 1. If
you can get some acks there, then we can work out whether to take this via
the IOMMU tree or elsewhere.

Cheers,

Will
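
To make the LPAE concern concrete: on a 32-bit kernel `unsigned long` is 32 bits wide, so `~0xfffUL` is 0xfffff000 and is zero-extended when ANDed with a 64-bit PHYS_MASK, discarding PAR bits 32 and above. The userspace sketch below models this, using uint32_t as a stand-in for a 32-bit `unsigned long` and assuming a 40-bit PHYS_MASK; `mask_ul32` and `mask_ull` are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 40-bit PHYS_MASK, as on 32-bit ARM with LPAE. */
#define MODEL_PHYS_MASK	((1ull << 40) - 1)

/*
 * Models PHYS_MASK & ~0xfffUL on a 32-bit kernel: the 32-bit page mask is
 * zero-extended to 0x00000000fffff000, so bits 32..39 of @par are lost.
 */
static uint64_t mask_ul32(uint64_t par)
{
	uint32_t page_mask = ~0xfffu;

	return par & (MODEL_PHYS_MASK & page_mask);
}

/* Models PHYS_MASK & ~0xfffULL: the page mask is 64 bits wide. */
static uint64_t mask_ull(uint64_t par)
{
	return par & (MODEL_PHYS_MASK & ~0xfffull);
}
```

With a PAR of 0x1280001000, the 32-bit mask yields 0x80001000 while the 64-bit mask preserves 0x1280001000.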


* [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-29 21:13 ` [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros Mitchel Humpherys
@ 2014-10-30 11:41   ` Will Deacon
  2014-10-30 12:00     ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2014-10-30 11:41 UTC
  To: linux-arm-kernel

[adding Arnd, since he reviewed an earlier version of this and we could do
 with his ack]

On Wed, Oct 29, 2014 at 09:13:39PM +0000, Mitchel Humpherys wrote:
> From: Matt Wagantall <mattw@codeaurora.org>
> 
> It is sometimes necessary to poll a memory-mapped register until its value
> satisfies some condition. Introduce a family of convenience macros that do
> this. Tight-looping, sleeping, and timing out can all be accomplished using
> these macros.
> 
> Cc: Thierry Reding <thierry.reding@gmail.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
> ---
> Changes since v6:
>   - No changes. Resending due to changes in the next patch in the series.
> ---
>  include/linux/iopoll.h | 213 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 213 insertions(+)
>  create mode 100644 include/linux/iopoll.h
> 
> diff --git a/include/linux/iopoll.h b/include/linux/iopoll.h
> new file mode 100644
> index 0000000000..21dd41942b
> --- /dev/null
> +++ b/include/linux/iopoll.h
> @@ -0,0 +1,213 @@
> +/*
> + * Copyright (c) 2012-2014 The Linux Foundation. All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 and
> + * only version 2 as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + */
> +
> +#ifndef _LINUX_IOPOLL_H
> +#define _LINUX_IOPOLL_H
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/hrtimer.h>
> +#include <linux/delay.h>
> +#include <linux/errno.h>
> +#include <linux/io.h>
> +
> +/**
> + * readx_poll_timeout - Periodically poll an address until a condition is met or a timeout occurs
> + * @op: accessor function (takes @addr as its only argument)
> + * @addr: Address to poll
> + * @val: Variable to read the value into
> + * @cond: Break condition (usually involving @val)
> + * @sleep_us: Maximum time to sleep between reads in us (0 tight-loops)
> + * @timeout_us: Timeout in us, 0 means never timeout
> + *
> + * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
> + * case, the last read value at @addr is stored in @val. Must not
> + * be called from atomic context if sleep_us or timeout_us are used.
> + *
> + * Generally you'll want to use one of the specialized macros defined below
> + * rather than this macro directly.
> + */
> +#define readx_poll_timeout(op, addr, val, cond, sleep_us, timeout_us)	\
> +({ \
> +	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
> +	might_sleep_if(sleep_us); \
> +	for (;;) { \
> +		(val) = op(addr); \
> +		if (cond) \
> +			break; \
> +		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
> +			(val) = op(addr); \
> +			break; \
> +		} \
> +		if (sleep_us) \
> +			usleep_range((sleep_us >> 2) + 1, sleep_us); \
> +	} \
> +	(cond) ? 0 : -ETIMEDOUT; \
> +})
> +
> +/**
> + * readx_poll_timeout_atomic - Periodically poll an address until a condition is met or a timeout occurs
> + * @op: accessor function (takes @addr as its only argument)
> + * @addr: Address to poll
> + * @val: Variable to read the value into
> + * @cond: Break condition (usually involving @val)
> + * @delay_us: Time to udelay between reads in us (0 tight-loops)
> + * @timeout_us: Timeout in us, 0 means never timeout
> + *
> + * Returns 0 on success and -ETIMEDOUT upon a timeout. In either
> + * case, the last read value at @addr is stored in @val.
> + *
> + * Generally you'll want to use one of the specialized macros defined below
> + * rather than this macro directly.
> + */
> +#define readx_poll_timeout_atomic(op, addr, val, cond, delay_us, timeout_us) \
> +({ \
> +	ktime_t timeout = ktime_add_us(ktime_get(), timeout_us); \
> +	for (;;) { \
> +		(val) = op(addr); \
> +		if (cond) \
> +			break; \
> +		if (timeout_us && ktime_compare(ktime_get(), timeout) > 0) { \
> +			(val) = op(addr); \
> +			break; \
> +		} \
> +		if (delay_us) \
> +			udelay(delay_us);	\
> +	} \
> +	(cond) ? 0 : -ETIMEDOUT; \
> +})
> +
> +
> +#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
> +
> +#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
> +
> +#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
> +
> +#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
> +
> +#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readw, addr, val, cond, delay_us, timeout_us)
> +
> +#define readw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readw, addr, val, cond, delay_us, timeout_us)
> +
> +#define readq_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readq, addr, val, cond, delay_us, timeout_us)
> +
> +#define readq_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readq, addr, val, cond, delay_us, timeout_us)
> +
> +#define readb_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readb_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readb_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readb_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readw_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readw_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readw_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readw_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readl_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readl_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readl_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readq_relaxed_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(readq_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define readq_relaxed_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(readq_relaxed, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread8_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(ioread8, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread8_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(ioread8, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread16_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(ioread16, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread16_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(ioread16, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread16be_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(ioread16be, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread16be_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(ioread16be, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread32_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(ioread32, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread32_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(ioread32, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread32b3_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(ioread32b3, addr, val, cond, delay_us, timeout_us)
> +
> +#define ioread32b3_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(ioread32b3, addr, val, cond, delay_us, timeout_us)
> +
> +#define inb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inb, addr, val, cond, delay_us, timeout_us)
> +
> +#define inb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inb, addr, val, cond, delay_us, timeout_us)
> +
> +#define inb_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inb_p, addr, val, cond, delay_us, timeout_us)
> +
> +#define inb_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inb_p, addr, val, cond, delay_us, timeout_us)
> +
> +#define inw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inw, addr, val, cond, delay_us, timeout_us)
> +
> +#define inw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inw, addr, val, cond, delay_us, timeout_us)
> +
> +#define inw_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inw_p, addr, val, cond, delay_us, timeout_us)
> +
> +#define inw_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inw_p, addr, val, cond, delay_us, timeout_us)
> +
> +#define inl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inl, addr, val, cond, delay_us, timeout_us)
> +
> +#define inl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inl, addr, val, cond, delay_us, timeout_us)
> +
> +#define inl_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout(inl_p, addr, val, cond, delay_us, timeout_us)
> +
> +#define inl_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> +	readx_poll_timeout_atomic(inl_p, addr, val, cond, delay_us, timeout_us)
> +
> +
> +#endif /* _LINUX_IOPOLL_H */
> -- 
> Qualcomm Innovation Center, Inc.
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
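All of the wrappers above forward to one generic loop. The sketch below is a minimal user-space model of that loop. It assumes that `readx_poll_timeout()` re-reads through the accessor until `cond` holds or `timeout_us` expires, and then evaluates to 0 or `-ETIMEDOUT`. The names `poll_timeout_model`, `fake_readl`, `now_us` and `sleep_us` are hypothetical. `clock_gettime()` and `nanosleep()` stand in for the kernel's ktime and sleep helpers, and the atomic variant would busy-wait instead of sleeping.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in microseconds; stands in for ktime_get(). */
static int64_t now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Sleep for about @us microseconds; stands in for usleep_range(). */
static void sleep_us(unsigned long us)
{
	struct timespec ts = {
		.tv_sec = (time_t)(us / 1000000),
		.tv_nsec = (long)(us % 1000000) * 1000,
	};

	nanosleep(&ts, NULL);
}

/*
 * Hypothetical model of a readx_poll_timeout()-style loop: read through
 * @op until @cond holds or @timeout_us expires (0 means wait forever).
 * Evaluates to 0 on success or -ETIMEDOUT. Uses a GNU statement
 * expression, like the kernel macros.
 */
#define poll_timeout_model(op, addr, val, cond, delay_us, timeout_us)	\
({									\
	int64_t pt_deadline = now_us() + (int64_t)(timeout_us);		\
	for (;;) {							\
		(val) = op(addr);					\
		if (cond)						\
			break;						\
		if ((timeout_us) && now_us() > pt_deadline) {		\
			(val) = op(addr); /* final sample */		\
			break;						\
		}							\
		if (delay_us)						\
			sleep_us(delay_us);				\
	}								\
	(cond) ? 0 : -ETIMEDOUT;					\
})

/* Fake status register whose BUSY bit clears on the Nth read. */
#define STATUS_BUSY	0x1u
static unsigned int reads_until_idle;	/* 0: never becomes idle */

static uint32_t fake_readl(volatile uint32_t *reg)
{
	assert(reg != NULL);
	if (reads_until_idle && --reads_until_idle == 0)
		*reg &= ~STATUS_BUSY;
	return *reg;
}
```

The final read after the deadline ensures that `val` reflects the register's state at the moment of timeout, so a caller that logs `val` on failure does not report a stale sample.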
> 


* [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-30 11:41   ` Will Deacon
@ 2014-10-30 12:00     ` Arnd Bergmann
  2014-10-30 18:17       ` Mitchel Humpherys
  0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2014-10-30 12:00 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 30 October 2014 11:41:00 Will Deacon wrote:
> > +
> > +#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(readw, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(readw, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readq_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(readq, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define readq_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(readq, addr, val, cond, delay_us, timeout_us)

Sort these by size (b, w, l, q) maybe?

> > +#define ioread32_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(ioread32, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define ioread32_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(ioread32, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define ioread32b3_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(ioread32b3, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define ioread32b3_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(ioread32b3, addr, val, cond, delay_us, timeout_us)

What is ioread32b3?

> > +#define inb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(inb, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define inb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(inb, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define inb_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout(inb_p, addr, val, cond, delay_us, timeout_us)
> > +
> > +#define inb_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
> > +	readx_poll_timeout_atomic(inb_p, addr, val, cond, delay_us, timeout_us)

I would leave out the _p variants, they are very rarely used anyway.

Looking at the long list, I wonder if we should really define each variant,
or just expect drivers to call readx_poll_timeout{,_atomic} directly and
pass whichever accessor they want.

	Arnd


* [PATCH v7 1/2] iopoll: Introduce memory-mapped IO polling macros
  2014-10-30 12:00     ` Arnd Bergmann
@ 2014-10-30 18:17       ` Mitchel Humpherys
  0 siblings, 0 replies; 9+ messages in thread
From: Mitchel Humpherys @ 2014-10-30 18:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Oct 30 2014 at 05:00:23 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 30 October 2014 11:41:00 Will Deacon wrote:
>> > +
>> > +#define readl_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(readl, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readl_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(readl, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(readb, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(readb, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readw_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(readw, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readw_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(readw, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readq_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(readq, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define readq_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(readq, addr, val, cond, delay_us, timeout_us)
>
> Sort these by size (b, w, l, q) maybe?

Sure

>
>> > +#define ioread32_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(ioread32, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define ioread32_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(ioread32, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define ioread32b3_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(ioread32b3, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define ioread32b3_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(ioread32b3, addr, val, cond, delay_us, timeout_us)
>
> What is ioread32b3?

Looks like it's a... typo!  It was supposed to be ioread32be.

>
>> > +#define inb_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(inb, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define inb_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(inb, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define inb_p_poll_timeout(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout(inb_p, addr, val, cond, delay_us, timeout_us)
>> > +
>> > +#define inb_p_poll_timeout_atomic(addr, val, cond, delay_us, timeout_us) \
>> > +	readx_poll_timeout_atomic(inb_p, addr, val, cond, delay_us, timeout_us)
>
> I would leave out the _p variants, they are very rarely used anyway.
>
> Looking at the long list, I wonder if we should really define each variant,
> or just expect drivers to call readx_poll_timeout{,_atomic} directly and
> pass whichever accessor they want.

That sounds reasonable, although I think we'd at least want to include
the readX family of functions.


-Mitch



* [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2014-10-29 21:13 ` [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR Mitchel Humpherys
  2014-10-30 11:38   ` Will Deacon
@ 2015-01-20 14:16   ` Will Deacon
  2015-01-20 19:45     ` Mitchel Humpherys
  1 sibling, 1 reply; 9+ messages in thread
From: Will Deacon @ 2015-01-20 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hey Mitch,

On Wed, Oct 29, 2014 at 09:13:40PM +0000, Mitchel Humpherys wrote:
> Currently, we provide the iommu_ops.iova_to_phys service by doing a
> table walk in software to translate IO virtual addresses to physical
> addresses. On SMMUs that support it, it can be useful to ask the SMMU
> itself to do the translation. This can be used to warm the TLBs for an
> SMMU. It can also be useful for testing and hardware validation.
> 
> Since the address translation registers are optional on SMMUv2, only
> enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
> and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
> 
> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
> ---
> Changes since v6:
>   - added missing lock
>   - fixed physical address mask

I had a go at rebasing this onto my current queue, but ended up making
quite a few changes. Can you take a look at the result, please?

Patch below (also on my for-joerg/arm-smmu/updates branch).

Thanks,

Will

--->8

From 859a732e4f713270152c78df6e09accbde006734 Mon Sep 17 00:00:00 2001
From: Mitchel Humpherys <mitchelh@codeaurora.org>
Date: Wed, 29 Oct 2014 21:13:40 +0000
Subject: [PATCH] iommu/arm-smmu: add support for iova_to_phys through ATS1PR

Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
[will: reworked on top of generic iopgtbl changes]
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/iommu/arm-smmu.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 67 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 006f006c35e9..1d6d43bb3395 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -34,6 +34,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iommu.h>
+#include <linux/iopoll.h>
 #include <linux/module.h>
 #include <linux/of.h>
 #include <linux/pci.h>
@@ -100,6 +101,7 @@
 #define ID0_S2TS			(1 << 29)
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
+#define ID0_ATOSNS			(1 << 26)
 #define ID0_CTTW			(1 << 14)
 #define ID0_NUMIRPT_SHIFT		16
 #define ID0_NUMIRPT_MASK		0xff
@@ -189,6 +191,8 @@
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
 #define ARM_SMMU_CB_S1_MAIR1		0x3c
+#define ARM_SMMU_CB_PAR_LO		0x50
+#define ARM_SMMU_CB_PAR_HI		0x54
 #define ARM_SMMU_CB_FSR			0x58
 #define ARM_SMMU_CB_FAR_LO		0x60
 #define ARM_SMMU_CB_FAR_HI		0x64
@@ -198,6 +202,9 @@
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
 #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
 #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
+#define ARM_SMMU_CB_ATS1PR_LO		0x800
+#define ARM_SMMU_CB_ATS1PR_HI		0x804
+#define ARM_SMMU_CB_ATSR		0x8f0
 
 #define SCTLR_S1_ASIDPNE		(1 << 12)
 #define SCTLR_CFCFG			(1 << 7)
@@ -209,6 +216,10 @@
 #define SCTLR_M				(1 << 0)
 #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F			(1 << 0)
+
+#define ATSR_ACTIVE			(1 << 0)
+
 #define RESUME_RETRY			(0 << 0)
 #define RESUME_TERMINATE		(1 << 0)
 
@@ -282,6 +293,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
 #define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+#define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -1220,8 +1232,52 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return ret;
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+					      dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops;
+	struct device *dev = smmu->dev;
+	void __iomem *cb_base;
+	u32 tmp;
+	u64 phys;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
+
+	if (smmu->version == 1) {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+	} else {
+		u32 reg = iova & ~0xfff;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+		reg = (iova & ~0xfff) >> 32;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+	}
+
+	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
+				      !(tmp & ATSR_ACTIVE), 5, 50)) {
+		dev_err(dev,
+			"iova to phys timed out on %pad. Falling back to software table walk.\n",
+			&iova);
+		return ops->iova_to_phys(ops, iova);
+	}
+
+	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+	phys |= ((u64)readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
+
+	if (phys & CB_PAR_F) {
+		dev_err(dev, "translation fault!\n");
+		dev_err(dev, "PAR = 0x%llx\n", phys);
+		return 0;
+	}
+
+	return (phys & GENMASK_ULL(39, 12)) | (iova & 0xfff);
+}
+
 static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
-					 dma_addr_t iova)
+					dma_addr_t iova)
 {
 	phys_addr_t ret;
 	unsigned long flags;
@@ -1232,8 +1288,12 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 		return 0;
 
 	spin_lock_irqsave(&smmu_domain->pgtbl_lock, flags);
-	ret = ops->iova_to_phys(ops, iova);
+	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS)
+		ret = arm_smmu_iova_to_phys_hard(domain, iova);
+	else
+		ret = ops->iova_to_phys(ops, iova);
 	spin_unlock_irqrestore(&smmu_domain->pgtbl_lock, flags);
+
 	return ret;
 }
 
@@ -1496,6 +1556,11 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		return -ENODEV;
 	}
 
+	if (smmu->version == 1 || (!(id & ID0_ATOSNS) && (id & ID0_S1TS))) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_OPS;
+		dev_notice(smmu->dev, "\taddress translation ops\n");
+	}
+
 	if (id & ID0_CTTW) {
 		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
 		dev_notice(smmu->dev, "\tcoherent table walk\n");
-- 
2.1.4


* [PATCH v7 2/2] iommu/arm-smmu: add support for iova_to_phys through ATS1PR
  2015-01-20 14:16   ` Will Deacon
@ 2015-01-20 19:45     ` Mitchel Humpherys
  0 siblings, 0 replies; 9+ messages in thread
From: Mitchel Humpherys @ 2015-01-20 19:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 20 2015 at 06:16:43 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hey Mitch,
>
> On Wed, Oct 29, 2014 at 09:13:40PM +0000, Mitchel Humpherys wrote:
>> Currently, we provide the iommu_ops.iova_to_phys service by doing a
>> table walk in software to translate IO virtual addresses to physical
>> addresses. On SMMUs that support it, it can be useful to ask the SMMU
>> itself to do the translation. This can be used to warm the TLBs for an
>> SMMU. It can also be useful for testing and hardware validation.
>> 
>> Since the address translation registers are optional on SMMUv2, only
>> enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
>> and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
>> 
>> Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
>> ---
>> Changes since v6:
>>   - added missing lock
>>   - fixed physical address mask
>
> I had a go at rebasing this onto my current queue, but ended up making
> quite a few changes. Can you take a look at the result, please?
>
> Patch below (also on my for-joerg/arm-smmu/updates branch).

The modified patch looks good to me.  Thanks!

-Mitch


