All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] Miscellaneous ARM SMMU patches
@ 2016-01-26 18:06 ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

Hi Will,

Here's my current "miscellaneous SMMU enhancements" branch for your
consideration. Patch #1 solves a subtle corner case that cropped up
while exercising stage 1 context formats on the DMA330-MMU500 model;
#3 will be wanted fairly soon for DMA ops integration so may as well
get some exposure now; #2 and #4 are more nice-to-haves.

Thanks,
Robin.

Robin Murphy (4):
  iommu/arm-smmu: Treat all device transactions as unprivileged
  iommu/arm-smmu: Allow disabling unmatched stream bypass
  iommu/arm-smmu: Support DMA-API domains
  iommu/arm-smmu: Use per-context TLB sync as appropriate

 drivers/iommu/arm-smmu-v3.c | 10 +++++-
 drivers/iommu/arm-smmu.c    | 79 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 69 insertions(+), 20 deletions(-)

-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/4] Miscellaneous ARM SMMU patches
@ 2016-01-26 18:06 ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

Here's my current "miscellaneous SMMU enhancements" branch for your
consideration. Patch #1 solves a subtle corner case that cropped up
while exercising stage 1 context formats on the DMA330-MMU500 model;
#3 will be wanted fairly soon for DMA ops integration so may as well
get some exposure now; #2 and #4 are more nice-to-haves.

Thanks,
Robin.

Robin Murphy (4):
  iommu/arm-smmu: Treat all device transactions as unprivileged
  iommu/arm-smmu: Allow disabling unmatched stream bypass
  iommu/arm-smmu: Support DMA-API domains
  iommu/arm-smmu: Use per-context TLB sync as appropriate

 drivers/iommu/arm-smmu-v3.c | 10 +++++-
 drivers/iommu/arm-smmu.c    | 79 ++++++++++++++++++++++++++++++++++-----------
 2 files changed, 69 insertions(+), 20 deletions(-)

-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
  2016-01-26 18:06 ` Robin Murphy
@ 2016-01-26 18:06     ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

The IOMMU API has no concept of privilege so assumes all devices and
mappings are equal, and indeed most non-CPU master devices on an AMBA
interconnect make little use of the attribute bits on the bus thus by
default perform unprivileged data accesses.

Some devices, however, believe themselves more equal than others, such
as programmable DMA controllers whose 'master' thread issues bus
transactions marked as privileged instruction fetches, while the data
accesses of its channel threads (under the control of Linux, at least)
are marked as unprivileged. This poses a problem for implementing the
DMA API on an IOMMU conforming to ARM VMSAv8, under which a page that is
unprivileged-writeable is also implicitly privileged-execute-never.
Given that, there is no one set of attributes with which iommu_map() can
implement, say, dma_alloc_coherent() that will allow every possible type
of access without something running into unexecepted permission faults.

Fortunately the SMMU architecture provides a means to mitigate such
issues by overriding the incoming attributes of a transaction; make use
of that to strip the privileged/unprivileged status off incoming
transactions, leaving just the instruction/data dichotomy which the
IOMMU API does at least understand; Four states good, two states better.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 59ee4b8..1f9093d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -167,6 +167,9 @@
 #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
 
+#define S2CR_PRIVCFG_SHIFT		24
+#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
+
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT			0
@@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
 		u32 idx, s2cr;
 
 		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-		s2cr = S2CR_TYPE_TRANS |
+		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
 		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
 		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
@ 2016-01-26 18:06     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

The IOMMU API has no concept of privilege so assumes all devices and
mappings are equal, and indeed most non-CPU master devices on an AMBA
interconnect make little use of the attribute bits on the bus thus by
default perform unprivileged data accesses.

Some devices, however, believe themselves more equal than others, such
as programmable DMA controllers whose 'master' thread issues bus
transactions marked as privileged instruction fetches, while the data
accesses of its channel threads (under the control of Linux, at least)
are marked as unprivileged. This poses a problem for implementing the
DMA API on an IOMMU conforming to ARM VMSAv8, under which a page that is
unprivileged-writeable is also implicitly privileged-execute-never.
Given that, there is no one set of attributes with which iommu_map() can
implement, say, dma_alloc_coherent() that will allow every possible type
of access without something running into unexecepted permission faults.

Fortunately the SMMU architecture provides a means to mitigate such
issues by overriding the incoming attributes of a transaction; make use
of that to strip the privileged/unprivileged status off incoming
transactions, leaving just the instruction/data dichotomy which the
IOMMU API does at least understand; Four states good, two states better.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 59ee4b8..1f9093d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -167,6 +167,9 @@
 #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
 #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
 
+#define S2CR_PRIVCFG_SHIFT		24
+#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
+
 /* Context bank attribute registers */
 #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
 #define CBAR_VMID_SHIFT			0
@@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
 		u32 idx, s2cr;
 
 		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
-		s2cr = S2CR_TYPE_TRANS |
+		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
 		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
 		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
  2016-01-26 18:06 ` Robin Murphy
@ 2016-01-26 18:06     ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
debugging/security feature so that unmatched stream IDs (i.e. devices
not attached to an IOMMU domain) may be configured to fault.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1f9093d..d1b7dc1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -141,6 +141,8 @@
 #define ID2_PTFS_16K			(1 << 13)
 #define ID2_PTFS_64K			(1 << 14)
 
+#define sGFSR_USF			(1 << 1)
+
 /* Global TLB invalidation */
 #define ARM_SMMU_GR0_TLBIVMID		0x64
 #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
@@ -263,6 +265,10 @@ static int force_stage;
 module_param_named(force_stage, force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
+static bool disable_bypass;
+module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_bypass,
+	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
 enum arm_smmu_arch_version {
 	ARM_SMMU_V1 = 1,
@@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	if (!gfsr)
 		return IRQ_NONE;
 
-	dev_err_ratelimited(smmu->dev,
-		"Unexpected global fault, this could be serious\n");
+	if (gfsr & sGFSR_USF)
+		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
+				    gfsynr1 & SMR_ID_MASK);
+	else
+		dev_err_ratelimited(smmu->dev,
+			"Unexpected global fault, this could be serious\n");
 	dev_err_ratelimited(smmu->dev,
 		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
 		gfsr, gfsynr0, gfsynr1, gfsynr2);
@@ -1111,9 +1121,9 @@ static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
 	 */
 	for (i = 0; i < cfg->num_streamids; ++i) {
 		u32 idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
+		u32 reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			       gr0_base + ARM_SMMU_GR0_S2CR(idx));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
 
 	arm_smmu_master_free_smrs(smmu, cfg);
@@ -1476,11 +1486,11 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 
-	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	/* Mark all SMRn as invalid and all S2CRn as bypass unless overridden */
+	reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 	for (i = 0; i < smmu->num_mapping_groups; ++i) {
 		writel_relaxed(0, gr0_base + ARM_SMMU_GR0_SMR(i));
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			gr0_base + ARM_SMMU_GR0_S2CR(i));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(i));
 	}
 
 	/* Make sure all context banks are disabled and clear CB_FSR  */
@@ -1502,8 +1512,12 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Disable TLB broadcasting. */
 	reg |= (sCR0_VMIDPNE | sCR0_PTM);
 
-	/* Enable client access, but bypass when no mapping is found */
-	reg &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+	/* Enable client access, handling unmatched streams as appropriate */
+	reg &= ~sCR0_CLIENTPD;
+	if (disable_bypass)
+		reg |= sCR0_USFCFG;
+	else
+		reg &= ~sCR0_USFCFG;
 
 	/* Disable forced broadcasting */
 	reg &= ~sCR0_FB;
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
@ 2016-01-26 18:06     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
debugging/security feature so that unmatched stream IDs (i.e. devices
not attached to an IOMMU domain) may be configured to fault.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1f9093d..d1b7dc1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -141,6 +141,8 @@
 #define ID2_PTFS_16K			(1 << 13)
 #define ID2_PTFS_64K			(1 << 14)
 
+#define sGFSR_USF			(1 << 1)
+
 /* Global TLB invalidation */
 #define ARM_SMMU_GR0_TLBIVMID		0x64
 #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
@@ -263,6 +265,10 @@ static int force_stage;
 module_param_named(force_stage, force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed@a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
+static bool disable_bypass;
+module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_bypass,
+	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
 enum arm_smmu_arch_version {
 	ARM_SMMU_V1 = 1,
@@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
 	if (!gfsr)
 		return IRQ_NONE;
 
-	dev_err_ratelimited(smmu->dev,
-		"Unexpected global fault, this could be serious\n");
+	if (gfsr & sGFSR_USF)
+		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
+				    gfsynr1 & SMR_ID_MASK);
+	else
+		dev_err_ratelimited(smmu->dev,
+			"Unexpected global fault, this could be serious\n");
 	dev_err_ratelimited(smmu->dev,
 		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
 		gfsr, gfsynr0, gfsynr1, gfsynr2);
@@ -1111,9 +1121,9 @@ static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
 	 */
 	for (i = 0; i < cfg->num_streamids; ++i) {
 		u32 idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
+		u32 reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			       gr0_base + ARM_SMMU_GR0_S2CR(idx));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
 
 	arm_smmu_master_free_smrs(smmu, cfg);
@@ -1476,11 +1486,11 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 
-	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	/* Mark all SMRn as invalid and all S2CRn as bypass unless overridden */
+	reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 	for (i = 0; i < smmu->num_mapping_groups; ++i) {
 		writel_relaxed(0, gr0_base + ARM_SMMU_GR0_SMR(i));
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			gr0_base + ARM_SMMU_GR0_S2CR(i));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(i));
 	}
 
 	/* Make sure all context banks are disabled and clear CB_FSR  */
@@ -1502,8 +1512,12 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Disable TLB broadcasting. */
 	reg |= (sCR0_VMIDPNE | sCR0_PTM);
 
-	/* Enable client access, but bypass when no mapping is found */
-	reg &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+	/* Enable client access, handling unmatched streams as appropriate */
+	reg &= ~sCR0_CLIENTPD;
+	if (disable_bypass)
+		reg |= sCR0_USFCFG;
+	else
+		reg &= ~sCR0_USFCFG;
 
 	/* Disable forced broadcasting */
 	reg &= ~sCR0_FB;
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] iommu/arm-smmu: Support DMA-API domains
  2016-01-26 18:06 ` Robin Murphy
@ 2016-01-26 18:06     ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

With DMA mapping ops provided by the iommu-dma code, only a minimal
contribution from the IOMMU driver is needed to create a suitable
DMA-API domain for them to use. Implement this for the ARM SMMUs.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu-v3.c | 10 +++++++++-
 drivers/iommu/arm-smmu.c    | 10 +++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2087534..f003b3a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -21,6 +21,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/dma-iommu.h>
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
@@ -1396,7 +1397,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	/*
@@ -1408,6 +1409,12 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	if (!smmu_domain)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA &&
+	    iommu_get_dma_cookie(&smmu_domain->domain)) {
+		kfree(smmu_domain);
+		return NULL;
+	}
+
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 	return &smmu_domain->domain;
@@ -1436,6 +1443,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
+	iommu_put_dma_cookie(domain);
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
 	/* Free the CD and ASID, if we allocated them */
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d1b7dc1..18e0e10 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -29,6 +29,7 @@
 #define pr_fmt(fmt) "arm-smmu: " fmt
 
 #include <linux/delay.h>
+#include <linux/dma-iommu.h>
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/interrupt.h>
@@ -976,7 +977,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 	/*
 	 * Allocate the domain and initialise some of its data structures.
@@ -987,6 +988,12 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	if (!smmu_domain)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA &&
+	    iommu_get_dma_cookie(&smmu_domain->domain)) {
+		kfree(smmu_domain);
+		return NULL;
+	}
+
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 
@@ -1001,6 +1008,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	 * Free the domain resources. We assume that all devices have
 	 * already been detached.
 	 */
+	iommu_put_dma_cookie(domain);
 	arm_smmu_destroy_domain_context(domain);
 	kfree(smmu_domain);
 }
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/4] iommu/arm-smmu: Support DMA-API domains
@ 2016-01-26 18:06     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

With DMA mapping ops provided by the iommu-dma code, only a minimal
contribution from the IOMMU driver is needed to create a suitable
DMA-API domain for them to use. Implement this for the ARM SMMUs.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu-v3.c | 10 +++++++++-
 drivers/iommu/arm-smmu.c    | 10 +++++++++-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2087534..f003b3a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -21,6 +21,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/dma-iommu.h>
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/iommu.h>
@@ -1396,7 +1397,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	/*
@@ -1408,6 +1409,12 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	if (!smmu_domain)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA &&
+	    iommu_get_dma_cookie(&smmu_domain->domain)) {
+		kfree(smmu_domain);
+		return NULL;
+	}
+
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 	return &smmu_domain->domain;
@@ -1436,6 +1443,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
+	iommu_put_dma_cookie(domain);
 	free_io_pgtable_ops(smmu_domain->pgtbl_ops);
 
 	/* Free the CD and ASID, if we allocated them */
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d1b7dc1..18e0e10 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -29,6 +29,7 @@
 #define pr_fmt(fmt) "arm-smmu: " fmt
 
 #include <linux/delay.h>
+#include <linux/dma-iommu.h>
 #include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/interrupt.h>
@@ -976,7 +977,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
 	struct arm_smmu_domain *smmu_domain;
 
-	if (type != IOMMU_DOMAIN_UNMANAGED)
+	if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
 		return NULL;
 	/*
 	 * Allocate the domain and initialise some of its data structures.
@@ -987,6 +988,12 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	if (!smmu_domain)
 		return NULL;
 
+	if (type == IOMMU_DOMAIN_DMA &&
+	    iommu_get_dma_cookie(&smmu_domain->domain)) {
+		kfree(smmu_domain);
+		return NULL;
+	}
+
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->pgtbl_lock);
 
@@ -1001,6 +1008,7 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	 * Free the domain resources. We assume that all devices have
 	 * already been detached.
 	 */
+	iommu_put_dma_cookie(domain);
 	arm_smmu_destroy_domain_context(domain);
 	kfree(smmu_domain);
 }
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2016-01-26 18:06 ` Robin Murphy
@ 2016-01-26 18:06     ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

TLB synchronisation is a mighty big hammmer to bring down on the
transaction stream, typically stalling all in-flight transactions until
the sync completes. Since in most cases (except at stage 2 on SMMUv1)
a per-context sync operation is available, prefer that over the global
operation when performing TLB maintenance for a single domain, to avoid
unecessarily disrupting ongoing traffic in other contexts.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 18e0e10..bf1895c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -219,6 +219,8 @@
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
 #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
 #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
+#define ARM_SMMU_CB_TLBSYNC		0x7f0
+#define ARM_SMMU_CB_TLBSTATUS		0x7f4
 #define ARM_SMMU_CB_ATS1PR		0x800
 #define ARM_SMMU_CB_ATSR		0x8f0
 
@@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 }
 
 /* Wait for any pending TLB invalidations to complete */
-static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
 {
 	int count = 0;
-	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	void __iomem *base, __iomem *status;
 
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
-	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
-	       & sTLBGSTATUS_GSACTIVE) {
+	if (cbndx < 0) {
+		base = ARM_SMMU_GR0(smmu);
+		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
+		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
+	} else {
+		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
+		status = base + ARM_SMMU_CB_TLBSTATUS;
+		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
+	}
+
+	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
 		cpu_relax();
 		if (++count == TLB_LOOP_TIMEOUT) {
 			dev_err_ratelimited(smmu->dev,
@@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 static void arm_smmu_tlb_sync(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
-	__arm_smmu_tlb_sync(smmu_domain->smmu);
+	int cbndx = smmu_domain->cfg.cbndx;
+
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
+	    smmu_domain->smmu->version < ARM_SMMU_V2)
+		cbndx = -1;
+
+	__arm_smmu_tlb_sync(smmu_domain->smmu, cbndx);
 }
 
 static void arm_smmu_tlb_inv_context(void *cookie)
@@ -588,7 +604,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 			       base + ARM_SMMU_GR0_TLBIVMID);
 	}
 
-	__arm_smmu_tlb_sync(smmu);
+	__arm_smmu_tlb_sync(smmu, cfg->cbndx);
 }
 
 static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
@@ -1534,7 +1550,7 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg &= ~(sCR0_BSU_MASK << sCR0_BSU_SHIFT);
 
 	/* Push the button */
-	__arm_smmu_tlb_sync(smmu);
+	__arm_smmu_tlb_sync(smmu, -1);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 }
 
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2016-01-26 18:06     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-01-26 18:06 UTC (permalink / raw)
  To: linux-arm-kernel

TLB synchronisation is a mighty big hammmer to bring down on the
transaction stream, typically stalling all in-flight transactions until
the sync completes. Since in most cases (except at stage 2 on SMMUv1)
a per-context sync operation is available, prefer that over the global
operation when performing TLB maintenance for a single domain, to avoid
unecessarily disrupting ongoing traffic in other contexts.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 18e0e10..bf1895c 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -219,6 +219,8 @@
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
 #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
 #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
+#define ARM_SMMU_CB_TLBSYNC		0x7f0
+#define ARM_SMMU_CB_TLBSTATUS		0x7f4
 #define ARM_SMMU_CB_ATS1PR		0x800
 #define ARM_SMMU_CB_ATSR		0x8f0
 
@@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 }
 
 /* Wait for any pending TLB invalidations to complete */
-static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
 {
 	int count = 0;
-	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	void __iomem *base, __iomem *status;
 
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
-	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
-	       & sTLBGSTATUS_GSACTIVE) {
+	if (cbndx < 0) {
+		base = ARM_SMMU_GR0(smmu);
+		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
+		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
+	} else {
+		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
+		status = base + ARM_SMMU_CB_TLBSTATUS;
+		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
+	}
+
+	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
 		cpu_relax();
 		if (++count == TLB_LOOP_TIMEOUT) {
 			dev_err_ratelimited(smmu->dev,
@@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 static void arm_smmu_tlb_sync(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
-	__arm_smmu_tlb_sync(smmu_domain->smmu);
+	int cbndx = smmu_domain->cfg.cbndx;
+
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
+	    smmu_domain->smmu->version < ARM_SMMU_V2)
+		cbndx = -1;
+
+	__arm_smmu_tlb_sync(smmu_domain->smmu, cbndx);
 }
 
 static void arm_smmu_tlb_inv_context(void *cookie)
@@ -588,7 +604,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
 			       base + ARM_SMMU_GR0_TLBIVMID);
 	}
 
-	__arm_smmu_tlb_sync(smmu);
+	__arm_smmu_tlb_sync(smmu, cfg->cbndx);
 }
 
 static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
@@ -1534,7 +1550,7 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg &= ~(sCR0_BSU_MASK << sCR0_BSU_SHIFT);
 
 	/* Push the button */
-	__arm_smmu_tlb_sync(smmu);
+	__arm_smmu_tlb_sync(smmu, -1);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 }
 
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* RE: [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
  2016-01-26 18:06     ` Robin Murphy
@ 2016-01-27  6:00         ` Anup Patel
  -1 siblings, 0 replies; 32+ messages in thread
From: Anup Patel @ 2016-01-27  6:00 UTC (permalink / raw)
  To: Robin Murphy, will.deacon-5wv7dgnIgG8@public.gmane.org
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	bcm-kernel-feedback-list,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org

Hi Robin,

> -----Original Message-----
> From: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org [mailto:iommu-
> bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org] On Behalf Of Robin Murphy
> Sent: 26 January 2016 23:37
> To: will.deacon-5wv7dgnIgG8@public.gmane.org
> Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org; linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org;
> tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org
> Subject: [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as
> unprivileged
> 
> The IOMMU API has no concept of privilege so assumes all devices and
> mappings are equal, and indeed most non-CPU master devices on an AMBA
> interconnect make little use of the attribute bits on the bus thus by default
> perform unprivileged data accesses.
> 
> Some devices, however, believe themselves more equal than others, such as
> programmable DMA controllers whose 'master' thread issues bus transactions
> marked as privileged instruction fetches, while the data accesses of its channel
> threads (under the control of Linux, at least) are marked as unprivileged. This
> poses a problem for implementing the DMA API on an IOMMU conforming to
> ARM VMSAv8, under which a page that is unprivileged-writeable is also implicitly
> privileged-execute-never.
> Given that, there is no one set of attributes with which iommu_map() can
> implement, say, dma_alloc_coherent() that will allow every possible type of
> access without something running into unexecepted permission faults.
> 
> Fortunately the SMMU architecture provides a means to mitigate such issues by
> overriding the incoming attributes of a transaction; make use of that to strip the
> privileged/unprivileged status off incoming transactions, leaving just the
> instruction/data dichotomy which the IOMMU API does at least understand;
> Four states good, two states better.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index
> 59ee4b8..1f9093d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -167,6 +167,9 @@
>  #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
>  #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
> 
> +#define S2CR_PRIVCFG_SHIFT		24
> +#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
> +
>  /* Context bank attribute registers */
>  #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
>  #define CBAR_VMID_SHIFT			0
> @@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct
> arm_smmu_domain *smmu_domain,
>  		u32 idx, s2cr;
> 
>  		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
> -		s2cr = S2CR_TYPE_TRANS |
> +		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
>  		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);

Treating all MMU master access as unprivileged sounds more
conservative. Alternate approach would be to treat instruction
fetches as data reads.

Regards,
Anup

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
@ 2016-01-27  6:00         ` Anup Patel
  0 siblings, 0 replies; 32+ messages in thread
From: Anup Patel @ 2016-01-27  6:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

> -----Original Message-----
> From: iommu-bounces at lists.linux-foundation.org [mailto:iommu-
> bounces at lists.linux-foundation.org] On Behalf Of Robin Murphy
> Sent: 26 January 2016 23:37
> To: will.deacon at arm.com
> Cc: iommu at lists.linux-foundation.org; linux-arm-kernel at lists.infradead.org;
> tchalamarla at caviumnetworks.com
> Subject: [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as
> unprivileged
> 
> The IOMMU API has no concept of privilege so assumes all devices and
> mappings are equal, and indeed most non-CPU master devices on an AMBA
> interconnect make little use of the attribute bits on the bus thus by default
> perform unprivileged data accesses.
> 
> Some devices, however, believe themselves more equal than others, such as
> programmable DMA controllers whose 'master' thread issues bus transactions
> marked as privileged instruction fetches, while the data accesses of its channel
> threads (under the control of Linux, at least) are marked as unprivileged. This
> poses a problem for implementing the DMA API on an IOMMU conforming to
> ARM VMSAv8, under which a page that is unprivileged-writeable is also implicitly
> privileged-execute-never.
> Given that, there is no one set of attributes with which iommu_map() can
> implement, say, dma_alloc_coherent() that will allow every possible type of
> access without something running into unexecepted permission faults.
> 
> Fortunately the SMMU architecture provides a means to mitigate such issues by
> overriding the incoming attributes of a transaction; make use of that to strip the
> privileged/unprivileged status off incoming transactions, leaving just the
> instruction/data dichotomy which the IOMMU API does at least understand;
> Four states good, two states better.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index
> 59ee4b8..1f9093d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -167,6 +167,9 @@
>  #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
>  #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
> 
> +#define S2CR_PRIVCFG_SHIFT		24
> +#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
> +
>  /* Context bank attribute registers */
>  #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
>  #define CBAR_VMID_SHIFT			0
> @@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct
> arm_smmu_domain *smmu_domain,
>  		u32 idx, s2cr;
> 
>  		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
> -		s2cr = S2CR_TYPE_TRANS |
> +		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
>  		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);

Treating all MMU master access as unprivileged sounds more
conservative. Alternate approach would be to treat instruction
fetches as data reads.

Regards,
Anup

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
  2016-01-26 18:06     ` Robin Murphy
@ 2016-02-09 14:06         ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:06 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

Hi Robin,

On Tue, Jan 26, 2016 at 06:06:35PM +0000, Robin Murphy wrote:
> Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
> debugging/security feature so that unmatched stream IDs (i.e. devices
> not attached to an IOMMU domain) may be configured to fault.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 1f9093d..d1b7dc1 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -141,6 +141,8 @@
>  #define ID2_PTFS_16K			(1 << 13)
>  #define ID2_PTFS_64K			(1 << 14)
>  
> +#define sGFSR_USF			(1 << 1)
> +
>  /* Global TLB invalidation */
>  #define ARM_SMMU_GR0_TLBIVMID		0x64
>  #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
> @@ -263,6 +265,10 @@ static int force_stage;
>  module_param_named(force_stage, force_stage, int, S_IRUGO);
>  MODULE_PARM_DESC(force_stage,
>  	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
> +static bool disable_bypass;
> +module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_bypass,
> +	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>  
>  enum arm_smmu_arch_version {
>  	ARM_SMMU_V1 = 1,
> @@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  	if (!gfsr)
>  		return IRQ_NONE;
>  
> -	dev_err_ratelimited(smmu->dev,
> -		"Unexpected global fault, this could be serious\n");
> +	if (gfsr & sGFSR_USF)
> +		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
> +				    gfsynr1 & SMR_ID_MASK);
> +	else

I think I'd rather drop this message -- a global error is still indicative
that something has gone horriby wrong, and we already print the gsynr
registers below. Furthermore, if we take multiple faults, it might look
like they're all exclusively due to unmatched streams.

On top of that, I'm not convinced that USF will be set for an SMMU that
doesn't implement stream-matching (because it will match an S2CR of type
FAULT).

Other than that, looks good.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
@ 2016-02-09 14:06         ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:06 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On Tue, Jan 26, 2016 at 06:06:35PM +0000, Robin Murphy wrote:
> Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
> debugging/security feature so that unmatched stream IDs (i.e. devices
> not attached to an IOMMU domain) may be configured to fault.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
>  1 file changed, 23 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 1f9093d..d1b7dc1 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -141,6 +141,8 @@
>  #define ID2_PTFS_16K			(1 << 13)
>  #define ID2_PTFS_64K			(1 << 14)
>  
> +#define sGFSR_USF			(1 << 1)
> +
>  /* Global TLB invalidation */
>  #define ARM_SMMU_GR0_TLBIVMID		0x64
>  #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
> @@ -263,6 +265,10 @@ static int force_stage;
>  module_param_named(force_stage, force_stage, int, S_IRUGO);
>  MODULE_PARM_DESC(force_stage,
>  	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
> +static bool disable_bypass;
> +module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
> +MODULE_PARM_DESC(disable_bypass,
> +	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>  
>  enum arm_smmu_arch_version {
>  	ARM_SMMU_V1 = 1,
> @@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>  	if (!gfsr)
>  		return IRQ_NONE;
>  
> -	dev_err_ratelimited(smmu->dev,
> -		"Unexpected global fault, this could be serious\n");
> +	if (gfsr & sGFSR_USF)
> +		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
> +				    gfsynr1 & SMR_ID_MASK);
> +	else

I think I'd rather drop this message -- a global error is still indicative
that something has gone horriby wrong, and we already print the gsynr
registers below. Furthermore, if we take multiple faults, it might look
like they're all exclusively due to unmatched streams.

On top of that, I'm not convinced that USF will be set for an SMMU that
doesn't implement stream-matching (because it will match an S2CR of type
FAULT).

Other than that, looks good.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
  2016-01-26 18:06     ` Robin Murphy
@ 2016-02-09 14:08         ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On Tue, Jan 26, 2016 at 06:06:34PM +0000, Robin Murphy wrote:
> The IOMMU API has no concept of privilege so assumes all devices and
> mappings are equal, and indeed most non-CPU master devices on an AMBA
> interconnect make little use of the attribute bits on the bus thus by
> default perform unprivileged data accesses.
> 
> Some devices, however, believe themselves more equal than others, such
> as programmable DMA controllers whose 'master' thread issues bus
> transactions marked as privileged instruction fetches, while the data
> accesses of its channel threads (under the control of Linux, at least)
> are marked as unprivileged. This poses a problem for implementing the
> DMA API on an IOMMU conforming to ARM VMSAv8, under which a page that is
> unprivileged-writeable is also implicitly privileged-execute-never.
> Given that, there is no one set of attributes with which iommu_map() can
> implement, say, dma_alloc_coherent() that will allow every possible type
> of access without something running into unexecepted permission faults.
> 
> Fortunately the SMMU architecture provides a means to mitigate such
> issues by overriding the incoming attributes of a transaction; make use
> of that to strip the privileged/unprivileged status off incoming
> transactions, leaving just the instruction/data dichotomy which the
> IOMMU API does at least understand; Four states good, two states better.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 59ee4b8..1f9093d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -167,6 +167,9 @@
>  #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
>  #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
>  
> +#define S2CR_PRIVCFG_SHIFT		24
> +#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
> +
>  /* Context bank attribute registers */
>  #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
>  #define CBAR_VMID_SHIFT			0
> @@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>  		u32 idx, s2cr;
>  
>  		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
> -		s2cr = S2CR_TYPE_TRANS |
> +		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
>  		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
>  		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));

Hmm, do we also need to worry about the bypass case? I guess not for the
moment, but I anticipate horrible things.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged
@ 2016-02-09 14:08         ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 26, 2016 at 06:06:34PM +0000, Robin Murphy wrote:
> The IOMMU API has no concept of privilege so assumes all devices and
> mappings are equal, and indeed most non-CPU master devices on an AMBA
> interconnect make little use of the attribute bits on the bus thus by
> default perform unprivileged data accesses.
> 
> Some devices, however, believe themselves more equal than others, such
> as programmable DMA controllers whose 'master' thread issues bus
> transactions marked as privileged instruction fetches, while the data
> accesses of its channel threads (under the control of Linux, at least)
> are marked as unprivileged. This poses a problem for implementing the
> DMA API on an IOMMU conforming to ARM VMSAv8, under which a page that is
> unprivileged-writeable is also implicitly privileged-execute-never.
> Given that, there is no one set of attributes with which iommu_map() can
> implement, say, dma_alloc_coherent() that will allow every possible type
> of access without something running into unexecepted permission faults.
> 
> Fortunately the SMMU architecture provides a means to mitigate such
> issues by overriding the incoming attributes of a transaction; make use
> of that to strip the privileged/unprivileged status off incoming
> transactions, leaving just the instruction/data dichotomy which the
> IOMMU API does at least understand; Four states good, two states better.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 59ee4b8..1f9093d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -167,6 +167,9 @@
>  #define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
>  #define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
>  
> +#define S2CR_PRIVCFG_SHIFT		24
> +#define S2CR_PRIVCFG_UNPRIV		(2 << S2CR_PRIVCFG_SHIFT)
> +
>  /* Context bank attribute registers */
>  #define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
>  #define CBAR_VMID_SHIFT			0
> @@ -1083,7 +1086,7 @@ static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
>  		u32 idx, s2cr;
>  
>  		idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
> -		s2cr = S2CR_TYPE_TRANS |
> +		s2cr = S2CR_TYPE_TRANS | S2CR_PRIVCFG_UNPRIV |
>  		       (smmu_domain->cfg.cbndx << S2CR_CBNDX_SHIFT);
>  		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));

Hmm, do we also need to worry about the bypass case? I guess not for the
moment, but I anticipate horrible things.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2016-01-26 18:06     ` Robin Murphy
@ 2016-02-09 14:15         ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:15 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On Tue, Jan 26, 2016 at 06:06:37PM +0000, Robin Murphy wrote:
> TLB synchronisation is a mighty big hammmer to bring down on the
> transaction stream, typically stalling all in-flight transactions until
> the sync completes. Since in most cases (except at stage 2 on SMMUv1)
> a per-context sync operation is available, prefer that over the global
> operation when performing TLB maintenance for a single domain, to avoid
> unecessarily disrupting ongoing traffic in other contexts.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 18e0e10..bf1895c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -219,6 +219,8 @@
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
>  {
>  	int count = 0;
> -	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +	void __iomem *base, __iomem *status;
>  
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> -	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> -	       & sTLBGSTATUS_GSACTIVE) {
> +	if (cbndx < 0) {
> +		base = ARM_SMMU_GR0(smmu);
> +		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
> +		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
> +	} else {
> +		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
> +		status = base + ARM_SMMU_CB_TLBSTATUS;
> +		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
> +	}
> +
> +	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
>  		cpu_relax();
>  		if (++count == TLB_LOOP_TIMEOUT) {
>  			dev_err_ratelimited(smmu->dev,
> @@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>  static void arm_smmu_tlb_sync(void *cookie)
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
> -	__arm_smmu_tlb_sync(smmu_domain->smmu);
> +	int cbndx = smmu_domain->cfg.cbndx;
> +
> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
> +	    smmu_domain->smmu->version < ARM_SMMU_V2)
> +		cbndx = -1;

I think it would be cleaner just to override the sync function pointer
when we initialise a stage-2 page table for an SMMUv1 implementation.

Any reason not to go that way?

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2016-02-09 14:15         ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 26, 2016 at 06:06:37PM +0000, Robin Murphy wrote:
> TLB synchronisation is a mighty big hammmer to bring down on the
> transaction stream, typically stalling all in-flight transactions until
> the sync completes. Since in most cases (except at stage 2 on SMMUv1)
> a per-context sync operation is available, prefer that over the global
> operation when performing TLB maintenance for a single domain, to avoid
> unecessarily disrupting ongoing traffic in other contexts.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
>  1 file changed, 24 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 18e0e10..bf1895c 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -219,6 +219,8 @@
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
>  {
>  	int count = 0;
> -	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +	void __iomem *base, __iomem *status;
>  
> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> -	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> -	       & sTLBGSTATUS_GSACTIVE) {
> +	if (cbndx < 0) {
> +		base = ARM_SMMU_GR0(smmu);
> +		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
> +		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
> +	} else {
> +		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
> +		status = base + ARM_SMMU_CB_TLBSTATUS;
> +		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
> +	}
> +
> +	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
>  		cpu_relax();
>  		if (++count == TLB_LOOP_TIMEOUT) {
>  			dev_err_ratelimited(smmu->dev,
> @@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>  static void arm_smmu_tlb_sync(void *cookie)
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
> -	__arm_smmu_tlb_sync(smmu_domain->smmu);
> +	int cbndx = smmu_domain->cfg.cbndx;
> +
> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
> +	    smmu_domain->smmu->version < ARM_SMMU_V2)
> +		cbndx = -1;

I think it would be cleaner just to override the sync function pointer
when we initialise a stage-2 page table for an SMMUv1 implementation.

Any reason not to go that way?

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/4] Miscellaneous ARM SMMU patches
  2016-01-26 18:06 ` Robin Murphy
@ 2016-02-09 14:16     ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:16 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On Tue, Jan 26, 2016 at 06:06:33PM +0000, Robin Murphy wrote:
> Here's my current "miscellaneous SMMU enhancements" branch for your
> consideration. Patch #1 solves a subtle corner case that cropped up
> while exercising stage 1 context formats on the DMA330-MMU500 model;
> #3 will be wanted fairly soon for DMA ops integration so may as well
> get some exposure now; #2 and #4 are more nice-to-haves.

Cheers, Robin. I've queued #1 and #3 for 4.6. If you address the comments
on the other two, I can queue those as well.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/4] Miscellaneous ARM SMMU patches
@ 2016-02-09 14:16     ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2016-02-09 14:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jan 26, 2016 at 06:06:33PM +0000, Robin Murphy wrote:
> Here's my current "miscellaneous SMMU enhancements" branch for your
> consideration. Patch #1 solves a subtle corner case that cropped up
> while exercising stage 1 context formats on the DMA330-MMU500 model;
> #3 will be wanted fairly soon for DMA ops integration so may as well
> get some exposure now; #2 and #4 are more nice-to-haves.

Cheers, Robin. I've queued #1 and #3 for 4.6. If you address the comments
on the other two, I can queue those as well.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2016-02-09 14:15         ` Will Deacon
@ 2016-02-10 11:58             ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 11:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On 09/02/16 14:15, Will Deacon wrote:
> On Tue, Jan 26, 2016 at 06:06:37PM +0000, Robin Murphy wrote:
>> TLB synchronisation is a mighty big hammmer to bring down on the
>> transaction stream, typically stalling all in-flight transactions until
>> the sync completes. Since in most cases (except at stage 2 on SMMUv1)
>> a per-context sync operation is available, prefer that over the global
>> operation when performing TLB maintenance for a single domain, to avoid
>> unecessarily disrupting ongoing traffic in other contexts.
>>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>   drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
>>   1 file changed, 24 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 18e0e10..bf1895c 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -219,6 +219,8 @@
>>   #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>>   #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>>   #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
>> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
>> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>>   #define ARM_SMMU_CB_ATS1PR		0x800
>>   #define ARM_SMMU_CB_ATSR		0x8f0
>>
>> @@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>>   }
>>
>>   /* Wait for any pending TLB invalidations to complete */
>> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
>>   {
>>   	int count = 0;
>> -	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
>> +	void __iomem *base, __iomem *status;
>>
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
>> -	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
>> -	       & sTLBGSTATUS_GSACTIVE) {
>> +	if (cbndx < 0) {
>> +		base = ARM_SMMU_GR0(smmu);
>> +		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
>> +		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
>> +	} else {
>> +		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
>> +		status = base + ARM_SMMU_CB_TLBSTATUS;
>> +		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
>> +	}
>> +
>> +	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
>>   		cpu_relax();
>>   		if (++count == TLB_LOOP_TIMEOUT) {
>>   			dev_err_ratelimited(smmu->dev,
>> @@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>>   static void arm_smmu_tlb_sync(void *cookie)
>>   {
>>   	struct arm_smmu_domain *smmu_domain = cookie;
>> -	__arm_smmu_tlb_sync(smmu_domain->smmu);
>> +	int cbndx = smmu_domain->cfg.cbndx;
>> +
>> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
>> +	    smmu_domain->smmu->version < ARM_SMMU_V2)
>> +		cbndx = -1;
>
> I think it would be cleaner just to override the sync function pointer
> when we initialise a stage-2 page table for an SMMUv1 implementation.
>
> Any reason not to go that way?

Frankly, the idea just didn't occur to me. Looking more closely, if we 
were to simply put a set of iommu_gather_ops pointers in each domain 
based on the format, we could then also break up the confusing mess of 
arm_smmu_tlb_inv_range_nosync(), which would be nice.

I'll give that a go on top of the context format series I'm preparing 
(with which it would otherwise conflict horribly) and see how it looks.

Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2016-02-10 11:58             ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 11:58 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/02/16 14:15, Will Deacon wrote:
> On Tue, Jan 26, 2016 at 06:06:37PM +0000, Robin Murphy wrote:
>> TLB synchronisation is a mighty big hammmer to bring down on the
>> transaction stream, typically stalling all in-flight transactions until
>> the sync completes. Since in most cases (except at stage 2 on SMMUv1)
>> a per-context sync operation is available, prefer that over the global
>> operation when performing TLB maintenance for a single domain, to avoid
>> unecessarily disrupting ongoing traffic in other contexts.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm-smmu.c | 32 ++++++++++++++++++++++++--------
>>   1 file changed, 24 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 18e0e10..bf1895c 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -219,6 +219,8 @@
>>   #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>>   #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>>   #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
>> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
>> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>>   #define ARM_SMMU_CB_ATS1PR		0x800
>>   #define ARM_SMMU_CB_ATSR		0x8f0
>>
>> @@ -546,14 +548,22 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>>   }
>>
>>   /* Wait for any pending TLB invalidations to complete */
>> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu, int cbndx)
>>   {
>>   	int count = 0;
>> -	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
>> +	void __iomem *base, __iomem *status;
>>
>> -	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
>> -	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
>> -	       & sTLBGSTATUS_GSACTIVE) {
>> +	if (cbndx < 0) {
>> +		base = ARM_SMMU_GR0(smmu);
>> +		status = base + ARM_SMMU_GR0_sTLBGSTATUS;
>> +		writel_relaxed(0, base + ARM_SMMU_GR0_sTLBGSYNC);
>> +	} else {
>> +		base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cbndx);
>> +		status = base + ARM_SMMU_CB_TLBSTATUS;
>> +		writel_relaxed(0, base + ARM_SMMU_CB_TLBSYNC);
>> +	}
>> +
>> +	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
>>   		cpu_relax();
>>   		if (++count == TLB_LOOP_TIMEOUT) {
>>   			dev_err_ratelimited(smmu->dev,
>> @@ -567,7 +577,13 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>>   static void arm_smmu_tlb_sync(void *cookie)
>>   {
>>   	struct arm_smmu_domain *smmu_domain = cookie;
>> -	__arm_smmu_tlb_sync(smmu_domain->smmu);
>> +	int cbndx = smmu_domain->cfg.cbndx;
>> +
>> +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
>> +	    smmu_domain->smmu->version < ARM_SMMU_V2)
>> +		cbndx = -1;
>
> I think it would be cleaner just to override the sync function pointer
> when we initialise a stage-2 page table for an SMMUv1 implementation.
>
> Any reason not to go that way?

Frankly, the idea just didn't occur to me. Looking more closely, if we 
were to simply put a set of iommu_gather_ops pointers in each domain 
based on the format, we could then also break up the confusing mess of 
arm_smmu_tlb_inv_range_nosync(), which would be nice.

I'll give that a go on top of the context format series I'm preparing 
(with which it would otherwise conflict horribly) and see how it looks.

Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
  2016-02-09 14:06         ` Will Deacon
@ 2016-02-10 12:10             ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 12:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On 09/02/16 14:06, Will Deacon wrote:
> Hi Robin,
>
> On Tue, Jan 26, 2016 at 06:06:35PM +0000, Robin Murphy wrote:
>> Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
>> debugging/security feature so that unmatched stream IDs (i.e. devices
>> not attached to an IOMMU domain) may be configured to fault.
>>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>   drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
>>   1 file changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 1f9093d..d1b7dc1 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -141,6 +141,8 @@
>>   #define ID2_PTFS_16K			(1 << 13)
>>   #define ID2_PTFS_64K			(1 << 14)
>>
>> +#define sGFSR_USF			(1 << 1)
>> +
>>   /* Global TLB invalidation */
>>   #define ARM_SMMU_GR0_TLBIVMID		0x64
>>   #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
>> @@ -263,6 +265,10 @@ static int force_stage;
>>   module_param_named(force_stage, force_stage, int, S_IRUGO);
>>   MODULE_PARM_DESC(force_stage,
>>   	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
>> +static bool disable_bypass;
>> +module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
>> +MODULE_PARM_DESC(disable_bypass,
>> +	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>>
>>   enum arm_smmu_arch_version {
>>   	ARM_SMMU_V1 = 1,
>> @@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>>   	if (!gfsr)
>>   		return IRQ_NONE;
>>
>> -	dev_err_ratelimited(smmu->dev,
>> -		"Unexpected global fault, this could be serious\n");
>> +	if (gfsr & sGFSR_USF)
>> +		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
>> +				    gfsynr1 & SMR_ID_MASK);
>> +	else
>
> I think I'd rather drop this message -- a global error is still indicative
> that something has gone horriby wrong, and we already print the gsynr
> registers below. Furthermore, if we take multiple faults, it might look
> like they're all exclusively due to unmatched streams.
>
> On top of that, I'm not convinced that USF will be set for an SMMU that
> doesn't implement stream-matching (because it will match an S2CR of type
> FAULT).

You're quite right, by the look of it the stream indexing case should 
report ICF rather than USF. My rationale for special-casing the message 
was that it's less of an "unexpected" fault when it's something you've 
specifically asked the driver for, but it also seems reasonable that 
someone turning on such an "I know what I'm doing" feature might be able 
to decode GFSR themselves. Considering the machinations of matching 
!MULTI and USF or ICF as appropriate just for the sake of printing a 
slightly reworded message, I'll drop this hunk.

> Other than that, looks good.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass
@ 2016-02-10 12:10             ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 12:10 UTC (permalink / raw)
  To: linux-arm-kernel

On 09/02/16 14:06, Will Deacon wrote:
> Hi Robin,
>
> On Tue, Jan 26, 2016 at 06:06:35PM +0000, Robin Murphy wrote:
>> Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
>> debugging/security feature so that unmatched stream IDs (i.e. devices
>> not attached to an IOMMU domain) may be configured to fault.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm-smmu.c | 32 +++++++++++++++++++++++---------
>>   1 file changed, 23 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 1f9093d..d1b7dc1 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -141,6 +141,8 @@
>>   #define ID2_PTFS_16K			(1 << 13)
>>   #define ID2_PTFS_64K			(1 << 14)
>>
>> +#define sGFSR_USF			(1 << 1)
>> +
>>   /* Global TLB invalidation */
>>   #define ARM_SMMU_GR0_TLBIVMID		0x64
>>   #define ARM_SMMU_GR0_TLBIALLNSNH	0x68
>> @@ -263,6 +265,10 @@ static int force_stage;
>>   module_param_named(force_stage, force_stage, int, S_IRUGO);
>>   MODULE_PARM_DESC(force_stage,
>>   	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
>> +static bool disable_bypass;
>> +module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
>> +MODULE_PARM_DESC(disable_bypass,
>> +	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
>>
>>   enum arm_smmu_arch_version {
>>   	ARM_SMMU_V1 = 1,
>> @@ -704,8 +710,12 @@ static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
>>   	if (!gfsr)
>>   		return IRQ_NONE;
>>
>> -	dev_err_ratelimited(smmu->dev,
>> -		"Unexpected global fault, this could be serious\n");
>> +	if (gfsr & sGFSR_USF)
>> +		dev_err_ratelimited(smmu->dev, "Unmatched stream fault: ID 0x%x\n",
>> +				    gfsynr1 & SMR_ID_MASK);
>> +	else
>
> I think I'd rather drop this message -- a global error is still indicative
> that something has gone horriby wrong, and we already print the gsynr
> registers below. Furthermore, if we take multiple faults, it might look
> like they're all exclusively due to unmatched streams.
>
> On top of that, I'm not convinced that USF will be set for an SMMU that
> doesn't implement stream-matching (because it will match an S2CR of type
> FAULT).

You're quite right, by the look of it the stream indexing case should 
report ICF rather than USF. My rationale for special-casing the message 
was that it's less of an "unexpected" fault when it's something you've 
specifically asked the driver for, but it also seems reasonable that 
someone turning on such an "I know what I'm doing" feature might be able 
to decode GFSR themselves. Considering the machinations of matching 
!MULTI and USF or ICF as appropriate just for the sake of printing a 
slightly reworded message, I'll drop this hunk.

> Other than that, looks good.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2] iommu/arm-smmu: Allow disabling unmatched stream bypass
  2016-02-09 14:06         ` Will Deacon
@ 2016-02-10 14:25             ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 14:25 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
debugging/security feature so that unmatched stream IDs (i.e. devices
not attached to an IOMMU domain) may be configured to fault.

Rather than introduce unsightly inconsistency, or repeat the existing
unnecessary use of module_param_named(), fix that as well in passing.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---

v2:
- Don't bother trying to report the fault specially.
- Realise that calling module_param_named() with identical variable
  and parameter names is silly.

 drivers/iommu/arm-smmu.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1f9093d..cf1e330 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,9 +260,13 @@
 #define FSYNR0_WNR			(1 << 4)
 
 static int force_stage;
-module_param_named(force_stage, force_stage, int, S_IRUGO);
+module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
+static bool disable_bypass;
+module_param(disable_bypass, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_bypass,
+	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
 enum arm_smmu_arch_version {
 	ARM_SMMU_V1 = 1,
@@ -1111,9 +1115,9 @@ static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
 	 */
 	for (i = 0; i < cfg->num_streamids; ++i) {
 		u32 idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
+		u32 reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			       gr0_base + ARM_SMMU_GR0_S2CR(idx));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
 
 	arm_smmu_master_free_smrs(smmu, cfg);
@@ -1476,11 +1480,11 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 
-	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	/* Mark all SMRn as invalid and all S2CRn as bypass unless overridden */
+	reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 	for (i = 0; i < smmu->num_mapping_groups; ++i) {
 		writel_relaxed(0, gr0_base + ARM_SMMU_GR0_SMR(i));
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			gr0_base + ARM_SMMU_GR0_S2CR(i));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(i));
 	}
 
 	/* Make sure all context banks are disabled and clear CB_FSR  */
@@ -1502,8 +1506,12 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Disable TLB broadcasting. */
 	reg |= (sCR0_VMIDPNE | sCR0_PTM);
 
-	/* Enable client access, but bypass when no mapping is found */
-	reg &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+	/* Enable client access, handling unmatched streams as appropriate */
+	reg &= ~sCR0_CLIENTPD;
+	if (disable_bypass)
+		reg |= sCR0_USFCFG;
+	else
+		reg &= ~sCR0_USFCFG;
 
 	/* Disable forced broadcasting */
 	reg &= ~sCR0_FB;
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2] iommu/arm-smmu: Allow disabling unmatched stream bypass
@ 2016-02-10 14:25             ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2016-02-10 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

Borrow the disable_bypass parameter from the SMMUv3 driver as a handy
debugging/security feature so that unmatched stream IDs (i.e. devices
not attached to an IOMMU domain) may be configured to fault.

Rather than introduce unsightly inconsistency, or repeat the existing
unnecessary use of module_param_named(), fix that as well in passing.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2:
- Don't bother trying to report the fault specially.
- Realise that calling module_param_named() with identical variable
  and parameter names is silly.

 drivers/iommu/arm-smmu.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1f9093d..cf1e330 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -260,9 +260,13 @@
 #define FSYNR0_WNR			(1 << 4)
 
 static int force_stage;
-module_param_named(force_stage, force_stage, int, S_IRUGO);
+module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed@a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
+static bool disable_bypass;
+module_param(disable_bypass, bool, S_IRUGO);
+MODULE_PARM_DESC(disable_bypass,
+	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
 enum arm_smmu_arch_version {
 	ARM_SMMU_V1 = 1,
@@ -1111,9 +1115,9 @@ static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
 	 */
 	for (i = 0; i < cfg->num_streamids; ++i) {
 		u32 idx = cfg->smrs ? cfg->smrs[i].idx : cfg->streamids[i];
+		u32 reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			       gr0_base + ARM_SMMU_GR0_S2CR(idx));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(idx));
 	}
 
 	arm_smmu_master_free_smrs(smmu, cfg);
@@ -1476,11 +1480,11 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	reg = readl_relaxed(ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sGFSR);
 
-	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	/* Mark all SMRn as invalid and all S2CRn as bypass unless overridden */
+	reg = disable_bypass ? S2CR_TYPE_FAULT : S2CR_TYPE_BYPASS;
 	for (i = 0; i < smmu->num_mapping_groups; ++i) {
 		writel_relaxed(0, gr0_base + ARM_SMMU_GR0_SMR(i));
-		writel_relaxed(S2CR_TYPE_BYPASS,
-			gr0_base + ARM_SMMU_GR0_S2CR(i));
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_S2CR(i));
 	}
 
 	/* Make sure all context banks are disabled and clear CB_FSR  */
@@ -1502,8 +1506,12 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 	/* Disable TLB broadcasting. */
 	reg |= (sCR0_VMIDPNE | sCR0_PTM);
 
-	/* Enable client access, but bypass when no mapping is found */
-	reg &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+	/* Enable client access, handling unmatched streams as appropriate */
+	reg &= ~sCR0_CLIENTPD;
+	if (disable_bypass)
+		reg |= sCR0_USFCFG;
+	else
+		reg &= ~sCR0_USFCFG;
 
 	/* Disable forced broadcasting */
 	reg &= ~sCR0_FB;
-- 
2.7.0.25.gfc10eb5.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2017-03-07 18:09 [PATCH 0/4] ARM SMMU per-context TLB sync Robin Murphy
@ 2017-03-07 18:09     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2017-03-07 18:09 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

TLB synchronisation typically involves the SMMU blocking all incoming
transactions until the TLBs report completion of all outstanding
operations. In the common SMMUv2 configuration of a single distributed
SMMU serving multiple peripherals, that means that a single unmap
request has the potential to bring the hammer down on the entire system
if synchronised globally. Since stage 1 contexts, and stage 2 contexts
under SMMUv2, offer local sync operations, let's make use of those
wherever we can in the hope of minimising global disruption.

To that end, rather than add any more branches to the already unwieldy
monolithic TLB maintenance ops, break them up into smaller, neater,
functions which we can then mix and match as appropriate.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 100 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8aafe304171..f7411109670f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
 #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
 #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
+#define ARM_SMMU_CB_TLBSYNC		0x7f0
+#define ARM_SMMU_CB_TLBSTATUS		0x7f4
 #define ARM_SMMU_CB_ATS1PR		0x800
 #define ARM_SMMU_CB_ATSR		0x8f0
 
@@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 }
 
 /* Wait for any pending TLB invalidations to complete */
-static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
+				void __iomem *sync, void __iomem *status)
 {
 	int count = 0;
-	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
 
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
-	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
-	       & sTLBGSTATUS_GSACTIVE) {
+	writel_relaxed(0, sync);
+	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
 		cpu_relax();
 		if (++count == TLB_LOOP_TIMEOUT) {
 			dev_err_ratelimited(smmu->dev,
@@ -587,29 +588,49 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 	}
 }
 
-static void arm_smmu_tlb_sync(void *cookie)
+static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
 {
-	struct arm_smmu_domain *smmu_domain = cookie;
-	__arm_smmu_tlb_sync(smmu_domain->smmu);
+	void __iomem *base = ARM_SMMU_GR0(smmu);
+
+	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC,
+			    base + ARM_SMMU_GR0_sTLBGSTATUS);
 }
 
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void arm_smmu_tlb_sync_context(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
+
+	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
+			    base + ARM_SMMU_CB_TLBSTATUS);
+}
+
+static void arm_smmu_tlb_sync_vmid(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	arm_smmu_tlb_sync_global(smmu_domain->smmu);
+}
+
+static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
+
+	writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
+	arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_inv_context_s2(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
-	void __iomem *base;
+	void __iomem *base = ARM_SMMU_GR0(smmu);
 
-	if (stage1) {
-		base = ARM_SMMU_CB(smmu, cfg->cbndx);
-		writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
-	} else {
-		base = ARM_SMMU_GR0(smmu);
-		writel_relaxed(cfg->vmid, base + ARM_SMMU_GR0_TLBIVMID);
-	}
-
-	__arm_smmu_tlb_sync(smmu);
+	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync_global(smmu);
 }
 
 static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
@@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
-	void __iomem *reg;
+	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
+	size_t step;
 
-	if (stage1) {
-		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
-		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
-
-		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
-			iova &= ~12UL;
-			iova |= cfg->asid;
-			do {
-				writel_relaxed(iova, reg);
-				iova += granule;
-			} while (size -= granule);
-		} else {
-			iova >>= 12;
-			iova |= (u64)cfg->asid << 48;
-			do {
-				writeq_relaxed(iova, reg);
-				iova += granule >> 12;
-			} while (size -= granule);
-		}
-	} else if (smmu->version == ARM_SMMU_V2) {
-		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
+	if (stage1)
+		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
+			      ARM_SMMU_CB_S1_TLBIVA;
+	else
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
 			      ARM_SMMU_CB_S2_TLBIIPAS2;
-		iova >>= 12;
-		do {
-			smmu_write_atomic_lq(iova, reg);
-			iova += granule >> 12;
-		} while (size -= granule);
+
+	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
+		iova &= ~12UL;
+		iova |= cfg->asid;
+		step = granule;
 	} else {
-		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
-		writel_relaxed(cfg->vmid, reg);
+		iova >>= 12;
+		step = granule >> 12;
+		if (stage1)
+			iova |= (u64)cfg->asid << 48;
 	}
+
+	do {
+		smmu_write_atomic_lq(iova, reg);
+		iova += step;
+	} while (size -= granule);
 }
 
-static const struct iommu_gather_ops arm_smmu_gather_ops = {
-	.tlb_flush_all	= arm_smmu_tlb_inv_context,
+/*
+ * On MMU-401 at least, the cost of firing off multiple TLBIVMIDs appears
+ * almost negligible, but the benefit of getting the first one in as far ahead
+ * of the sync as possible is significant, hence we don't just make this a
+ * no-op and set .tlb_sync to arm_smmu_inv_context_s2() as you might think.
+ */
+static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
+					 size_t granule, bool leaf, void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	void __iomem *base = ARM_SMMU_GR0(smmu);
+
+	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
+}
+
+static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s1,
 	.tlb_add_flush	= arm_smmu_tlb_inv_range_nosync,
-	.tlb_sync	= arm_smmu_tlb_sync,
+	.tlb_sync	= arm_smmu_tlb_sync_context,
+};
+
+static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s2,
+	.tlb_add_flush	= arm_smmu_tlb_inv_range_nosync,
+	.tlb_sync	= arm_smmu_tlb_sync_context,
+};
+
+static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v1 = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s2,
+	.tlb_add_flush	= arm_smmu_tlb_inv_vmid_nosync,
+	.tlb_sync	= arm_smmu_tlb_sync_vmid,
 };
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
@@ -829,6 +868,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	const struct iommu_gather_ops *tlb_ops;
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -900,6 +940,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 			ias = min(ias, 32UL);
 			oas = min(oas, 32UL);
 		}
+		tlb_ops = &arm_smmu_s1_tlb_ops;
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -918,12 +959,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 			ias = min(ias, 40UL);
 			oas = min(oas, 40UL);
 		}
+		if (smmu->version == ARM_SMMU_V2)
+			tlb_ops = &arm_smmu_s2_tlb_ops_v2;
+		else
+			tlb_ops = &arm_smmu_s2_tlb_ops_v1;
 		break;
 	default:
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-
 	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
 				      smmu->num_context_banks);
 	if (ret < 0)
@@ -946,7 +990,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
 		.oas		= oas,
-		.tlb		= &arm_smmu_gather_ops,
+		.tlb		= tlb_ops,
 		.iommu_dev	= smmu->dev,
 	};
 
@@ -1730,7 +1774,7 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 		reg |= sCR0_EXIDENABLE;
 
 	/* Push the button */
-	__arm_smmu_tlb_sync(smmu);
+	arm_smmu_tlb_sync_global(smmu);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 }
 
-- 
2.11.0.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2017-03-07 18:09     ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2017-03-07 18:09 UTC (permalink / raw)
  To: linux-arm-kernel

TLB synchronisation typically involves the SMMU blocking all incoming
transactions until the TLBs report completion of all outstanding
operations. In the common SMMUv2 configuration of a single distributed
SMMU serving multiple peripherals, that means that a single unmap
request has the potential to bring the hammer down on the entire system
if synchronised globally. Since stage 1 contexts, and stage 2 contexts
under SMMUv2, offer local sync operations, let's make use of those
wherever we can in the hope of minimising global disruption.

To that end, rather than add any more branches to the already unwieldy
monolithic TLB maintenance ops, break them up into smaller, neater,
functions which we can then mix and match as appropriate.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
 1 file changed, 100 insertions(+), 56 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c8aafe304171..f7411109670f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
 #define ARM_SMMU_CB_S1_TLBIVAL		0x620
 #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
 #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
+#define ARM_SMMU_CB_TLBSYNC		0x7f0
+#define ARM_SMMU_CB_TLBSTATUS		0x7f4
 #define ARM_SMMU_CB_ATS1PR		0x800
 #define ARM_SMMU_CB_ATSR		0x8f0
 
@@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 }
 
 /* Wait for any pending TLB invalidations to complete */
-static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
+				void __iomem *sync, void __iomem *status)
 {
 	int count = 0;
-	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
 
-	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
-	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
-	       & sTLBGSTATUS_GSACTIVE) {
+	writel_relaxed(0, sync);
+	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
 		cpu_relax();
 		if (++count == TLB_LOOP_TIMEOUT) {
 			dev_err_ratelimited(smmu->dev,
@@ -587,29 +588,49 @@ static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
 	}
 }
 
-static void arm_smmu_tlb_sync(void *cookie)
+static void arm_smmu_tlb_sync_global(struct arm_smmu_device *smmu)
 {
-	struct arm_smmu_domain *smmu_domain = cookie;
-	__arm_smmu_tlb_sync(smmu_domain->smmu);
+	void __iomem *base = ARM_SMMU_GR0(smmu);
+
+	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_GR0_sTLBGSYNC,
+			    base + ARM_SMMU_GR0_sTLBGSTATUS);
 }
 
-static void arm_smmu_tlb_inv_context(void *cookie)
+static void arm_smmu_tlb_sync_context(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	void __iomem *base = ARM_SMMU_CB(smmu, smmu_domain->cfg.cbndx);
+
+	__arm_smmu_tlb_sync(smmu, base + ARM_SMMU_CB_TLBSYNC,
+			    base + ARM_SMMU_CB_TLBSTATUS);
+}
+
+static void arm_smmu_tlb_sync_vmid(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+
+	arm_smmu_tlb_sync_global(smmu_domain->smmu);
+}
+
+static void arm_smmu_tlb_inv_context_s1(void *cookie)
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	void __iomem *base = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
+
+	writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
+	arm_smmu_tlb_sync_context(cookie);
+}
+
+static void arm_smmu_tlb_inv_context_s2(void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
-	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
-	void __iomem *base;
+	void __iomem *base = ARM_SMMU_GR0(smmu);
 
-	if (stage1) {
-		base = ARM_SMMU_CB(smmu, cfg->cbndx);
-		writel_relaxed(cfg->asid, base + ARM_SMMU_CB_S1_TLBIASID);
-	} else {
-		base = ARM_SMMU_GR0(smmu);
-		writel_relaxed(cfg->vmid, base + ARM_SMMU_GR0_TLBIVMID);
-	}
-
-	__arm_smmu_tlb_sync(smmu);
+	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync_global(smmu);
 }
 
 static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
@@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 {
 	struct arm_smmu_domain *smmu_domain = cookie;
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
-	void __iomem *reg;
+	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
+	size_t step;
 
-	if (stage1) {
-		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
-		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
-
-		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
-			iova &= ~12UL;
-			iova |= cfg->asid;
-			do {
-				writel_relaxed(iova, reg);
-				iova += granule;
-			} while (size -= granule);
-		} else {
-			iova >>= 12;
-			iova |= (u64)cfg->asid << 48;
-			do {
-				writeq_relaxed(iova, reg);
-				iova += granule >> 12;
-			} while (size -= granule);
-		}
-	} else if (smmu->version == ARM_SMMU_V2) {
-		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
+	if (stage1)
+		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
+			      ARM_SMMU_CB_S1_TLBIVA;
+	else
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
 			      ARM_SMMU_CB_S2_TLBIIPAS2;
-		iova >>= 12;
-		do {
-			smmu_write_atomic_lq(iova, reg);
-			iova += granule >> 12;
-		} while (size -= granule);
+
+	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
+		iova &= ~12UL;
+		iova |= cfg->asid;
+		step = granule;
 	} else {
-		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
-		writel_relaxed(cfg->vmid, reg);
+		iova >>= 12;
+		step = granule >> 12;
+		if (stage1)
+			iova |= (u64)cfg->asid << 48;
 	}
+
+	do {
+		smmu_write_atomic_lq(iova, reg);
+		iova += step;
+	} while (size -= granule);
 }
 
-static const struct iommu_gather_ops arm_smmu_gather_ops = {
-	.tlb_flush_all	= arm_smmu_tlb_inv_context,
+/*
+ * On MMU-401@least, the cost of firing off multiple TLBIVMIDs appears
+ * almost negligible, but the benefit of getting the first one in as far ahead
+ * of the sync as possible is significant, hence we don't just make this a
+ * no-op and set .tlb_sync to arm_smmu_inv_context_s2() as you might think.
+ */
+static void arm_smmu_tlb_inv_vmid_nosync(unsigned long iova, size_t size,
+					 size_t granule, bool leaf, void *cookie)
+{
+	struct arm_smmu_domain *smmu_domain = cookie;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	void __iomem *base = ARM_SMMU_GR0(smmu);
+
+	writel_relaxed(smmu_domain->cfg.vmid, base + ARM_SMMU_GR0_TLBIVMID);
+}
+
+static const struct iommu_gather_ops arm_smmu_s1_tlb_ops = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s1,
 	.tlb_add_flush	= arm_smmu_tlb_inv_range_nosync,
-	.tlb_sync	= arm_smmu_tlb_sync,
+	.tlb_sync	= arm_smmu_tlb_sync_context,
+};
+
+static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v2 = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s2,
+	.tlb_add_flush	= arm_smmu_tlb_inv_range_nosync,
+	.tlb_sync	= arm_smmu_tlb_sync_context,
+};
+
+static const struct iommu_gather_ops arm_smmu_s2_tlb_ops_v1 = {
+	.tlb_flush_all	= arm_smmu_tlb_inv_context_s2,
+	.tlb_add_flush	= arm_smmu_tlb_inv_vmid_nosync,
+	.tlb_sync	= arm_smmu_tlb_sync_vmid,
 };
 
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
@@ -829,6 +868,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	const struct iommu_gather_ops *tlb_ops;
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -900,6 +940,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 			ias = min(ias, 32UL);
 			oas = min(oas, 32UL);
 		}
+		tlb_ops = &arm_smmu_s1_tlb_ops;
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -918,12 +959,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 			ias = min(ias, 40UL);
 			oas = min(oas, 40UL);
 		}
+		if (smmu->version == ARM_SMMU_V2)
+			tlb_ops = &arm_smmu_s2_tlb_ops_v2;
+		else
+			tlb_ops = &arm_smmu_s2_tlb_ops_v1;
 		break;
 	default:
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-
 	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
 				      smmu->num_context_banks);
 	if (ret < 0)
@@ -946,7 +990,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
 		.oas		= oas,
-		.tlb		= &arm_smmu_gather_ops,
+		.tlb		= tlb_ops,
 		.iommu_dev	= smmu->dev,
 	};
 
@@ -1730,7 +1774,7 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 		reg |= sCR0_EXIDENABLE;
 
 	/* Push the button */
-	__arm_smmu_tlb_sync(smmu);
+	arm_smmu_tlb_sync_global(smmu);
 	writel(reg, ARM_SMMU_GR0_NS(smmu) + ARM_SMMU_GR0_sCR0);
 }
 
-- 
2.11.0.dirty

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2017-03-07 18:09     ` Robin Murphy
@ 2017-03-30 14:37         ` Will Deacon
  -1 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2017-03-30 14:37 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Robin,

This mostly looks great, but I have a couple of minor comments below.

On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
> TLB synchronisation typically involves the SMMU blocking all incoming
> transactions until the TLBs report completion of all outstanding
> operations. In the common SMMUv2 configuration of a single distributed
> SMMU serving multiple peripherals, that means that a single unmap
> request has the potential to bring the hammer down on the entire system
> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
> under SMMUv2, offer local sync operations, let's make use of those
> wherever we can in the hope of minimising global disruption.
> 
> To that end, rather than add any more branches to the already unwieldy
> monolithic TLB maintenance ops, break them up into smaller, neater,
> functions which we can then mix and match as appropriate.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 100 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8aafe304171..f7411109670f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> +				void __iomem *sync, void __iomem *status)

Given that you take the arm_smmu_device anyway, I think I'd prefer just
passing the offsets for sync and status and avoiding the additions
in the caller (a bit like your other patch in this series ;).

>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> -	void __iomem *reg;
> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
> +	size_t step;
>  
> -	if (stage1) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
> -
> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> -			iova &= ~12UL;
> -			iova |= cfg->asid;
> -			do {
> -				writel_relaxed(iova, reg);
> -				iova += granule;
> -			} while (size -= granule);
> -		} else {
> -			iova >>= 12;
> -			iova |= (u64)cfg->asid << 48;
> -			do {
> -				writeq_relaxed(iova, reg);
> -				iova += granule >> 12;
> -			} while (size -= granule);
> -		}
> -	} else if (smmu->version == ARM_SMMU_V2) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> +	if (stage1)
> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
> +			      ARM_SMMU_CB_S1_TLBIVA;
> +	else
>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
> -		iova >>= 12;
> -		do {
> -			smmu_write_atomic_lq(iova, reg);
> -			iova += granule >> 12;
> -		} while (size -= granule);
> +
> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> +		iova &= ~12UL;
> +		iova |= cfg->asid;
> +		step = granule;
>  	} else {
> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
> -		writel_relaxed(cfg->vmid, reg);
> +		iova >>= 12;
> +		step = granule >> 12;
> +		if (stage1)
> +			iova |= (u64)cfg->asid << 48;
>  	}
> +
> +	do {
> +		smmu_write_atomic_lq(iova, reg);
> +		iova += step;
> +	} while (size -= granule);

There seems to be a lot of refactoring going on here, but I'm not entirely
comfortable with the unconditional move to smmu_write_atomic_lq. Given the
way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
else clause in the current code and you're done.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2017-03-30 14:37         ` Will Deacon
  0 siblings, 0 replies; 32+ messages in thread
From: Will Deacon @ 2017-03-30 14:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

This mostly looks great, but I have a couple of minor comments below.

On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
> TLB synchronisation typically involves the SMMU blocking all incoming
> transactions until the TLBs report completion of all outstanding
> operations. In the common SMMUv2 configuration of a single distributed
> SMMU serving multiple peripherals, that means that a single unmap
> request has the potential to bring the hammer down on the entire system
> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
> under SMMUv2, offer local sync operations, let's make use of those
> wherever we can in the hope of minimising global disruption.
> 
> To that end, rather than add any more branches to the already unwieldy
> monolithic TLB maintenance ops, break them up into smaller, neater,
> functions which we can then mix and match as appropriate.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 100 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8aafe304171..f7411109670f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> +				void __iomem *sync, void __iomem *status)

Given that you take the arm_smmu_device anyway, I think I'd prefer just
passing the offsets for sync and status and avoiding the additions
in the caller (a bit like your other patch in this series ;).

>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> -	void __iomem *reg;
> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
> +	size_t step;
>  
> -	if (stage1) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
> -
> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> -			iova &= ~12UL;
> -			iova |= cfg->asid;
> -			do {
> -				writel_relaxed(iova, reg);
> -				iova += granule;
> -			} while (size -= granule);
> -		} else {
> -			iova >>= 12;
> -			iova |= (u64)cfg->asid << 48;
> -			do {
> -				writeq_relaxed(iova, reg);
> -				iova += granule >> 12;
> -			} while (size -= granule);
> -		}
> -	} else if (smmu->version == ARM_SMMU_V2) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> +	if (stage1)
> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
> +			      ARM_SMMU_CB_S1_TLBIVA;
> +	else
>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
> -		iova >>= 12;
> -		do {
> -			smmu_write_atomic_lq(iova, reg);
> -			iova += granule >> 12;
> -		} while (size -= granule);
> +
> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> +		iova &= ~12UL;
> +		iova |= cfg->asid;
> +		step = granule;
>  	} else {
> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
> -		writel_relaxed(cfg->vmid, reg);
> +		iova >>= 12;
> +		step = granule >> 12;
> +		if (stage1)
> +			iova |= (u64)cfg->asid << 48;
>  	}
> +
> +	do {
> +		smmu_write_atomic_lq(iova, reg);
> +		iova += step;
> +	} while (size -= granule);

There seems to be a lot of refactoring going on here, but I'm not entirely
comfortable with the unconditional move to smmu_write_atomic_lq. Given the
way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
else clause in the current code and you're done.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
  2017-03-30 14:37         ` Will Deacon
@ 2017-03-30 15:48             ` Robin Murphy
  -1 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2017-03-30 15:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 30/03/17 15:37, Will Deacon wrote:
> Hi Robin,
> 
> This mostly looks great, but I have a couple of minor comments below.
> 
> On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
>> TLB synchronisation typically involves the SMMU blocking all incoming
>> transactions until the TLBs report completion of all outstanding
>> operations. In the common SMMUv2 configuration of a single distributed
>> SMMU serving multiple peripherals, that means that a single unmap
>> request has the potential to bring the hammer down on the entire system
>> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
>> under SMMUv2, offer local sync operations, let's make use of those
>> wherever we can in the hope of minimising global disruption.
>>
>> To that end, rather than add any more branches to the already unwieldy
>> monolithic TLB maintenance ops, break them up into smaller, neater,
>> functions which we can then mix and match as appropriate.
>>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 100 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8aafe304171..f7411109670f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
>> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
>> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>>  #define ARM_SMMU_CB_ATS1PR		0x800
>>  #define ARM_SMMU_CB_ATSR		0x8f0
>>  
>> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>>  }
>>  
>>  /* Wait for any pending TLB invalidations to complete */
>> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>> +				void __iomem *sync, void __iomem *status)
> 
> Given that you take the arm_smmu_device anyway, I think I'd prefer just
> passing the offsets for sync and status and avoiding the additions
> in the caller (a bit like your other patch in this series ;).

Note that the sole reason for passing the arm_smmu_device is for the
dev_err(), but I neither want to remove that nor duplicate it across the
callers...

However, the concrete reason for not passing offsets is that this
function serves for both global and local syncs, so there is no single
base address that can be assumed. At one point I toyed with just passing
a context bank number (using -1 for "global") but even I thought that
ended up looking awful ;)

>>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>>  {
>>  	struct arm_smmu_domain *smmu_domain = cookie;
>>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
>> -	void __iomem *reg;
>> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
>> +	size_t step;
>>  
>> -	if (stage1) {
>> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
>> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
>> -
>> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
>> -			iova &= ~12UL;
>> -			iova |= cfg->asid;
>> -			do {
>> -				writel_relaxed(iova, reg);
>> -				iova += granule;
>> -			} while (size -= granule);
>> -		} else {
>> -			iova >>= 12;
>> -			iova |= (u64)cfg->asid << 48;
>> -			do {
>> -				writeq_relaxed(iova, reg);
>> -				iova += granule >> 12;
>> -			} while (size -= granule);
>> -		}
>> -	} else if (smmu->version == ARM_SMMU_V2) {
>> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
>> +	if (stage1)
>> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
>> +			      ARM_SMMU_CB_S1_TLBIVA;
>> +	else
>>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
>> -		iova >>= 12;
>> -		do {
>> -			smmu_write_atomic_lq(iova, reg);
>> -			iova += granule >> 12;
>> -		} while (size -= granule);
>> +
>> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
>> +		iova &= ~12UL;
>> +		iova |= cfg->asid;
>> +		step = granule;
>>  	} else {
>> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
>> -		writel_relaxed(cfg->vmid, reg);
>> +		iova >>= 12;
>> +		step = granule >> 12;
>> +		if (stage1)
>> +			iova |= (u64)cfg->asid << 48;
>>  	}
>> +
>> +	do {
>> +		smmu_write_atomic_lq(iova, reg);
>> +		iova += step;
>> +	} while (size -= granule);
> 
> There seems to be a lot of refactoring going on here, but I'm not entirely
> comfortable with the unconditional move to smmu_write_atomic_lq. Given the
> way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
> stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
> else clause in the current code and you're done.

Yes, this bears the scars of having been split into multiple functions
then put back together again, possibly more than once (I forget the full
history). Now that things have settled on just 3 sets of ops (stage 1,
stage 2, and braindead stage 2), I'll have one last try at a stage
1/stage 2 split, just to try and make the logic less inscrutable (which
is my main gripe with this function), but I'll do so as a standalone
patch that you can have the final call on.

Robin.

> 
> Will
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
@ 2017-03-30 15:48             ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2017-03-30 15:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 30/03/17 15:37, Will Deacon wrote:
> Hi Robin,
> 
> This mostly looks great, but I have a couple of minor comments below.
> 
> On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
>> TLB synchronisation typically involves the SMMU blocking all incoming
>> transactions until the TLBs report completion of all outstanding
>> operations. In the common SMMUv2 configuration of a single distributed
>> SMMU serving multiple peripherals, that means that a single unmap
>> request has the potential to bring the hammer down on the entire system
>> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
>> under SMMUv2, offer local sync operations, let's make use of those
>> wherever we can in the hope of minimising global disruption.
>>
>> To that end, rather than add any more branches to the already unwieldy
>> monolithic TLB maintenance ops, break them up into smaller, neater,
>> functions which we can then mix and match as appropriate.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>>  1 file changed, 100 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index c8aafe304171..f7411109670f 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
>> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
>> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>>  #define ARM_SMMU_CB_ATS1PR		0x800
>>  #define ARM_SMMU_CB_ATSR		0x8f0
>>  
>> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>>  }
>>  
>>  /* Wait for any pending TLB invalidations to complete */
>> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
>> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>> +				void __iomem *sync, void __iomem *status)
> 
> Given that you take the arm_smmu_device anyway, I think I'd prefer just
> passing the offsets for sync and status and avoiding the additions
> in the caller (a bit like your other patch in this series ;).

Note that the sole reason for passing the arm_smmu_device is for the
dev_err(), but I neither want to remove that nor duplicate it across the
callers...

However, the concrete reason for not passing offsets is that this
function serves for both global and local syncs, so there is no single
base address that can be assumed. At one point I toyed with just passing
a context bank number (using -1 for "global") but even I thought that
ended up looking awful ;)

>>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>>  {
>>  	struct arm_smmu_domain *smmu_domain = cookie;
>>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
>> -	void __iomem *reg;
>> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
>> +	size_t step;
>>  
>> -	if (stage1) {
>> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
>> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
>> -
>> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
>> -			iova &= ~12UL;
>> -			iova |= cfg->asid;
>> -			do {
>> -				writel_relaxed(iova, reg);
>> -				iova += granule;
>> -			} while (size -= granule);
>> -		} else {
>> -			iova >>= 12;
>> -			iova |= (u64)cfg->asid << 48;
>> -			do {
>> -				writeq_relaxed(iova, reg);
>> -				iova += granule >> 12;
>> -			} while (size -= granule);
>> -		}
>> -	} else if (smmu->version == ARM_SMMU_V2) {
>> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
>> +	if (stage1)
>> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
>> +			      ARM_SMMU_CB_S1_TLBIVA;
>> +	else
>>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
>> -		iova >>= 12;
>> -		do {
>> -			smmu_write_atomic_lq(iova, reg);
>> -			iova += granule >> 12;
>> -		} while (size -= granule);
>> +
>> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
>> +		iova &= ~12UL;
>> +		iova |= cfg->asid;
>> +		step = granule;
>>  	} else {
>> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
>> -		writel_relaxed(cfg->vmid, reg);
>> +		iova >>= 12;
>> +		step = granule >> 12;
>> +		if (stage1)
>> +			iova |= (u64)cfg->asid << 48;
>>  	}
>> +
>> +	do {
>> +		smmu_write_atomic_lq(iova, reg);
>> +		iova += step;
>> +	} while (size -= granule);
> 
> There seems to be a lot of refactoring going on here, but I'm not entirely
> comfortable with the unconditional move to smmu_write_atomic_lq. Given the
> way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
> stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
> else clause in the current code and you're done.

Yes, this bears the scars of having been split into multiple functions
then put back together again, possibly more than once (I forget the full
history). Now that things have settled on just 3 sets of ops (stage 1,
stage 2, and braindead stage 2), I'll have one last try at a stage
1/stage 2 split, just to try and make the logic less inscrutable (which
is my main gripe with this function), but I'll do so as a standalone
patch that you can have the final call on.

Robin.

> 
> Will
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-03-30 15:48 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-01-26 18:06 [PATCH 0/4] Miscellaneous ARM SMMU patches Robin Murphy
2016-01-26 18:06 ` Robin Murphy
     [not found] ` <cover.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-01-26 18:06   ` [PATCH 1/4] iommu/arm-smmu: Treat all device transactions as unprivileged Robin Murphy
2016-01-26 18:06     ` Robin Murphy
     [not found]     ` <6c5730256333b8d941f2c0371c1ab709a454938c.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-01-27  6:00       ` Anup Patel
2016-01-27  6:00         ` Anup Patel
2016-02-09 14:08       ` Will Deacon
2016-02-09 14:08         ` Will Deacon
2016-01-26 18:06   ` [PATCH 2/4] iommu/arm-smmu: Allow disabling unmatched stream bypass Robin Murphy
2016-01-26 18:06     ` Robin Murphy
     [not found]     ` <10ebf5a136a787da54ffd1d6167953f82f01a834.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-02-09 14:06       ` Will Deacon
2016-02-09 14:06         ` Will Deacon
     [not found]         ` <20160209140631.GM22874-5wv7dgnIgG8@public.gmane.org>
2016-02-10 12:10           ` Robin Murphy
2016-02-10 12:10             ` Robin Murphy
2016-02-10 14:25           ` [PATCH v2] " Robin Murphy
2016-02-10 14:25             ` Robin Murphy
2016-01-26 18:06   ` [PATCH 3/4] iommu/arm-smmu: Support DMA-API domains Robin Murphy
2016-01-26 18:06     ` Robin Murphy
2016-01-26 18:06   ` [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate Robin Murphy
2016-01-26 18:06     ` Robin Murphy
     [not found]     ` <fc2ede950aea2e84a0b7cf93e4cd605f0cbdf3c3.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-02-09 14:15       ` Will Deacon
2016-02-09 14:15         ` Will Deacon
     [not found]         ` <20160209141508.GO22874-5wv7dgnIgG8@public.gmane.org>
2016-02-10 11:58           ` Robin Murphy
2016-02-10 11:58             ` Robin Murphy
2016-02-09 14:16   ` [PATCH 0/4] Miscellaneous ARM SMMU patches Will Deacon
2016-02-09 14:16     ` Will Deacon
  -- strict thread matches above, loose matches on Subject: below --
2017-03-07 18:09 [PATCH 0/4] ARM SMMU per-context TLB sync Robin Murphy
     [not found] ` <cover.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-07 18:09   ` [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate Robin Murphy
2017-03-07 18:09     ` Robin Murphy
     [not found]     ` <03ef837586cd991f2d8431908230fd3a3dd8c9cf.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-30 14:37       ` Will Deacon
2017-03-30 14:37         ` Will Deacon
     [not found]         ` <20170330143744.GG22160-5wv7dgnIgG8@public.gmane.org>
2017-03-30 15:48           ` Robin Murphy
2017-03-30 15:48             ` Robin Murphy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.