Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V9 2/3] Documentation: admin-guide: perf: add i.MX8 ddr pmu user doc
From: Joakim Zhang @ 2019-08-28 12:07 UTC (permalink / raw)
  To: mark.rutland@arm.com, will@kernel.org, robin.murphy@arm.com
  Cc: Frank Li, dl-linux-imx, linux-arm-kernel@lists.infradead.org,
	Joakim Zhang
In-Reply-To: <20190828120524.9038-1-qiangqing.zhang@nxp.com>

Add i.MX8 ddr pmu user doc.

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>

---
ChangeLog:
V1 -> V4:
	* new add in V4.
V4 -> V5:
	* no change.
V5 -> V6:
	* change the event name
V6 -> V7:
	* no change.
V7 -> V8:
	* improve the doc, add more details.
V8 -> V9:
	* improve the doc.
---
 Documentation/admin-guide/perf/imx-ddr.rst | 52 ++++++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 Documentation/admin-guide/perf/imx-ddr.rst

diff --git a/Documentation/admin-guide/perf/imx-ddr.rst b/Documentation/admin-guide/perf/imx-ddr.rst
new file mode 100644
index 000000000000..517a205abad6
--- /dev/null
+++ b/Documentation/admin-guide/perf/imx-ddr.rst
@@ -0,0 +1,52 @@
+=====================================================
+Freescale i.MX8 DDR Performance Monitoring Unit (PMU)
+=====================================================
+
+There are no performance counters inside the DRAM controller, so performance
+signals are brought out to the edge of the controller where a set of 4 x 32 bit
+counters is implemented. This is controlled by the CSV modes programed in counter
+control register which causes a large number of PERF signals to be generated.
+
+Selection of the value for each counter is done via the config registers. There
+is one register for each counter. Counter 0 is special in that it always counts
+“time” and when expired causes a lock on itself and the other counters and an
+interrupt is raised. If any other counter overflows, it continues counting, and
+no interrupt is raised.
+
+The "format" directory describes format of the config (event ID) and config1
+(AXI filtering) fields of the perf_event_attr structure, see /sys/bus/event_source/
+devices/imx8_ddr0/format/. The "events" directory describes the events types
+hardware supported that can be used with perf tool, see /sys/bus/event_source/
+devices/imx8_ddr0/events/.
+  e.g.::
+        perf stat -a -e imx8_ddr0/cycles/ cmd
+        perf stat -a -e imx8_ddr0/read/,imx8_ddr0/write/ cmd
+
+AXI filtering is only used by CSV modes 0x41 (axid-read) and 0x42 (axid-write)
+to count reading or writing matches filter setting. Filter setting is various
+from different DRAM controller implementations, which is distinguished by quirks
+in the driver.
+
+* With DDR_CAP_AXI_ID_FILTER quirk.
+  Filter is defined with two configuration parts:
+  --AXI_ID defines AxID matching value.
+  --AXI_MASKING defines which bits of AxID are meaningful for the matching.
+        0:corresponding bit is masked.
+        1: corresponding bit is not masked, i.e. used to do the matching.
+
+  AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter.
+  When non-masked bits are matching corresponding AXI_ID bits then counter is
+  incremented. Perf counter is incremented if
+          AxID && AXI_MASKING == AXI_ID && AXI_MASKING
+
+  This filter doesn't support filter different AXI ID for axid-read and axid-write
+  event at the same time as this filter is shared between counters.
+  e.g.::
+        perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd
+        perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd
+
+  NOTE: axi_mask is inverted in userspace(i.e. set bits are bits to mask), and
+  it will be reverted in driver automatically. so that the user can just specify
+  axi_id to monitor a specific id, rather than having to specify axi_mask.
+  e.g.::
+        perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12
-- 
2.17.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH V9 1/3] perf: imx8_ddr_perf: add AXI ID filter support
From: Joakim Zhang @ 2019-08-28 12:07 UTC (permalink / raw)
  To: mark.rutland@arm.com, will@kernel.org, robin.murphy@arm.com
  Cc: Frank Li, dl-linux-imx, linux-arm-kernel@lists.infradead.org,
	Joakim Zhang

AXI filtering is used by CSV modes 0x41 and 0x42 to count reads or
writes with an ARID or AWID matching filter setting. Granularity is at
subsystem level. Implementation does not allow filtring between masters
within a subsystem. Filter is defined with 2 configuration parameters.

--AXI_ID defines AxID matching value
--AXI_MASKING defines which bits of AxID are meaningful for the matching
	0:corresponding bit is masked
	1: corresponding bit is not masked, i.e. used to do the matching

When non-masked bits are matching corresponding AXI_ID bits then counter
is incremented. This filter allows counting read or write access from a
subsystem or multiple subsystems.

Perf counter is incremented if AxID && AXI_MASKING == AXI_ID && AXI_MASKING

AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance counter.

Read and write AXI ID filter should write same value to DPCR1 if want to
specify at the same time as this filter is shared between counters.

e.g.
perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd
perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD/ cmd

NOTE: axi_mask is inverted in userspace(i.e. set bits are bits to mask), and
it will be reverted in driver automatically. so that the user can just specify
axi_id to monitor a specific id, rather than having to specify axi_mask.
e.g.
perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will monitor ARID=0x12

Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>

---
ChangeLog:
V1 -> V2:
	* add error log if user specifies read/write AXI ID filter at
	the same time.
	* of_device_get_match_data() instead of of_match_device(), and
	remove the check of return value.
V2 -> V3:
	* move the AXI ID check to event_add().
	* add support for same value of axi_id.
V3 -> V4:
	* move the AXI ID check to event_init().
V4 -> V5:
	* reject event group if AXI ID not consistent in event_init().
V5 -> V6:
	* change the event name: axi-id-read->axid-read;
	axi-id-write->axid-write
	* add another helper: ddr_perf_filters_compatible()
	* drop the dev_dbg()
V6 -> V7:
	* revert AXI_MASKING at driver.
V7 -> V8:
	* separate axi_id to axi_mask and axi_id these two fileds.
V8 -> V9:
	* only program DPCR1 for filtered events.
---
 drivers/perf/fsl_imx8_ddr_perf.c | 74 +++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/fsl_imx8_ddr_perf.c b/drivers/perf/fsl_imx8_ddr_perf.c
index 0e3310dbb145..ce7345745b42 100644
--- a/drivers/perf/fsl_imx8_ddr_perf.c
+++ b/drivers/perf/fsl_imx8_ddr_perf.c
@@ -35,6 +35,8 @@
 #define EVENT_CYCLES_COUNTER	0
 #define NUM_COUNTERS		4
 
+#define AXI_MASKING_REVERT	0xffff0000	/* AXI_MASKING(MSB 16bits) + AXI_ID(LSB 16bits) */
+
 #define to_ddr_pmu(p)		container_of(p, struct ddr_pmu, pmu)
 
 #define DDR_PERF_DEV_NAME	"imx8_ddr"
@@ -42,9 +44,22 @@
 
 static DEFINE_IDA(ddr_ida);
 
+/* DDR Perf hardware feature */
+#define DDR_CAP_AXI_ID_FILTER          0x1     /* support AXI ID filter */
+
+struct fsl_ddr_devtype_data {
+	unsigned int quirks;    /* quirks needed for different DDR Perf core */
+};
+
+static const struct fsl_ddr_devtype_data imx8_devtype_data;
+
+static const struct fsl_ddr_devtype_data imx8m_devtype_data = {
+	.quirks = DDR_CAP_AXI_ID_FILTER,
+};
+
 static const struct of_device_id imx_ddr_pmu_dt_ids[] = {
-	{ .compatible = "fsl,imx8-ddr-pmu",},
-	{ .compatible = "fsl,imx8m-ddr-pmu",},
+	{ .compatible = "fsl,imx8-ddr-pmu", .data = &imx8_devtype_data},
+	{ .compatible = "fsl,imx8m-ddr-pmu", .data = &imx8m_devtype_data},
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, imx_ddr_pmu_dt_ids);
@@ -58,6 +73,7 @@ struct ddr_pmu {
 	struct perf_event *events[NUM_COUNTERS];
 	int active_events;
 	enum cpuhp_state cpuhp_state;
+	const struct fsl_ddr_devtype_data *devtype_data;
 	int irq;
 	int id;
 };
@@ -129,6 +145,8 @@ static struct attribute *ddr_perf_events_attrs[] = {
 	IMX8_DDR_PMU_EVENT_ATTR(refresh, 0x37),
 	IMX8_DDR_PMU_EVENT_ATTR(write, 0x38),
 	IMX8_DDR_PMU_EVENT_ATTR(raw-hazard, 0x39),
+	IMX8_DDR_PMU_EVENT_ATTR(axid-read, 0x41),
+	IMX8_DDR_PMU_EVENT_ATTR(axid-write, 0x42),
 	NULL,
 };
 
@@ -138,9 +156,13 @@ static struct attribute_group ddr_perf_events_attr_group = {
 };
 
 PMU_FORMAT_ATTR(event, "config:0-7");
+PMU_FORMAT_ATTR(axi_id, "config1:0-15");
+PMU_FORMAT_ATTR(axi_mask, "config1:16-31");
 
 static struct attribute *ddr_perf_format_attrs[] = {
 	&format_attr_event.attr,
+	&format_attr_axi_id.attr,
+	&format_attr_axi_mask.attr,
 	NULL,
 };
 
@@ -190,6 +212,26 @@ static u32 ddr_perf_read_counter(struct ddr_pmu *pmu, int counter)
 	return readl_relaxed(pmu->base + COUNTER_READ + counter * 4);
 }
 
+static bool ddr_perf_is_filtered(struct perf_event *event)
+{
+	return event->attr.config == 0x41 || event->attr.config == 0x42;
+}
+
+static u32 ddr_perf_filter_val(struct perf_event *event)
+{
+	return event->attr.config1;
+}
+
+static bool ddr_perf_filters_compatible(struct perf_event *a,
+					struct perf_event *b)
+{
+	if (!ddr_perf_is_filtered(a))
+		return true;
+	if (!ddr_perf_is_filtered(b))
+		return true;
+	return ddr_perf_filter_val(a) == ddr_perf_filter_val(b);
+}
+
 static int ddr_perf_event_init(struct perf_event *event)
 {
 	struct ddr_pmu *pmu = to_ddr_pmu(event->pmu);
@@ -216,6 +258,15 @@ static int ddr_perf_event_init(struct perf_event *event)
 			!is_software_event(event->group_leader))
 		return -EINVAL;
 
+	if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
+		if (!ddr_perf_filters_compatible(event, event->group_leader))
+			return -EINVAL;
+		for_each_sibling_event(sibling, event->group_leader) {
+			if (!ddr_perf_filters_compatible(event, sibling))
+				return -EINVAL;
+		}
+	}
+
 	for_each_sibling_event(sibling, event->group_leader) {
 		if (sibling->pmu != event->pmu &&
 				!is_software_event(sibling))
@@ -288,6 +339,23 @@ static int ddr_perf_event_add(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int counter;
 	int cfg = event->attr.config;
+	int cfg1 = event->attr.config1;
+
+	if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
+		int i;
+
+		for (i = 1; i < NUM_COUNTERS; i++) {
+			if (pmu->events[i] &&
+			    !ddr_perf_filters_compatible(event, pmu->events[i]))
+				return -EINVAL;
+		}
+
+		if (ddr_perf_is_filtered(event)) {
+			/* revert axi id masking(axi_mask) value */
+			cfg1 ^= AXI_MASKING_REVERT;
+			writel(cfg1, pmu->base + COUNTER_DPCR1);
+		}
+	}
 
 	counter = ddr_perf_alloc_counter(pmu, cfg);
 	if (counter < 0) {
@@ -473,6 +541,8 @@ static int ddr_perf_probe(struct platform_device *pdev)
 	if (!name)
 		return -ENOMEM;
 
+	pmu->devtype_data = of_device_get_match_data(&pdev->dev);
+
 	pmu->cpu = raw_smp_processor_id();
 	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
 				      DDR_CPUHP_CB_NAME,
-- 
2.17.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 2/7] soc: renesas: r8a7795-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
H3, to prevent conflicts between internal and external power requests.

This register does not exist on R-Car H3 ES1.x and ES2.x.

Based on a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a7795-sysc.c | 32 +++++++++++++++++++++++++-----
 drivers/soc/renesas/rcar-sysc.h    |  2 +-
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/soc/renesas/r8a7795-sysc.c b/drivers/soc/renesas/r8a7795-sysc.c
index cda27a67de9876af..7e15cd09c4eb4780 100644
--- a/drivers/soc/renesas/r8a7795-sysc.c
+++ b/drivers/soc/renesas/r8a7795-sysc.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2016-2017 Glider bvba
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
 #include <linux/sys_soc.h>
@@ -51,25 +52,46 @@ static struct rcar_sysc_area r8a7795_areas[] __initdata = {
 
 
 	/*
-	 * Fixups for R-Car H3 revisions after ES1.x
+	 * Fixups for R-Car H3 revisions
 	 */
 
-static const struct soc_device_attribute r8a7795es1[] __initconst = {
-	{ .soc_id = "r8a7795", .revision = "ES1.*" },
+#define HAS_A2VC0	BIT(0)		/* Power domain A2VC0 is present */
+#define NO_EXTMASK	BIT(1)		/* Missing SYSCEXTMASK register */
+
+static const struct soc_device_attribute r8a7795_quirks_match[] __initconst = {
+	{
+		.soc_id = "r8a7795", .revision = "ES1.*",
+		.data = (void *)(HAS_A2VC0 | NO_EXTMASK),
+	}, {
+		.soc_id = "r8a7795", .revision = "ES2.*",
+		.data = (void *)(NO_EXTMASK),
+	},
 	{ /* sentinel */ }
 };
 
 static int __init r8a7795_sysc_init(void)
 {
-	if (!soc_device_match(r8a7795es1))
+	const struct soc_device_attribute *attr;
+	u32 quirks = 0;
+
+	attr = soc_device_match(r8a7795_quirks_match);
+	if (attr)
+		quirks = (uintptr_t)attr->data;
+
+	if (!(quirks & HAS_A2VC0))
 		rcar_sysc_nullify(r8a7795_areas, ARRAY_SIZE(r8a7795_areas),
 				  R8A7795_PD_A2VC0);
 
+	if (quirks & NO_EXTMASK)
+		r8a7795_sysc_info.extmask_val = 0;
+
 	return 0;
 }
 
-const struct rcar_sysc_info r8a7795_sysc_info __initconst = {
+struct rcar_sysc_info r8a7795_sysc_info __initdata = {
 	.init = r8a7795_sysc_init,
 	.areas = r8a7795_areas,
 	.num_areas = ARRAY_SIZE(r8a7795_areas),
+	.extmask_offs = 0x2f8,
+	.extmask_val = BIT(0),
 };
diff --git a/drivers/soc/renesas/rcar-sysc.h b/drivers/soc/renesas/rcar-sysc.h
index 21ee3ff8620bbafe..499797a9e18c2f10 100644
--- a/drivers/soc/renesas/rcar-sysc.h
+++ b/drivers/soc/renesas/rcar-sysc.h
@@ -59,7 +59,7 @@ extern const struct rcar_sysc_info r8a7790_sysc_info;
 extern const struct rcar_sysc_info r8a7791_sysc_info;
 extern const struct rcar_sysc_info r8a7792_sysc_info;
 extern const struct rcar_sysc_info r8a7794_sysc_info;
-extern const struct rcar_sysc_info r8a7795_sysc_info;
+extern struct rcar_sysc_info r8a7795_sysc_info;
 extern const struct rcar_sysc_info r8a7796_sysc_info;
 extern const struct rcar_sysc_info r8a77965_sysc_info;
 extern const struct rcar_sysc_info r8a77970_sysc_info;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 1/7] soc: renesas: rcar-sysc: Prepare for fixing power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Recent R-Car Gen3 SoCs added an External Request Mask Register to the
System Controller (SYSC).  This register allows to mask external power
requests for CPU or 3DG domains, to prevent conflicts between powering
off CPU cores or the 3D Graphics Engine, and changing the state of
another power domain through SYSC, which could lead to CPG state machine
lock-ups.

Add support for making use of this register.  Take into account that the
register is optional, and that its location and contents are
SoC-specific.

Note that the issue fixed by this cannot happen in the upstream kernel,
as upstream has no support for graphics acceleration yet.  SoCs lacking
the External Request Mask Register may need a different mitigation in
the future.

Inspired by a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - Improve patch description.
---
 drivers/soc/renesas/rcar-sysc.c | 16 ++++++++++++++++
 drivers/soc/renesas/rcar-sysc.h |  3 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/soc/renesas/rcar-sysc.c b/drivers/soc/renesas/rcar-sysc.c
index 59b5e6b102722a6d..176de145f4230fbd 100644
--- a/drivers/soc/renesas/rcar-sysc.c
+++ b/drivers/soc/renesas/rcar-sysc.c
@@ -63,6 +63,7 @@ struct rcar_sysc_ch {
 
 static void __iomem *rcar_sysc_base;
 static DEFINE_SPINLOCK(rcar_sysc_lock); /* SMP CPUs + I/O devices */
+static u32 rcar_sysc_extmask_offs, rcar_sysc_extmask_val;
 
 static int rcar_sysc_pwr_on_off(const struct rcar_sysc_ch *sysc_ch, bool on)
 {
@@ -105,6 +106,14 @@ static int rcar_sysc_power(const struct rcar_sysc_ch *sysc_ch, bool on)
 
 	spin_lock_irqsave(&rcar_sysc_lock, flags);
 
+	/*
+	 * Mask external power requests for CPU or 3DG domains
+	 */
+	if (rcar_sysc_extmask_val) {
+		iowrite32(rcar_sysc_extmask_val,
+			  rcar_sysc_base + rcar_sysc_extmask_offs);
+	}
+
 	/*
 	 * The interrupt source needs to be enabled, but masked, to prevent the
 	 * CPU from receiving it.
@@ -148,6 +157,9 @@ static int rcar_sysc_power(const struct rcar_sysc_ch *sysc_ch, bool on)
 	iowrite32(isr_mask, rcar_sysc_base + SYSCISCR);
 
  out:
+	if (rcar_sysc_extmask_val)
+		iowrite32(0, rcar_sysc_base + rcar_sysc_extmask_offs);
+
 	spin_unlock_irqrestore(&rcar_sysc_lock, flags);
 
 	pr_debug("sysc power %s domain %d: %08x -> %d\n", on ? "on" : "off",
@@ -360,6 +372,10 @@ static int __init rcar_sysc_pd_init(void)
 
 	rcar_sysc_base = base;
 
+	/* Optional External Request Mask Register */
+	rcar_sysc_extmask_offs = info->extmask_offs;
+	rcar_sysc_extmask_val = info->extmask_val;
+
 	domains = kzalloc(sizeof(*domains), GFP_KERNEL);
 	if (!domains) {
 		error = -ENOMEM;
diff --git a/drivers/soc/renesas/rcar-sysc.h b/drivers/soc/renesas/rcar-sysc.h
index 485520a5b295c13e..21ee3ff8620bbafe 100644
--- a/drivers/soc/renesas/rcar-sysc.h
+++ b/drivers/soc/renesas/rcar-sysc.h
@@ -44,6 +44,9 @@ struct rcar_sysc_info {
 	int (*init)(void);	/* Optional */
 	const struct rcar_sysc_area *areas;
 	unsigned int num_areas;
+	/* Optional External Request Mask Register */
+	u32 extmask_offs;	/* SYSCEXTMASK register offset */
+	u32 extmask_val;	/* SYSCEXTMASK register mask value */
 };
 
 extern const struct rcar_sysc_info r8a7743_sysc_info;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 5/7] soc: renesas: r8a77970-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
V3M, to prevent conflicts between internal and external power requests.

Based on a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a77970-sysc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/soc/renesas/r8a77970-sysc.c b/drivers/soc/renesas/r8a77970-sysc.c
index 280c48b80f240424..b3033e3f0fd0b0ff 100644
--- a/drivers/soc/renesas/r8a77970-sysc.c
+++ b/drivers/soc/renesas/r8a77970-sysc.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2017 Cogent Embedded Inc.
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
 
@@ -32,4 +33,6 @@ static const struct rcar_sysc_area r8a77970_areas[] __initconst = {
 const struct rcar_sysc_info r8a77970_sysc_info __initconst = {
 	.areas = r8a77970_areas,
 	.num_areas = ARRAY_SIZE(r8a77970_areas),
+	.extmask_offs = 0x1b0,
+	.extmask_val = BIT(0),
 };
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 3/7] soc: renesas: r8a7796-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
M3-W, to prevent conflicts between internal and external power requests.

This register does not exist on R-Car M3-W ES1.x.

Based on a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a7796-sysc.c | 22 +++++++++++++++++++++-
 drivers/soc/renesas/rcar-sysc.h    |  2 +-
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/renesas/r8a7796-sysc.c b/drivers/soc/renesas/r8a7796-sysc.c
index 1b06f868b6e81c79..a4eaa8d1f5d0f49d 100644
--- a/drivers/soc/renesas/r8a7796-sysc.c
+++ b/drivers/soc/renesas/r8a7796-sysc.c
@@ -5,8 +5,10 @@
  * Copyright (C) 2016 Glider bvba
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
+#include <linux/sys_soc.h>
 
 #include <dt-bindings/power/r8a7796-sysc.h>
 
@@ -39,7 +41,25 @@ static const struct rcar_sysc_area r8a7796_areas[] __initconst = {
 	{ "a3ir",	0x180, 0, R8A7796_PD_A3IR,	R8A7796_PD_ALWAYS_ON },
 };
 
-const struct rcar_sysc_info r8a7796_sysc_info __initconst = {
+
+/* Fixups for R-Car M3-W ES1.x revision */
+static const struct soc_device_attribute r8a7796es1[] __initconst = {
+	{ .soc_id = "r8a7796", .revision = "ES1.*" },
+	{ /* sentinel */ }
+};
+
+static int __init r8a7796_sysc_init(void)
+{
+	if (soc_device_match(r8a7796es1))
+		r8a7796_sysc_info.extmask_val = 0;
+
+	return 0;
+}
+
+struct rcar_sysc_info r8a7796_sysc_info __initdata = {
+	.init = r8a7796_sysc_init,
 	.areas = r8a7796_areas,
 	.num_areas = ARRAY_SIZE(r8a7796_areas),
+	.extmask_offs = 0x2f8,
+	.extmask_val = BIT(0),
 };
diff --git a/drivers/soc/renesas/rcar-sysc.h b/drivers/soc/renesas/rcar-sysc.h
index 499797a9e18c2f10..64c2a0fa7945d80b 100644
--- a/drivers/soc/renesas/rcar-sysc.h
+++ b/drivers/soc/renesas/rcar-sysc.h
@@ -60,7 +60,7 @@ extern const struct rcar_sysc_info r8a7791_sysc_info;
 extern const struct rcar_sysc_info r8a7792_sysc_info;
 extern const struct rcar_sysc_info r8a7794_sysc_info;
 extern struct rcar_sysc_info r8a7795_sysc_info;
-extern const struct rcar_sysc_info r8a7796_sysc_info;
+extern struct rcar_sysc_info r8a7796_sysc_info;
 extern const struct rcar_sysc_info r8a77965_sysc_info;
 extern const struct rcar_sysc_info r8a77970_sysc_info;
 extern const struct rcar_sysc_info r8a77980_sysc_info;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 0/7] soc: renesas: rcar-gen3-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel

	Hi all,

Recent R-Car Gen3 SoCs added an External Request Mask Register to the
System Controller (SYSC).  This register allows to mask external power
requests for CPU or 3DG domains, to prevent conflicts between powering
off CPU cores or the 3D Graphics Engine, and changing the state of
another power domain through SYSC, which could lead to CPG state machine
lock-ups.

This patch series starts making use of this register.  Note that the
register is optional, and that its location and contents are
SoC-specific.

This was inspired by a patch in the BSP by Dien Pham
<dien.pham.ry@renesas.com>.

Note that the issue fixed cannot happen in the upstream kernel, as
upstream has no support for graphics acceleration yet.  SoCs lacking the
External Request Mask Register may need a different mitigation in the
future.

Changes compared to v1[1]:
  - Improve description of cover letter and first patch.

Changes compared to RFC[2]:
  - Rebased.

This has been boot-tested on R-Car H3 ES1.0, H3 ES2.0, M3-W ES1.0, M3-N,
V3M, and E3 (only the last 3 have this register!), and regression-tested
on R-Car Gen2.

This has not been tested on R-Car H3 ES3.0, M3-W ES2.0, and V3H.

For your convenience, this series is available in the
topic/rcar3-sysc-extmask-v2 branch of my renesas-drivers git repository at
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git.

Thanks for your comments!

[1] Not posted, but included in yesterday's renesas-drivers-2019-08-27-v5.3-rc6
[2] "[RFC PATCH 0/7] soc: renesas: rcar-gen3-sysc: Fix power request conflicts"
    (https://lore.kernel.org/linux-renesas-soc/20181205155028.14335-1-geert+renesas@glider.be/)
 
Geert Uytterhoeven (7):
  soc: renesas: rcar-sysc: Prepare for fixing power request conflicts
  soc: renesas: r8a7795-sysc: Fix power request conflicts
  soc: renesas: r8a7796-sysc: Fix power request conflicts
  soc: renesas: r8a77965-sysc: Fix power request conflicts
  soc: renesas: r8a77970-sysc: Fix power request conflicts
  soc: renesas: r8a77980-sysc: Fix power request conflicts
  soc: renesas: r8a77990-sysc: Fix power request conflicts

 drivers/soc/renesas/r8a7795-sysc.c  | 32 ++++++++++++++++++++++++-----
 drivers/soc/renesas/r8a7796-sysc.c  | 22 +++++++++++++++++++-
 drivers/soc/renesas/r8a77965-sysc.c |  3 +++
 drivers/soc/renesas/r8a77970-sysc.c |  3 +++
 drivers/soc/renesas/r8a77980-sysc.c |  3 +++
 drivers/soc/renesas/r8a77990-sysc.c |  3 +++
 drivers/soc/renesas/rcar-sysc.c     | 16 +++++++++++++++
 drivers/soc/renesas/rcar-sysc.h     |  7 +++++--
 8 files changed, 81 insertions(+), 8 deletions(-)

-- 
2.17.1

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* [PATCH v2 4/7] soc: renesas: r8a77965-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
M3-N, to prevent conflicts between internal and external power requests.

Based on a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a77965-sysc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/soc/renesas/r8a77965-sysc.c b/drivers/soc/renesas/r8a77965-sysc.c
index e0533beb50fd7063..b5d8230d4bbc1b87 100644
--- a/drivers/soc/renesas/r8a77965-sysc.c
+++ b/drivers/soc/renesas/r8a77965-sysc.c
@@ -7,6 +7,7 @@
  * Copyright (C) 2016 Glider bvba
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
 
@@ -33,4 +34,6 @@ static const struct rcar_sysc_area r8a77965_areas[] __initconst = {
 const struct rcar_sysc_info r8a77965_sysc_info __initconst = {
 	.areas = r8a77965_areas,
 	.num_areas = ARRAY_SIZE(r8a77965_areas),
+	.extmask_offs = 0x2f8,
+	.extmask_val = BIT(0),
 };
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 7/7] soc: renesas: r8a77990-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
E3, to prevent conflicts between internal and external power requests.

Based on a patch in the BSP by Dien Pham <dien.pham.ry@renesas.com>.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a77990-sysc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/soc/renesas/r8a77990-sysc.c b/drivers/soc/renesas/r8a77990-sysc.c
index 664b244eb1dd9d95..1d2432020b7a15d2 100644
--- a/drivers/soc/renesas/r8a77990-sysc.c
+++ b/drivers/soc/renesas/r8a77990-sysc.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2018 Renesas Electronics Corp.
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
 #include <linux/sys_soc.h>
@@ -50,4 +51,6 @@ const struct rcar_sysc_info r8a77990_sysc_info __initconst = {
 	.init = r8a77990_sysc_init,
 	.areas = r8a77990_areas,
 	.num_areas = ARRAY_SIZE(r8a77990_areas),
+	.extmask_offs = 0x2f8,
+	.extmask_val = BIT(0),
 };
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* [PATCH v2 6/7] soc: renesas: r8a77980-sysc: Fix power request conflicts
From: Geert Uytterhoeven @ 2019-08-28 11:36 UTC (permalink / raw)
  To: Magnus Damm
  Cc: linux-renesas-soc, Simon Horman, Geert Uytterhoeven,
	linux-arm-kernel
In-Reply-To: <20190828113618.6672-1-geert+renesas@glider.be>

Describe the location and contents of the SYSCEXTMASK register on R-Car
V3H, to prevent conflicts between internal and external power requests.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
---
v2:
  - No changes.
---
 drivers/soc/renesas/r8a77980-sysc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/soc/renesas/r8a77980-sysc.c b/drivers/soc/renesas/r8a77980-sysc.c
index a8dbe55e8ba82d7e..e3b5ee1b3091dee1 100644
--- a/drivers/soc/renesas/r8a77980-sysc.c
+++ b/drivers/soc/renesas/r8a77980-sysc.c
@@ -6,6 +6,7 @@
  * Copyright (C) 2018 Cogent Embedded, Inc.
  */
 
+#include <linux/bits.h>
 #include <linux/bug.h>
 #include <linux/kernel.h>
 
@@ -49,4 +50,6 @@ static const struct rcar_sysc_area r8a77980_areas[] __initconst = {
 const struct rcar_sysc_info r8a77980_sysc_info __initconst = {
 	.areas = r8a77980_areas,
 	.num_areas = ARRAY_SIZE(r8a77980_areas),
+	.extmask_offs = 0x138,
+	.extmask_val = BIT(0),
 };
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related

* Re: [PATCH v3 3/5] arm64: atomics: avoid out-of-line ll/sc atomics
From: Andrew Murray @ 2019-08-28 11:53 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Peter Zijlstra, Catalin Marinas, Boqun Feng, Will Deacon,
	Ard.Biesheuvel, linux-arm-kernel
In-Reply-To: <20190822170155.GE33080@lakrids.cambridge.arm.com>

On Thu, Aug 22, 2019 at 06:01:55PM +0100, Mark Rutland wrote:
> On Mon, Aug 12, 2019 at 03:36:23PM +0100, Andrew Murray wrote:
> > When building for LSE atomics (CONFIG_ARM64_LSE_ATOMICS), if the hardware
> > or toolchain doesn't support it the existing code will fallback to ll/sc
> > atomics. It achieves this by branching from inline assembly to a function
> > that is built with specical compile flags. Further this results in the
> > clobbering of registers even when the fallback isn't used increasing
> > register pressure.
> > 
> > Let's improve this by providing inline implementations of both LSE and
> > ll/sc and use a static key to select between them. This allows for the
> > compiler to generate better atomics code.
> > 
> > To improve icache performance for the LL/SC fallback atomics, we put them
> > in their own subsection.
> > 
> > Please note that as atomic_arch.h is included indirectly by kernel.h
> > (via bitops.h), we cannot depend on features provided later in the kernel.h
> > file. This prevents us from placing the system_uses_lse_atomics function
> > in cpu_feature.h due to its dependencies.
> > 
> > Signed-off-by: Andrew Murray <andrew.murray@arm.com>
> 
> [...]
> 
> > diff --git a/arch/arm64/include/asm/atomic_arch.h b/arch/arm64/include/asm/atomic_arch.h
> > new file mode 100644
> > index 000000000000..255a284321c6
> > --- /dev/null
> > +++ b/arch/arm64/include/asm/atomic_arch.h
> > @@ -0,0 +1,154 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Selection between LSE and LL/SC atomics.
> > + *
> > + * Copyright (C) 2018 ARM Ltd.
> > + * Author: Andrew Murray <andrew.murray@arm.com>
> > + */
> > +
> > +#ifndef __ASM_ATOMIC_ARCH_H
> > +#define __ASM_ATOMIC_ARCH_H
> > +
> > +#include <asm/atomic_lse.h>
> > +#include <asm/atomic_ll_sc.h>
> > +
> > +#include <linux/jump_label.h>
> > +#include <asm/cpucaps.h>
> 
> I'm guessing that we have to include the <asm/atomic_*> headers first
> due to the include dependencies. If that's the case, could we please
> have a comment here to that effect?
> 
> Minor nit, but could we also order those two alphabetically, please?
> 
> The general style is to have headers alphabetically, with (for reasons
> unknown) the <linux/*> headers before the <asm/*> headers.

There doesn't appear to be any valid reason for ordering the includes
in the way I have done, so I'll change it to the
linux-followed-by-asm-alphabetical way.

> 
> [...]
> 
> > +#if IS_ENABLED(CONFIG_ARM64_LSE_ATOMICS) && IS_ENABLED(CONFIG_AS_LSE)
> > +#define __LL_SC_FALLBACK(asm_ops)					\
> > +"	b	3f\n"							\
> > +"	.subsection	1\n"						\
> > +"3:\n"									\
> > +asm_ops "\n"								\
> > +"	b	4f\n"							\
> > +"	.previous\n"							\
> > +"4:\n"
> > +#else
> > +#define __LL_SC_FALLBACK(asm_ops) asm_ops
> >  #endif
> 
> Can we instead make the ll/sc functions with the cold attribute (wrapped
> by <linux/compiler.h> as __cold)?
> 
> IIUC that should have a similar effect, and might allow GCC to do better
> (e.g. merging compatible instances of the ll/sc code in the same cold
> subsection).
> 
> https://gcc.gnu.org/onlinedocs/gcc-9.2.0/gcc/Common-Function-Attributes.html#index-cold-function-attribute

Thanks for suggesting this, I hadn't considered this previously. However
the cold attribute only seems to have an effect when the function it is
applied to isn't inline. I think we want to keep both LSE and fallback
functions as inline.

At present the fallback functions are inlined, due to the use of 'unlikely'
they end up at the end of the function they are called from. Within this
inline'd function we take unconditional branches to/from the actual LL/SC
implementation, as we are using subsections the LL/SC implementations are
grouped together at the end of the each compilation unit. As we are using
unconditional branches, each call to an atomics function results in the LL/SC
implementation being duplicated (just like any other inline function, except
the code is elsewhere). We get some locality benefits from the use of
subsection but that is per-compilation unit (so you'll get clusters of them
across the vmlinux).

This approach gives us bloat, we can mitigate this by not using inline
functions, and further by optimising for size. We can put all the fallback
atomics in a single section to benefit from vmlinux-wide code locality -
however to benefit from all this we must use functions calls and their
associated overhead.

Thanks,

Andrew Murray

> 
> Otherwise, this is looking much nicer!
> 
> Thanks,
> Mark.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support
From: Neil Armstrong @ 2019-08-28 11:51 UTC (permalink / raw)
  To: Robin Murphy, Tomeu Vizoso
  Cc: Mark Rutland,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux IOMMU, David Airlie, Will Deacon, open list, dri-devel,
	Steven Price, Maxime Ripard, Chen-Yu Tsai, Rob Herring,
	Clément Péron,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <8433455c-3b74-c176-49a1-388b3f085e9e@arm.com>

On 28/08/2019 13:49, Robin Murphy wrote:
> Hi Neil,
> 
> On 28/08/2019 12:28, Neil Armstrong wrote:
>> Hi Robin,
>>

[...]
>>>
>>> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test gave me the impression that T720 works fine, but on closer inspection some parts of glmark2 do seem to go a bit wonky (although I suspect at least some of it is just down to the FPGA setup being both very slow and lacking in memory bandwidth), and the "nv12-1img" mode of kmscube turns out to render in some delightfully wrong colours.
>>>
>>> I'll try to get a 'proper' version of the io-pgtable patch posted soon.
>>
>> I'm trying to collect all the fixes needed to make T820 work again, and
>> I was wondering if you finally have a proper patch for this and "cfg->ias > 48"
>> hack ? Or one I can test ?
> 
> I do have a handful of io-pgtable patches written up and ready to go, I'm just treading carefully and waiting for the internal approval box to be ticked before I share anything :(

Great !

No problem, it can totally wait until approval,

Thanks,
Neil

> 
> Robin.
> 
>>
>> Thanks,
>> Neil
>>
>>>
>>> Thanks,
>>> Robin.
>>>
>>>>
>>>> Cheers,
>>>>
>>>> Tomeu
>>>>
>>>>> Robin.
>>>>>
>>>>>
>>>>> ----->8-----
>>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>>> index 546968d8a349..f29da6e8dc08 100644
>>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct
>>>>> io_pgtable_cfg *cfg, void *cookie)
>>>>>           iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>>>           if (iop) {
>>>>>                   u64 mair, ttbr;
>>>>> +               struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>>>
>>>>> +               data->levels = 4;
>>>>>                   /* Copy values as union fields overlap */
>>>>>                   mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>>>                   ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support
From: Robin Murphy @ 2019-08-28 11:49 UTC (permalink / raw)
  To: Neil Armstrong, Tomeu Vizoso
  Cc: Mark Rutland,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux IOMMU, David Airlie, Will Deacon, open list, dri-devel,
	Steven Price, Maxime Ripard, Chen-Yu Tsai, Rob Herring,
	Clément Péron,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <92e9b697-ea0d-9b13-5512-b0a16a39df20@baylibre.com>

Hi Neil,

On 28/08/2019 12:28, Neil Armstrong wrote:
> Hi Robin,
> 
> On 31/05/2019 15:47, Robin Murphy wrote:
>> On 31/05/2019 13:04, Tomeu Vizoso wrote:
>>> On Wed, 29 May 2019 at 19:38, Robin Murphy <robin.murphy@arm.com> wrote:
>>>>
>>>> On 29/05/2019 16:09, Tomeu Vizoso wrote:
>>>>> On Tue, 21 May 2019 at 18:11, Clément Péron <peron.clem@gmail.com> wrote:
>>>>>>
>>>>> [snip]
>>>>>> [  345.204813] panfrost 1800000.gpu: mmu irq status=1
>>>>>> [  345.209617] panfrost 1800000.gpu: Unhandled Page fault in AS0 at VA
>>>>>> 0x0000000002400400
>>>>>
>>>>>    From what I can see here, 0x0000000002400400 points to the first byte
>>>>> of the first submitted job descriptor.
>>>>>
>>>>> So mapping buffers for the GPU doesn't seem to be working at all on
>>>>> 64-bit T-760.
>>>>>
>>>>> Steven, Robin, do you have any idea of why this could be?
>>>>
>>>> I tried rolling back to the old panfrost/nondrm shim, and it works fine
>>>> with kbase, and I also found that T-820 falls over in the exact same
>>>> manner, so the fact that it seemed to be common to the smaller 33-bit
>>>> designs rather than anything to do with the other
>>>> job_descriptor_size/v4/v5 complication turned out to be telling.
>>>
>>> Is this complication something you can explain? I don't know what v4
>>> and v5 are meant here.
>>
>> I was alluding to BASE_HW_FEATURE_V4, which I believe refers to the Midgard architecture version - the older versions implemented by T6xx and T720 seem to be collectively treated as "v4", while T760 and T8xx would effectively be "v5".
>>
>>>> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
>>>> is it just that nobody's yet observed a 64-bit blob driving one? ]
>>>
>>> I'm looking right now at getting Panfrost working on T720 with 64-bit
>>> descriptors, with the ultimate goal of making Panfrost
>>> 64-bit-descriptor only so we can have a single build of Mesa in
>>> distros.
>>
>> Cool, I'll keep an eye out, and hope that it might be enough for T620 on Juno, too :)
>>
>>>> Long story short, it appears that 'Mali LPAE' is also lacking the start
>>>> level notion of VMSA, and expects a full 4-level table even for <40 bits
>>>> when level 0 effectively redundant. Thus walking the 3-level table that
>>>> io-pgtable comes back with ends up going wildly wrong. The hack below
>>>> seems to do the job for me; if Clément can confirm (on T-720 you'll
>>>> still need the userspace hack to force 32-bit jobs as well) then I think
>>>> I'll cook up a proper refactoring of the allocator to put things right.
>>>
>>> Mmaps seem to work with this patch, thanks.
>>>
>>> The main complication I'm facing right now seems to be that the SFBD
>>> descriptor on T720 seems to be different from the one we already had
>>> (tested on T6xx?).
>>
>> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test gave me the impression that T720 works fine, but on closer inspection some parts of glmark2 do seem to go a bit wonky (although I suspect at least some of it is just down to the FPGA setup being both very slow and lacking in memory bandwidth), and the "nv12-1img" mode of kmscube turns out to render in some delightfully wrong colours.
>>
>> I'll try to get a 'proper' version of the io-pgtable patch posted soon.
> 
> I'm trying to collect all the fixes needed to make T820 work again, and
> I was wondering if you finally have a proper patch for this and "cfg->ias > 48"
> hack ? Or one I can test ?

I do have a handful of io-pgtable patches written up and ready to go, 
I'm just treading carefully and waiting for the internal approval box to 
be ticked before I share anything :(

Robin.

> 
> Thanks,
> Neil
> 
>>
>> Thanks,
>> Robin.
>>
>>>
>>> Cheers,
>>>
>>> Tomeu
>>>
>>>> Robin.
>>>>
>>>>
>>>> ----->8-----
>>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>>> index 546968d8a349..f29da6e8dc08 100644
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct
>>>> io_pgtable_cfg *cfg, void *cookie)
>>>>           iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>>           if (iop) {
>>>>                   u64 mair, ttbr;
>>>> +               struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>>
>>>> +               data->levels = 4;
>>>>                   /* Copy values as union fields overlap */
>>>>                   mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>>                   ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>>>>
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* RE: [PATCH V8 2/3] Documentation: admin-guide: perf: add i.MX8 ddr pmu user doc
From: Joakim Zhang @ 2019-08-28 11:32 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland@arm.com, Frank Li, robin.murphy@arm.com,
	dl-linux-imx, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190828100440.melcuvu6hz5mrm3h@willie-the-truck>


> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: 2019年8月28日 18:05
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: mark.rutland@arm.com; robin.murphy@arm.com; Frank Li
> <frank.li@nxp.com>; dl-linux-imx <linux-imx@nxp.com>;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH V8 2/3] Documentation: admin-guide: perf: add i.MX8 ddr
> pmu user doc
> 
> On Wed, Aug 28, 2019 at 03:05:39AM +0000, Joakim Zhang wrote:
> > Add i.MX8 ddr pmu user doc.
> >
> > ChangeLog:
> > V1 -> V4:
> > 	* new add in V4.
> > V4 -> V5:
> > 	* no change.
> > V5 -> V6:
> > 	* change the event name
> > V6 -> V7:
> > 	* no change.
> > V7 -> V8:
> > 	* improve the doc, add more details.
> >
> > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> > ---
> >  Documentation/admin-guide/perf/imx-ddr.rst | 51
> > ++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> >  create mode 100644 Documentation/admin-guide/perf/imx-ddr.rst
> >
> > diff --git a/Documentation/admin-guide/perf/imx-ddr.rst
> > b/Documentation/admin-guide/perf/imx-ddr.rst
> > new file mode 100644
> > index 000000000000..438de3be667b
> > --- /dev/null
> > +++ b/Documentation/admin-guide/perf/imx-ddr.rst
> > @@ -0,0 +1,51 @@
> > +=====================================================
> > +Freescale i.MX8 DDR Performance Monitoring Unit (PMU)
> > +=====================================================
> > +
> > +There are no performance counters inside the DRAM controller, so
> > +performance signals are brought out to the edge of the controller
> > +where a set of 4 x 32 bit counters is implemented. This is controlled
> > +by the Performance log on parameter
>
> I don't understand what you mean by "Performance log on parameter".

[Joakim] From IC RM, it means Performance log on parameter field(event id)on counters control register. I will improve it.

> > +which causes a large number of PERF signals to be generated.
> > +
> > +Selection of the value for each counter is done via the config
> > +registiers. There
> 
> registers

[Joakim] Will Fix it.

> > +is one register for each counter. Counter 0 is special in that it
> > +always counts “time” and when expired causes a lock on itself and the
> > +other counters and an interrupt ie enable of counter 0 is a global function.
> 
> by "causes a lock on itself and the other counters" do you mean that Counter
> 0 counts down and, when it hits 0, all the counters stop counting?

[Joakim] Yes, as it commented in ddr_perf_irq_handler():
/*
 * When the cycle counter overflows, all counters are stopped,
 * and an IRQ is raised. If any other counter overflows, it
 * continues counting, and no IRQ is raised.
 *
 * Cycles occur at least 4 times as often as other events, so we
 * can update all events on a cycle counter overflow and not
 * lose events.
 */

Best Regards,
Joakim Zhang
> Will
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v6 0/6] Allwinner H6 Mali GPU support
From: Neil Armstrong @ 2019-08-28 11:28 UTC (permalink / raw)
  To: Robin Murphy, Tomeu Vizoso
  Cc: Mark Rutland,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Linux IOMMU, David Airlie, Will Deacon, open list, dri-devel,
	Steven Price, Maxime Ripard, Chen-Yu Tsai, Rob Herring,
	Clément Péron,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
In-Reply-To: <dc3872a4-8cd9-462b-9f73-0d69a810d985@arm.com>

Hi Robyn,

On 31/05/2019 15:47, Robin Murphy wrote:
> On 31/05/2019 13:04, Tomeu Vizoso wrote:
>> On Wed, 29 May 2019 at 19:38, Robin Murphy <robin.murphy@arm.com> wrote:
>>>
>>> On 29/05/2019 16:09, Tomeu Vizoso wrote:
>>>> On Tue, 21 May 2019 at 18:11, Clément Péron <peron.clem@gmail.com> wrote:
>>>>>
>>>> [snip]
>>>>> [  345.204813] panfrost 1800000.gpu: mmu irq status=1
>>>>> [  345.209617] panfrost 1800000.gpu: Unhandled Page fault in AS0 at VA
>>>>> 0x0000000002400400
>>>>
>>>>   From what I can see here, 0x0000000002400400 points to the first byte
>>>> of the first submitted job descriptor.
>>>>
>>>> So mapping buffers for the GPU doesn't seem to be working at all on
>>>> 64-bit T-760.
>>>>
>>>> Steven, Robin, do you have any idea of why this could be?
>>>
>>> I tried rolling back to the old panfrost/nondrm shim, and it works fine
>>> with kbase, and I also found that T-820 falls over in the exact same
>>> manner, so the fact that it seemed to be common to the smaller 33-bit
>>> designs rather than anything to do with the other
>>> job_descriptor_size/v4/v5 complication turned out to be telling.
>>
>> Is this complication something you can explain? I don't know what v4
>> and v5 are meant here.
> 
> I was alluding to BASE_HW_FEATURE_V4, which I believe refers to the Midgard architecture version - the older versions implemented by T6xx and T720 seem to be collectively treated as "v4", while T760 and T8xx would effectively be "v5".
> 
>>> [ as an aside, are 64-bit jobs actually known not to work on v4 GPUs, or
>>> is it just that nobody's yet observed a 64-bit blob driving one? ]
>>
>> I'm looking right now at getting Panfrost working on T720 with 64-bit
>> descriptors, with the ultimate goal of making Panfrost
>> 64-bit-descriptor only so we can have a single build of Mesa in
>> distros.
> 
> Cool, I'll keep an eye out, and hope that it might be enough for T620 on Juno, too :)
> 
>>> Long story short, it appears that 'Mali LPAE' is also lacking the start
>>> level notion of VMSA, and expects a full 4-level table even for <40 bits
>>> when level 0 effectively redundant. Thus walking the 3-level table that
>>> io-pgtable comes back with ends up going wildly wrong. The hack below
>>> seems to do the job for me; if Clément can confirm (on T-720 you'll
>>> still need the userspace hack to force 32-bit jobs as well) then I think
>>> I'll cook up a proper refactoring of the allocator to put things right.
>>
>> Mmaps seem to work with this patch, thanks.
>>
>> The main complication I'm facing right now seems to be that the SFBD
>> descriptor on T720 seems to be different from the one we already had
>> (tested on T6xx?).
> 
> OK - with the 32-bit hack pointed to up-thread, a quick kmscube test gave me the impression that T720 works fine, but on closer inspection some parts of glmark2 do seem to go a bit wonky (although I suspect at least some of it is just down to the FPGA setup being both very slow and lacking in memory bandwidth), and the "nv12-1img" mode of kmscube turns out to render in some delightfully wrong colours.
> 
> I'll try to get a 'proper' version of the io-pgtable patch posted soon.

I'm trying to collect all the fixes needed to make T820 work again, and
I was wondering if you finally have a proper patch for this and "cfg->ias > 48"
hack ? Or one I can test ?

Thanks,
Neil

> 
> Thanks,
> Robin.
> 
>>
>> Cheers,
>>
>> Tomeu
>>
>>> Robin.
>>>
>>>
>>> ----->8-----
>>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>>> index 546968d8a349..f29da6e8dc08 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -1023,12 +1023,14 @@ arm_mali_lpae_alloc_pgtable(struct
>>> io_pgtable_cfg *cfg, void *cookie)
>>>          iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>>>          if (iop) {
>>>                  u64 mair, ttbr;
>>> +               struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(&iop->ops);
>>>
>>> +               data->levels = 4;
>>>                  /* Copy values as union fields overlap */
>>>                  mair = cfg->arm_lpae_s1_cfg.mair[0];
>>>                  ttbr = cfg->arm_lpae_s1_cfg.ttbr[0];
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* RE: [PATCH V8 1/3] perf: imx8_ddr_perf: add AXI ID filter support
From: Joakim Zhang @ 2019-08-28 11:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland@arm.com, Frank Li, robin.murphy@arm.com,
	dl-linux-imx, linux-arm-kernel@lists.infradead.org
In-Reply-To: <20190828095135.gsyqwor7tea2radn@willie-the-truck>


> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: 2019年8月28日 17:52
> To: Joakim Zhang <qiangqing.zhang@nxp.com>
> Cc: mark.rutland@arm.com; robin.murphy@arm.com; Frank Li
> <frank.li@nxp.com>; dl-linux-imx <linux-imx@nxp.com>;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH V8 1/3] perf: imx8_ddr_perf: add AXI ID filter support
> 
> On Wed, Aug 28, 2019 at 03:05:36AM +0000, Joakim Zhang wrote:
> > AXI filtering is used by CSV modes 0x41 and 0x42 to count reads or
> > writes with an ARID or AWID matching filter setting. Granularity is at
> > subsystem level. Implementation does not allow filtring between
> > masters within a subsystem. Filter is defined with 2 configuration
> parameters.
> >
> > --AXI_ID defines AxID matching value
> > --AXI_MASKING defines which bits of AxID are meaningful for the matching
> > 	0:corresponding bit is masked
> > 	1: corresponding bit is not masked, i.e. used to do the matching
> >
> > When non-masked bits are matching corresponding AXI_ID bits then
> > counter is incremented. This filter allows counting read or write
> > access from a subsystem or multiple subsystems.
> >
> > Perf counter is incremented if AxID && AXI_MASKING == AXI_ID &&
> > AXI_MASKING
> >
> > AXI_ID and AXI_MASKING are mapped on DPCR1 register in performance
> counter.
> >
> > Read and write AXI ID filter should write same value to DPCR1 if want
> > to specify at the same time as this filter is shared between counters.
> >
> > e.g.
> > perf stat -a -e imx8_ddr0/axid-read,axi_mask=0xMMMM,axi_id=0xDDDD/
> cmd
> > perf stat -a -e imx8_ddr0/axid-write,axi_mask=0xMMMM,axi_id=0xDDDD/
> > cmd
> >
> > NOTE: axi_mask is inverted in userspace(i.e. set bits are bits to
> > mask), and it will be reverted in driver automatically. so that the
> > user can just specify axi_id to monitor a specific id, rather than having to
> specify axi_mask.
> > e.g.
> > perf stat -a -e imx8_ddr0/axid-read,axi_id=0x12/ cmd, which will
> > monitor ARID=0x12
> >
> > ChangeLog:
> > V1 -> V2:
> > 	* add error log if user specifies read/write AXI ID filter at
> > 	the same time.
> > 	* of_device_get_match_data() instead of of_match_device(), and
> > 	remove the check of return value.
> > V2 -> V3:
> > 	* move the AXI ID check to event_add().
> > 	* add support for same value of axi_id.
> > V3 -> V4:
> > 	* move the AXI ID check to event_init().
> > V4 -> V5:
> > 	* reject event group if AXI ID not consistent in event_init().
> > V5 -> V6:
> > 	* change the event name: axi-id-read->axid-read;
> > 	axi-id-write->axid-write
> > 	* add another helper: ddr_perf_filters_compatible()
> > 	* drop the dev_dbg()
> > V6 -> V7:
> > 	* revert AXI_MASKING at driver.
> > V7 -> V8:
> > 	* separate axi_id to axi_mask and axi_id these two fileds.
> 
> Nit: please don't include the ChangeLog in the commit message (we don't want
> to include this in the linux history). Instead stick it after the cut.

[Joakim] Got it. Thanks.

> > Signed-off-by: Joakim Zhang <qiangqing.zhang@nxp.com>
> > ---
> >  drivers/perf/fsl_imx8_ddr_perf.c | 72
> > +++++++++++++++++++++++++++++++-
> >  1 file changed, 70 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/perf/fsl_imx8_ddr_perf.c
> > b/drivers/perf/fsl_imx8_ddr_perf.c
> > index 0e3310dbb145..e9bf956f434d 100644
> > --- a/drivers/perf/fsl_imx8_ddr_perf.c
> > +++ b/drivers/perf/fsl_imx8_ddr_perf.c
> > @@ -35,6 +35,8 @@
> >  #define EVENT_CYCLES_COUNTER	0
> >  #define NUM_COUNTERS		4
> >
> > +#define AXI_MASKING_REVERT	0xffff0000	/* AXI_MASKING(MSB
> 16bits) + AXI_ID(LSB 16bits) */
> > +
> >  #define to_ddr_pmu(p)		container_of(p, struct ddr_pmu, pmu)
> >
> >  #define DDR_PERF_DEV_NAME	"imx8_ddr"
> > @@ -42,9 +44,22 @@
> >
> >  static DEFINE_IDA(ddr_ida);
> >
> > +/* DDR Perf hardware feature */
> > +#define DDR_CAP_AXI_ID_FILTER          0x1     /* support AXI ID
> filter */
> > +
> > +struct fsl_ddr_devtype_data {
> > +	unsigned int quirks;    /* quirks needed for different DDR Perf core */
> > +};
> > +
> > +static const struct fsl_ddr_devtype_data imx8_devtype_data;
> > +
> > +static const struct fsl_ddr_devtype_data imx8m_devtype_data = {
> > +	.quirks = DDR_CAP_AXI_ID_FILTER,
> > +};
> > +
> >  static const struct of_device_id imx_ddr_pmu_dt_ids[] = {
> > -	{ .compatible = "fsl,imx8-ddr-pmu",},
> > -	{ .compatible = "fsl,imx8m-ddr-pmu",},
> > +	{ .compatible = "fsl,imx8-ddr-pmu", .data = &imx8_devtype_data},
> > +	{ .compatible = "fsl,imx8m-ddr-pmu", .data = &imx8m_devtype_data},
> >  	{ /* sentinel */ }
> >  };
> >  MODULE_DEVICE_TABLE(of, imx_ddr_pmu_dt_ids); @@ -58,6 +73,7 @@
> struct
> > ddr_pmu {
> >  	struct perf_event *events[NUM_COUNTERS];
> >  	int active_events;
> >  	enum cpuhp_state cpuhp_state;
> > +	const struct fsl_ddr_devtype_data *devtype_data;
> >  	int irq;
> >  	int id;
> >  };
> > @@ -129,6 +145,8 @@ static struct attribute *ddr_perf_events_attrs[] = {
> >  	IMX8_DDR_PMU_EVENT_ATTR(refresh, 0x37),
> >  	IMX8_DDR_PMU_EVENT_ATTR(write, 0x38),
> >  	IMX8_DDR_PMU_EVENT_ATTR(raw-hazard, 0x39),
> > +	IMX8_DDR_PMU_EVENT_ATTR(axid-read, 0x41),
> > +	IMX8_DDR_PMU_EVENT_ATTR(axid-write, 0x42),
> >  	NULL,
> >  };
> >
> > @@ -138,9 +156,13 @@ static struct attribute_group
> > ddr_perf_events_attr_group = {  };
> >
> >  PMU_FORMAT_ATTR(event, "config:0-7");
> > +PMU_FORMAT_ATTR(axi_id, "config1:0-15");
> PMU_FORMAT_ATTR(axi_mask,
> > +"config1:16-31");
> >
> >  static struct attribute *ddr_perf_format_attrs[] = {
> >  	&format_attr_event.attr,
> > +	&format_attr_axi_id.attr,
> > +	&format_attr_axi_mask.attr,
> >  	NULL,
> >  };
> >
> > @@ -190,6 +212,26 @@ static u32 ddr_perf_read_counter(struct ddr_pmu
> *pmu, int counter)
> >  	return readl_relaxed(pmu->base + COUNTER_READ + counter * 4);  }
> >
> > +static bool ddr_perf_is_filtered(struct perf_event *event) {
> > +	return event->attr.config == 0x41 || event->attr.config == 0x42; }
> > +
> > +static u32 ddr_perf_filter_val(struct perf_event *event) {
> > +	return event->attr.config1;
> > +}
> > +
> > +static bool ddr_perf_filters_compatible(struct perf_event *a,
> > +					struct perf_event *b)
> > +{
> > +	if (!ddr_perf_is_filtered(a))
> > +		return true;
> > +	if (!ddr_perf_is_filtered(b))
> > +		return true;
> > +	return ddr_perf_filter_val(a) == ddr_perf_filter_val(b); }
> > +
> >  static int ddr_perf_event_init(struct perf_event *event)  {
> >  	struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); @@ -216,6 +258,15
> @@
> > static int ddr_perf_event_init(struct perf_event *event)
> >  			!is_software_event(event->group_leader))
> >  		return -EINVAL;
> >
> > +	if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
> > +		if (!ddr_perf_filters_compatible(event, event->group_leader))
> > +			return -EINVAL;
> > +		for_each_sibling_event(sibling, event->group_leader) {
> > +			if (!ddr_perf_filters_compatible(event, sibling))
> > +				return -EINVAL;
> > +		}
> > +	}
> > +
> >  	for_each_sibling_event(sibling, event->group_leader) {
> >  		if (sibling->pmu != event->pmu &&
> >  				!is_software_event(sibling))
> > @@ -288,6 +339,21 @@ static int ddr_perf_event_add(struct perf_event
> *event, int flags)
> >  	struct hw_perf_event *hwc = &event->hw;
> >  	int counter;
> >  	int cfg = event->attr.config;
> > +	int cfg1 = event->attr.config1;
> > +
> > +	if (pmu->devtype_data->quirks & DDR_CAP_AXI_ID_FILTER) {
> > +		int i;
> > +
> > +		for (i = 1; i < NUM_COUNTERS; i++) {
> > +			if (pmu->events[i] &&
> > +			    !ddr_perf_filters_compatible(event, pmu->events[i]))
> > +				return -EINVAL;
> > +		}
> > +
> > +		/* revert axi id masking(axi_mask) value */
> > +		cfg1 ^= AXI_MASKING_REVERT;
> > +		writel(cfg1, pmu->base + COUNTER_DPCR1);
> 
> I was about to queue this up, but then I wondered about the following
> situation:
> 
> 	- I add a filtered event to a group (say 0x41) with a filter of 0
> 	  (which the driver inverts to 0xffff)
> 
> 	- I then add a non-filtered event (say 0x40) but config1 is set
> 	  to 0xffff0000
> 
> Won't the logic above result in us corrupting DPCR1 and clobbering the filter for
> the first event? What am I missing?
> 
> I think you should only program DPCR1 for filtered events.

[Joakim] Yes, you are right! I will do below change.

               if (ddr_perf_is_filtered(event)) {
                       /* revert axi id masking(axi_mask) value */
                       cfg1 ^= AXI_MASKING_REVERT;
                       writel(cfg1, pmu->base + COUNTER_DPCR1);
               }

Best Regards,
Joakim Zhang
> Will
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v3 01/10] KVM: arm64: Document PV-time interface
From: Steven Price @ 2019-08-28 11:23 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: kvm, linux-doc, Marc Zyngier, linux-kernel, Russell King,
	Catalin Marinas, Paolo Bonzini, Will Deacon, kvmarm,
	linux-arm-kernel
In-Reply-To: <20190827084407.GA6541@e113682-lin.lund.arm.com>

On 27/08/2019 09:44, Christoffer Dall wrote:
> On Wed, Aug 21, 2019 at 04:36:47PM +0100, Steven Price wrote:
>> Introduce a paravirtualization interface for KVM/arm64 based on the
>> "Arm Paravirtualized Time for Arm-Base Systems" specification DEN 0057A.
>>
>> This only adds the details about "Stolen Time" as the details of "Live
>> Physical Time" have not been fully agreed.
>>
>> User space can specify a reserved area of memory for the guest and
>> inform KVM to populate the memory with information on time that the host
>> kernel has stolen from the guest.
>>
>> A hypercall interface is provided for the guest to interrogate the
>> hypervisor's support for this interface and the location of the shared
>> memory structures.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>  Documentation/virt/kvm/arm/pvtime.txt | 100 ++++++++++++++++++++++++++
>>  1 file changed, 100 insertions(+)
>>  create mode 100644 Documentation/virt/kvm/arm/pvtime.txt
>>
>> diff --git a/Documentation/virt/kvm/arm/pvtime.txt b/Documentation/virt/kvm/arm/pvtime.txt
>> new file mode 100644
>> index 000000000000..1ceb118694e7
>> --- /dev/null
>> +++ b/Documentation/virt/kvm/arm/pvtime.txt
>> @@ -0,0 +1,100 @@
>> +Paravirtualized time support for arm64
>> +======================================
>> +
>> +Arm specification DEN0057/A defined a standard for paravirtualised time
>> +support for AArch64 guests:
>> +
>> +https://developer.arm.com/docs/den0057/a
>> +
>> +KVM/arm64 implements the stolen time part of this specification by providing
>> +some hypervisor service calls to support a paravirtualized guest obtaining a
>> +view of the amount of time stolen from its execution.
>> +
>> +Two new SMCCC compatible hypercalls are defined:
>> +
>> +PV_FEATURES 0xC5000020
>> +PV_TIME_ST  0xC5000022
>> +
>> +These are only available in the SMC64/HVC64 calling convention as
>> +paravirtualized time is not available to 32 bit Arm guests. The existence of
>> +the PV_FEATURES hypercall should be probed using the SMCCC 1.1 ARCH_FEATURES
>> +mechanism before calling it.
>> +
>> +PV_FEATURES
>> +    Function ID:  (uint32)  : 0xC5000020
>> +    PV_func_id:   (uint32)  : Either PV_TIME_LPT or PV_TIME_ST
>> +    Return value: (int32)   : NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
>> +                              PV-time feature is supported by the hypervisor.
>> +
>> +PV_TIME_ST
>> +    Function ID:  (uint32)  : 0xC5000022
>> +    Return value: (int64)   : IPA of the stolen time data structure for this
>> +                              (V)CPU. On failure:
>> +                              NOT_SUPPORTED (-1)
>> +
>> +The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
>> +with inner and outer write back caching attributes, in the inner shareable
>> +domain. A total of 16 bytes from the IPA returned are guaranteed to be
>> +meaningfully filled by the hypervisor (see structure below).
>> +
>> +PV_TIME_ST returns the structure for the calling VCPU.
>> +
>> +Stolen Time
>> +-----------
>> +
>> +The structure pointed to by the PV_TIME_ST hypercall is as follows:
>> +
>> +  Field       | Byte Length | Byte Offset | Description
>> +  ----------- | ----------- | ----------- | --------------------------
>> +  Revision    |      4      |      0      | Must be 0 for version 0.1
>> +  Attributes  |      4      |      4      | Must be 0
>> +  Stolen time |      8      |      8      | Stolen time in unsigned
>> +              |             |             | nanoseconds indicating how
>> +              |             |             | much time this VCPU thread
>> +              |             |             | was involuntarily not
>> +              |             |             | running on a physical CPU.
>> +
>> +The structure will be updated by the hypervisor prior to scheduling a VCPU. It
>> +will be present within a reserved region of the normal memory given to the
>> +guest. The guest should not attempt to write into this memory. There is a
>> +structure per VCPU of the guest.
>> +
>> +User space interface
>> +====================
>> +
>> +User space can request that KVM provide the paravirtualized time interface to
>> +a guest by creating a KVM_DEV_TYPE_ARM_PV_TIME device, for example:
>> +
> 
> I feel it would be more consistent to have the details of this in
> Documentation/virt/kvm/devices/arm-pv-time.txt and refer to this
> document from here.

Fair point - I'll move this lower part of the document and add a reference.

Thanks,

Steve

>> +    struct kvm_create_device pvtime_device = {
>> +            .type = KVM_DEV_TYPE_ARM_PV_TIME,
>> +            .attr = 0,
>> +            .flags = 0,
>> +    };
>> +
>> +    pvtime_fd = ioctl(vm_fd, KVM_CREATE_DEVICE, &pvtime_device);
>> +
>> +Creation of the device should be done after creating the vCPUs of the virtual
>> +machine.
>> +
>> +The IPA of the structures must be given to KVM. This is the base address
>> +of an array of stolen time structures (one for each VCPU). The base address
>> +must be page aligned. The size must be at least 64 * number of VCPUs and be a
>> +multiple of PAGE_SIZE.
>> +
>> +The memory for these structures should be added to the guest in the usual
>> +manner (e.g. using KVM_SET_USER_MEMORY_REGION).
>> +
>> +For example:
>> +
>> +    struct kvm_dev_arm_st_region region = {
>> +            .gpa = <IPA of guest base address>,
>> +            .size = <size in bytes>
>> +    };
>> +
>> +    struct kvm_device_attr st_base = {
>> +            .group = KVM_DEV_ARM_PV_TIME_PADDR,
>> +            .attr = KVM_DEV_ARM_PV_TIME_ST,
>> +            .addr = (u64)&region
>> +    };
>> +
>> +    ioctl(pvtime_fd, KVM_SET_DEVICE_ATTR, &st_base);
>> -- 
>> 2.20.1
>>
> 
> Thanks,
> 
>     Christoffer
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] arm64: kpti: ensure patched kernel text is fetched from PoU
From: James Morse @ 2019-08-28 11:12 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Catalin Marinas, Will Deacon, linux-arm-kernel
In-Reply-To: <20190827171257.36023-1-mark.rutland@arm.com>

Hi Mark,

On 27/08/2019 18:12, Mark Rutland wrote:
> While the MMUs is disabled, I-cache speculation can result in

(Nit: MMU)

> instructions being fetched from the PoC. During boot we may patch
> instructions (e.g. for alternatives and jump labels), and these may be
> dirty at the PoU (and stale at the PoC).
> 
> Thus, while the MMU is disabled in the KPTI pagetable fixup code we may
> load stale instructions into the I-cache, potentially leading to
> subsequent crashes when executing regions of code which have been
> modified at runtime.
> 
> Similarly to commit:
> 
>   8ec41987436d566f ("arm64: mm: ensure patched kernel text is fetched from PoU")
> 
> ... we can invalidate the I-cache after enabling the MMU to prevent such
> issues.

> The KPTI pagetable fixup code itself should be clean to the PoC per the
> boot protocol, so no maintenance is required for this code.

Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] arm64: fix fixmap copy for 16K pages and 48-bit VA
From: Anshuman Khandual @ 2019-08-28 11:00 UTC (permalink / raw)
  To: Mark Rutland, linux-arm-kernel
  Cc: Catalin Marinas, Steve Capper, Marc Zyngier, Will Deacon,
	Ard Biesheuvel
In-Reply-To: <20190827155708.34699-1-mark.rutland@arm.com>



On 08/27/2019 09:27 PM, Mark Rutland wrote:
> With 16K pages and 48-bit VAs, the PGD level of table has two entries,
> and so the fixmap shares a PGD with the kernel image. Since commit:
> 
>   f9040773b7bbbd9e ("arm64: move kernel image to base of vmalloc area")
> 
> ... we copy the existing fixmap to the new fine-grained page tables at
> the PUD level in this case. When walking to the new PUD, we forgot to
> offset the PGD entry and always used the PGD entry at index 0, but this
> worked as the kernel image and fixmap were in the low half of the TTBR1
> address space.
> 
> As of commit:
> 
>   14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
> 
> ... the kernel image and fixmap are in the high half of the TTBR1
> address space, and hence use the PGD at index 1, but we didn't update
> the fixmap copying code to account for this.
> 
> Thus, we'll erroneously try to copy the fixmap slots into a PUD under
> the PGD entry at index 0. At the point we do so this PGD entry has not
> been initialised, and thus we'll try to write a value to a small offset
> from physical address 0, causing a number of potential problems.
> 
> Fix this be correctly offsetting the PGD. This is split over a few steps
> for legibility.
> 
> Fixes: 14c127c957c1c607 ("arm64: mm: Flip kernel VA space")
> Reported-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Steve Capper <Steve.Capper@arm.com>
> Cc: Will Deacon <will@kernel.org>

Tested-by: Anshuman Khandual <anshuman.khandual@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v2 2/2] dma-contiguous: Use fallback alloc_pages for single pages
From: Masahiro Yamada @ 2019-08-28 10:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Ulf Hansson, Tony Lindgren, Catalin Marinas, Will Deacon,
	Linux Kernel Mailing List, Max Filippov, Marek Szyprowski,
	Stephen Rothwell, Joerg Roedel, Russell King, Thierry Reding,
	linux-xtensa, Kees Cook, Nicolin Chen, Andrew Morton,
	linux-arm-kernel, Chris Zankel, Wolfram Sang, David Woodhouse,
	linux-mmc, Adrian Hunter, iommu, iamjoonsoo.kim, Robin Murphy
In-Reply-To: <20190827115541.GB5921@lst.de>

Hi Christoph,

On Tue, Aug 27, 2019 at 8:55 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Aug 27, 2019 at 06:03:14PM +0900, Masahiro Yamada wrote:
> > Yes, this makes my driver working again
> > when CONFIG_DMA_CMA=y.
> >
> >
> > If I apply the following, my driver gets back working
> > irrespective of CONFIG_DMA_CMA.
>
> That sounds a lot like the device simply isn't 64-bit DMA capable, and
> previously always got CMA allocations under the limit it actually
> supported.  I suggest that you submit this quirk to the mmc maintainers.


I tested v5.2 and my MMC host controller works with
dma_address that exceeds 32-bit physical address.

So, I believe my MMC device is 64-bit DMA capable.

I am still looking into the code
to find out what was changed.




--
Best Regards
Masahiro Yamada

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH] mmc: sunxi: fix unusuable eMMC on some H6 boards by disabling DDR
From: Alejandro González @ 2019-08-28 10:52 UTC (permalink / raw)
  To: Ulf Hansson, Maxime Ripard
  Cc: Greg Kroah-Hartman, Linus Walleij, linux-sunxi,
	linux-mmc@vger.kernel.org, Linux Kernel Mailing List,
	Chen-Yu Tsai, Thomas Gleixner, Linux ARM
In-Reply-To: <CAPDyKFr5opD2yBXmFRBY-9oA_3ShVv0GPFRO8Q_8TEiT+z2pQA@mail.gmail.com>

El 27/8/19 a las 15:24, Ulf Hansson escribió:> Assuming this should go stable as well? Perhaps you can find a
> relevant commit that we can put as a fixes tag as well?
> 
> Kind regards
> Uffe

The most relevant commit I've found that is related to enabling DDR speeds
on H6 boards is this one: https://github.com/torvalds/linux/commit/07bafc1e3536a4e3c422dbd13341688b54f159bb .
But it doesn't address the H6 SoC specifically, so I doubted whether it would
be appropiate to mark this patch as fixing it, and opted to not do it. I don't
mind adding that tag if it's appropiate, though :-)

On the other hand, I'm not sure that I understood correctly what do you mean by
this patch going stable, but I might say the changes themselves are stable and work.
The only downside I can think of to them is that they are a kind of workaround that
reduces performance on H6 boards and/or eMMC not affected by this problem (are there
any?), unless device trees are changed.

El 27/8/19 a las 15:32, Maxime Ripard escribió:
> On Sun, Aug 25, 2019 at 05:05:58PM +0200, Alejandro González wrote:
>> Some Allwinner H6 boards have timing problems when dealing with
>> DDR-capable eMMC cards. These boards include the Pine H64 and Tanix TX6.
>>
>> These timing problems result in out of sync communication between the
>> driver and the eMMC, which renders the memory unsuable for every
>> operation but some basic commmands, like reading the status register.
>>
>> The cause of these timing problems is not yet well known, but they go
>> away by disabling DDR mode operation in the driver. Like on some H5
>> boards, it might be that the traces are not precise enough to support
>> these speeds. However, Jernej Skrabec compared the BSP driver with this
>> driver, and found that the BSP driver configures pinctrl to operate at
>> 1.8 V when entering DDR mode (although 3.3 V operation is supported), while
>> the mainline kernel lacks any mechanism to switch voltages dynamically.
>> Finally, other possible cause might be some timing parameter that is
>> different on the H6 with respect to other SoCs.
> 
> This should be a comment in the driver where this is disabled.
> 
> Maxime

Thank you for your review. I'll mention that briefly in a comment in the code for
the next revision of this patch.

Regards.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v5 07/10] regulator: mt6358: Add support for MT6358 regulator
From: Mark Brown @ 2019-08-28 10:45 UTC (permalink / raw)
  To: Hsin-Hsiung Wang
  Cc: Mark Rutland, Alessandro Zummo, Alexandre Belloni, srv_heupstream,
	devicetree, Greg Kroah-Hartman, Sean Wang, Liam Girdwood,
	linux-kernel, Richard Fontana, Rob Herring, linux-mediatek,
	linux-arm-kernel, Matthias Brugger, Thomas Gleixner, Eddie Huang,
	Lee Jones, Kate Stewart, linux-rtc
In-Reply-To: <1566531931-9772-8-git-send-email-hsin-hsiung.wang@mediatek.com>


[-- Attachment #1.1: Type: text/plain, Size: 436 bytes --]

On Fri, Aug 23, 2019 at 11:45:28AM +0800, Hsin-Hsiung Wang wrote:
> The MT6358 is a regulator found on boards based on MediaTek MT8183 and
> probably other SoCs. It is a so called pmic and connects as a slave to
> SoC using SPI, wrapped inside the pmic-wrapper.

This looks good - since there was only one small issue with the example
in the binding document I'll apply both patches, please send a followup
fixing the binding document.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v5 05/10] regulator: Add document for MT6358 regulator
From: Mark Brown @ 2019-08-28 10:44 UTC (permalink / raw)
  To: Hsin-Hsiung Wang
  Cc: Mark Rutland, Alessandro Zummo, Alexandre Belloni, srv_heupstream,
	devicetree, Greg Kroah-Hartman, Sean Wang, Liam Girdwood,
	linux-kernel, Richard Fontana, Rob Herring, linux-mediatek,
	linux-arm-kernel, Matthias Brugger, Thomas Gleixner, Eddie Huang,
	Lee Jones, Kate Stewart, linux-rtc
In-Reply-To: <1566531931-9772-6-git-send-email-hsin-hsiung.wang@mediatek.com>


[-- Attachment #1.1: Type: text/plain, Size: 306 bytes --]

On Fri, Aug 23, 2019 at 11:45:26AM +0800, Hsin-Hsiung Wang wrote:

> +	pmic {
> +		compatible = "mediatek,mt6358";
> +
> +		mt6358regulator: mt6358regulator {
> +			compatible = "mediatek,mt6358-regulator";

This still lists the subnode compatible string which has been removed
from the binding.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: Continuous SD IO causes hung task messages
From: Ulf Hansson @ 2019-08-28 10:31 UTC (permalink / raw)
  To: Russell King - ARM Linux admin
  Cc: linux-mmc@vger.kernel.org, Linux Kernel Mailing List, Linux ARM
In-Reply-To: <20190827150614.GN13294@shell.armlinux.org.uk>

On Tue, 27 Aug 2019 at 17:06, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>
> On Tue, Aug 27, 2019 at 03:52:17PM +0100, Russell King - ARM Linux admin wrote:
> > On Tue, Aug 27, 2019 at 03:36:34PM +0100, Russell King - ARM Linux admin wrote:
> > > On Tue, Aug 27, 2019 at 03:55:23PM +0200, Ulf Hansson wrote:
> > > > On Tue, 27 Aug 2019 at 15:43, Russell King - ARM Linux admin
> > > > <linux@armlinux.org.uk> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > While dd'ing the contents of a SD card, I get hung task timeout
> > > > > messages as per below.  However, the dd is making progress.  Any
> > > > > ideas?
> > > > >
> > > > > Presumably, mmc_rescan doesn't get a look-in while IO is progressing
> > > > > for the card?
> > > >
> > > > Is it a regression?
> > > >
> > > > There not much of recent mmc core and mmc block changes, that I can
> > > > think of at this point.
> > >
> > > No idea - I just repaired the SD socket after the D2 line became
> > > disconnected, and decided to run that command as a test.
> > >
> > > > > ARM64 host, Macchiatobin, uSD card.
> > > >
> > > > What mmc host driver is it? mmci?
> > >
> > > sdhci-xenon.
> > >
> > > I'm just trying with one CPU online, then I'll try with two.  My
> > > suspicion is that there's a problem in the ARM64 arch code where
> > > unlocking a mutex doesn't get noticed on other CPUs.
> > >
> > > Hmm, I thought I'd try bringing another CPU online, but it seems
> > > like the ARM64 CPU hotplug code is broken:
> > >
> > > [ 3552.029689] CPU1: shutdown
> > > [ 3552.031099] psci: CPU1 killed.
> > > [ 3949.835212] CPU1: failed to come online
> > > [ 3949.837753] CPU1: failed in unknown state : 0x0
> > >
> > > which means I can only take CPUs down, I can't bring them back
> > > online without rebooting.
> >
> > Okay, running on a single CPU shows no problems.
> >
> > Running on four CPUs (as originally) shows that the kworker thread
> > _never_ gets scheduled, so the warning is not false.
> >
> > With three CPUs, same problem.
> >
> > root@arm-d06300000000:~# ps aux | grep ' D '
> > root        34  0.0  0.0      0     0 ?        D    15:38   0:00 [kworker/1:1+events_freezable]
> > root@arm-d06300000000:~# cat /proc/34/sched
> > kworker/1:1 (34, #threads: 1)
> > -------------------------------------------------------------------
> > se.exec_start                                :        318689.992440
> > se.vruntime                                  :         37750.882357
> > se.sum_exec_runtime                          :             9.421240
> > se.nr_migrations                             :                    0
> > nr_switches                                  :                 1174
> > nr_voluntary_switches                        :                 1171
> > nr_involuntary_switches                      :                    3
> > se.load.weight                               :              1048576
> > se.runnable_weight                           :              1048576
> > se.avg.load_sum                              :                    6
> > se.avg.runnable_load_sum                     :                    6
> > se.avg.util_sum                              :                 5170
> > se.avg.load_avg                              :                    0
> > se.avg.runnable_load_avg                     :                    0
> > se.avg.util_avg                              :                    0
> > se.avg.last_update_time                      :         318689991680
> > se.avg.util_est.ewma                         :                   10
> > se.avg.util_est.enqueued                     :                    0
> > policy                                       :                    0
> > prio                                         :                  120
> > clock-delta                                  :                    0
> >
> > The only thing that changes there is "clock-delta".  When I kill the
> > dd, I get:
> >
> > root@arm-d06300000000:~# cat /proc/34/sched
> > kworker/1:1 (34, #threads: 1)
> > -------------------------------------------------------------------
> > se.exec_start                                :        574025.791680
> > se.vruntime                                  :         79996.657300
> > se.sum_exec_runtime                          :            10.916400
> > se.nr_migrations                             :                    0
> > nr_switches                                  :                 1403
> > nr_voluntary_switches                        :                 1400
> > nr_involuntary_switches                      :                    3
> > se.load.weight                               :              1048576
> > se.runnable_weight                           :              1048576
> > se.avg.load_sum                              :                   15
> > se.avg.runnable_load_sum                     :                   15
> > se.avg.util_sum                              :                15007
> > se.avg.load_avg                              :                    0
> > se.avg.runnable_load_avg                     :                    0
> > se.avg.util_avg                              :                    0
> > se.avg.last_update_time                      :         574025791488
> > se.avg.util_est.ewma                         :                   10
> > se.avg.util_est.enqueued                     :                    0
> > policy                                       :                    0
> > prio                                         :                  120
> > clock-delta                                  :                   40
> >
> > so the thread makes forward progress.
> >
> > Down to two CPUs:
> >
> > root@arm-d06300000000:~# ps aux | grep ' D '
> > root        34  0.0  0.0      0     0 ?        D    15:38   0:00 [kworker/1:1+events_freezable]
> >
> > Same symptoms.  dd and md5sum switch between CPU 0 and CPU1.
>
> Hmm.
>
> static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
>                                     const struct blk_mq_queue_data *bd)
>
>         mq->in_flight[issue_type] += 1;
>         get_card = (mmc_tot_in_flight(mq) == 1);
>
>         if (get_card)
>                 mmc_get_card(card, &mq->ctx);
>
> mmc_get_card() gets the host lock according to the card.
>
> So, if we always have requests in flight (which is probably the case
> here) we never drop the host lock, and mmc_rescan() never gets a look
> in - hence blocking the kworker.

Ahh, you are right. However, this isn't a new problem I believe.

Even if we did some re-work of the locking mechanism while converting
to blk-mq, I still think the worker could starve the mmc_rescan work
before.

In practice this shouldn't be a problem though, unless I am
overlooking something. This is because it's not until there is an I/O
error, that causes the block worker to release the host, to it makes
sense to let mmc_rescan to claim the host to check for card removal.

>
> So this is a real issue with MMC, and not down to something in the
> arch.

Yep, thanks for running the test and providing more details!

>
> I suspect the reason that single-CPU doesn't show it is because it is
> unable to keep multiple requests in flight.

Yes, most likely.

Now, how to solve this problem I need to think more about....

FYI: The long term goal has been to try to remove the big fat host
lock altogether and slowly we have moved more an more things to be
executed as a part of the block worker, which is one of the needed
steps. Like the mmc ioctls for example...

Kind regards
Uffe

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply

* Re: [PATCH v7 07/13] soc: mediatek: Refactor bus protection control
From: Matthias Brugger @ 2019-08-28 10:28 UTC (permalink / raw)
  To: Weiyi Lu, Nicolas Boichat, Rob Herring
  Cc: James Liao, srv_heupstream, linux-kernel, Fan Chen,
	linux-mediatek, Yong Wu, linux-arm-kernel
In-Reply-To: <1566983506-26598-8-git-send-email-weiyi.lu@mediatek.com>



On 28/08/2019 11:11, Weiyi Lu wrote:
> Put bus protection enable and disable control in separate functions.
> 
> Signed-off-by: Weiyi Lu <weiyi.lu@mediatek.com>

Applied to v5.4-next/soc

Thanks!

> ---
>  drivers/soc/mediatek/mtk-scpsys.c | 44 ++++++++++++++++++++++++++-------------
>  1 file changed, 30 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/soc/mediatek/mtk-scpsys.c b/drivers/soc/mediatek/mtk-scpsys.c
> index ad0f619..fb2b027 100644
> --- a/drivers/soc/mediatek/mtk-scpsys.c
> +++ b/drivers/soc/mediatek/mtk-scpsys.c
> @@ -274,6 +274,30 @@ static int scpsys_sram_disable(struct scp_domain *scpd, void __iomem *ctl_addr)
>  			MTK_POLL_DELAY_US, MTK_POLL_TIMEOUT);
>  }
>  
> +static int scpsys_bus_protect_enable(struct scp_domain *scpd)
> +{
> +	struct scp *scp = scpd->scp;
> +
> +	if (!scpd->data->bus_prot_mask)
> +		return 0;
> +
> +	return mtk_infracfg_set_bus_protection(scp->infracfg,
> +			scpd->data->bus_prot_mask,
> +			scp->bus_prot_reg_update);
> +}
> +
> +static int scpsys_bus_protect_disable(struct scp_domain *scpd)
> +{
> +	struct scp *scp = scpd->scp;
> +
> +	if (!scpd->data->bus_prot_mask)
> +		return 0;
> +
> +	return mtk_infracfg_clear_bus_protection(scp->infracfg,
> +			scpd->data->bus_prot_mask,
> +			scp->bus_prot_reg_update);
> +}
> +
>  static int scpsys_power_on(struct generic_pm_domain *genpd)
>  {
>  	struct scp_domain *scpd = container_of(genpd, struct scp_domain, genpd);
> @@ -316,13 +340,9 @@ static int scpsys_power_on(struct generic_pm_domain *genpd)
>  	if (ret < 0)
>  		goto err_pwr_ack;
>  
> -	if (scpd->data->bus_prot_mask) {
> -		ret = mtk_infracfg_clear_bus_protection(scp->infracfg,
> -				scpd->data->bus_prot_mask,
> -				scp->bus_prot_reg_update);
> -		if (ret)
> -			goto err_pwr_ack;
> -	}
> +	ret = scpsys_bus_protect_disable(scpd);
> +	if (ret < 0)
> +		goto err_pwr_ack;
>  
>  	return 0;
>  
> @@ -344,13 +364,9 @@ static int scpsys_power_off(struct generic_pm_domain *genpd)
>  	u32 val;
>  	int ret, tmp;
>  
> -	if (scpd->data->bus_prot_mask) {
> -		ret = mtk_infracfg_set_bus_protection(scp->infracfg,
> -				scpd->data->bus_prot_mask,
> -				scp->bus_prot_reg_update);
> -		if (ret)
> -			goto out;
> -	}
> +	ret = scpsys_bus_protect_enable(scpd);
> +	if (ret < 0)
> +		goto out;
>  
>  	ret = scpsys_sram_disable(scpd, ctl_addr);
>  	if (ret < 0)
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox