* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
@ 2014-03-24 16:17 Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type Thomas Petazzoni
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-03-24 16:17 UTC (permalink / raw)
To: linux-arm-kernel
Russell, Will, Catalin,
This patch series adresses a problem that affects the newer Marvell
Armada 375 and 38x SOCs, based on Cortex-A9+PL310, combined with the
Marvell PCIe hardware unit. When the hardware I/O coherency is
enabled, the combination of Cortex-A9/PL310/Marvell PCIe hardware unit
will quickly cause a deadlock when the PCIe bus is stressed.
The workaround for this problem has been suggested by ARM, and
consists in two things:
(1) Map the PCIe regions as strongly-ordered
(2) Disable the outer cache sync of the PL310 when hardware I/O
coherency is used, since it is unneeded and causes the deadlock.
The following three patches address the problem in the following way:
* PATCH 1/3 adds the necessary infrastructure to allow PCIe I/O
regions to be mapped as strongly-ordered.
* PATCH 2/3 extends the l2x0 cache driver to be able to pass the
information on whether hardware I/O coherency is used or not, and
in the former case, disable the outer_cache_sync operation.
* PATCH 3/3 uses both of the added infrastructures, as well as the
existing infrastructure to customize the behavior of ioremap() on a
per-platform basis, to implement the workaround for the Armada 375
and 38x SOCs.
Patches 1/3 and 2/3 apply on 3.14-rc1. Patch 3/3 applies on top of the
'[PATCH 00/10] ARM: mvebu: Hardware coherency support for Armada 375
and 38x' patch set. But for now, I'm mainly interested in hearing
about patches 1/3 and 2/3.
Please let me know what you think about the proposed solution.
Thanks!
Thomas
Thomas Petazzoni (3):
ARM: mm: allow sub-architectures to override PCI I/O memory type
ARM: mm: add support for HW coherent systems in PL310
ARM: mvebu: implement L2/PCIe deadlock workaround
arch/arm/include/asm/hardware/cache-l2x0.h | 1 +
arch/arm/include/asm/io.h | 6 ++++++
arch/arm/mach-mvebu/board-v7.c | 31 +++++++++++++++++++++++++++++-
arch/arm/mm/cache-l2x0.c | 22 ++++++++++++++++++++-
arch/arm/mm/ioremap.c | 9 ++++++++-
5 files changed, 66 insertions(+), 3 deletions(-)
--
1.8.3.2
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type
2014-03-24 16:17 [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
@ 2014-03-24 16:17 ` Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310 Thomas Petazzoni
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-03-24 16:17 UTC (permalink / raw)
To: linux-arm-kernel
Due to a design incompatibility between the PCIe Marvell controller
and the Cortex-A9, stressing PCIe devices with a lot of traffic
quickly causes a deadlock.
One part of the workaround for this is to have all PCIe regions mapped
as MT_MEMORY_SO instead of MT_DEVICE. While the arch_ioremap_caller()
mechanism allows sub-architecture code to override ioremap(), used to
map PCIe memory regions, there isn't such a mechanism to override the
behavior of pci_ioremap_io().
This commit adds the arch_pci_ioremap_mem_type variable, initialized
to MT_DEVICE by default, and that sub-architecture code can
override. We have chosen to expose a single variable rather than
offering the possibility of overriding the entire pci_ioremap_io(),
because implementing pci_ioremap_io() requires calling functions
(get_mem_type()) that are private to the arch/arm/mm/ code.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
arch/arm/include/asm/io.h | 6 ++++++
arch/arm/mm/ioremap.c | 9 ++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 8aa4cca..3d23418 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -179,6 +179,12 @@ static inline void __iomem *__typesafe_io(unsigned long addr)
/* PCI fixed i/o mapping */
#define PCI_IO_VIRT_BASE 0xfee00000
+#if defined(CONFIG_PCI)
+void pci_ioremap_set_mem_type(int mem_type);
+#else
+static inline void pci_ioremap_set_mem_type(int mem_type) {}
+#endif
+
extern int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr);
/*
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f9c32ba..d1e5ad7 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -438,6 +438,13 @@ void __arm_iounmap(volatile void __iomem *io_addr)
EXPORT_SYMBOL(__arm_iounmap);
#ifdef CONFIG_PCI
+static int pci_ioremap_mem_type = MT_DEVICE;
+
+void pci_ioremap_set_mem_type(int mem_type)
+{
+ pci_ioremap_mem_type = mem_type;
+}
+
int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
{
BUG_ON(offset + SZ_64K > IO_SPACE_LIMIT);
@@ -445,7 +452,7 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
return ioremap_page_range(PCI_IO_VIRT_BASE + offset,
PCI_IO_VIRT_BASE + offset + SZ_64K,
phys_addr,
- __pgprot(get_mem_type(MT_DEVICE)->prot_pte));
+ __pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
}
EXPORT_SYMBOL_GPL(pci_ioremap_io);
#endif
--
1.8.3.2
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-03-24 16:17 [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type Thomas Petazzoni
@ 2014-03-24 16:17 ` Thomas Petazzoni
2014-04-08 17:24 ` Catalin Marinas
2014-03-24 16:17 ` [PATCH 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround Thomas Petazzoni
2014-04-03 14:07 ` [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
3 siblings, 1 reply; 15+ messages in thread
From: Thomas Petazzoni @ 2014-03-24 16:17 UTC (permalink / raw)
To: linux-arm-kernel
When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.
This commit adds a new l2x0_of_init_coherent() variant of
l2x0_of_init(), which allows the caller to tell the L2 cache
initialization that the system has hardware coherency support enabled,
and that therefore the outer cache sync operation can be skipped if
possible.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
arch/arm/include/asm/hardware/cache-l2x0.h | 1 +
arch/arm/mm/cache-l2x0.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/arch/arm/include/asm/hardware/cache-l2x0.h b/arch/arm/include/asm/hardware/cache-l2x0.h
index 6795ff7..d7db409 100644
--- a/arch/arm/include/asm/hardware/cache-l2x0.h
+++ b/arch/arm/include/asm/hardware/cache-l2x0.h
@@ -110,6 +110,7 @@
extern void __init l2x0_init(void __iomem *base, u32 aux_val, u32 aux_mask);
#if defined(CONFIG_CACHE_L2X0) && defined(CONFIG_OF)
extern int l2x0_of_init(u32 aux_val, u32 aux_mask);
+extern int l2x0_of_init_coherent(u32 aux_val, u32 aux_mask);
#else
static inline int l2x0_of_init(u32 aux_val, u32 aux_mask)
{
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 7abde2c..76ec012 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -968,7 +968,7 @@ static const struct of_device_id l2x0_ids[] __initconst = {
{}
};
-int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
+int __init l2x0_of_init_common(u32 aux_val, u32 aux_mask, bool is_coherent)
{
struct device_node *np;
const struct l2x0_of_data *data;
@@ -1005,8 +1005,28 @@ int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
of_init = true;
memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
+
+ /*
+ * PL310 doesn't need an outer cache sync operation when the
+ * system is operating with hardware coherency enabled, as it
+ * is done directly in hardware.
+ */
+ if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
+ outer_cache.sync = NULL;
+
l2x0_init(l2x0_base, aux_val, aux_mask);
return 0;
}
+
+int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
+{
+ return l2x0_of_init_common(aux_val, aux_mask, false);
+}
+
+int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
+{
+ return l2x0_of_init_common(aux_val, aux_mask, true);
+}
+
#endif
--
1.8.3.2
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround
2014-03-24 16:17 [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310 Thomas Petazzoni
@ 2014-03-24 16:17 ` Thomas Petazzoni
2014-04-03 14:07 ` [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
3 siblings, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-03-24 16:17 UTC (permalink / raw)
To: linux-arm-kernel
The Marvell Armada 375 and Armada 38x SOCs, which use the Cortex-A9
CPU core, the PL310 cache and the Marvell PCIe hardware block are
affected a L2/PCIe deadlock problem when hardware I/O coherency is used.
This deadlock can be avoided by mapping the PCIe memory areas as
strongly-ordered, and by removing the outer cache sync done in
software. This is done in this patch, thanks to the new bits of
infrastructure added in 'ARM: mm: allow sub-architectures to override
PCI I/O memory type' and 'ARM: mm: add support for HW coherent systems
in PL310' respectively.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
arch/arm/mach-mvebu/board-v7.c | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
index 6260cb8..af4eee9 100644
--- a/arch/arm/mach-mvebu/board-v7.c
+++ b/arch/arm/mach-mvebu/board-v7.c
@@ -75,6 +75,26 @@ static int armada_375_external_abort_wa(unsigned long addr, unsigned int fsr,
return 1;
}
+/*
+ * This ioremap hook is used on Armada 375/38x to ensure that PCIe
+ * memory areas are mapped as MT_MEMORY_RW_SO instead of
+ * MT_DEVICE. This is needed as a workaround for a deadlock issue
+ * between the PCIe interface and the cache controller.
+ */
+static void __iomem *
+armada_pcie_wa_ioremap_caller(phys_addr_t phys_addr, size_t size,
+ unsigned int mtype, void *caller)
+{
+ struct resource pcie_mem;
+
+ mvebu_mbus_get_pcie_mem_aperture(&pcie_mem);
+
+ if (pcie_mem.start <= phys_addr && (phys_addr + size) <= pcie_mem.end)
+ mtype = MT_MEMORY_RW_SO;
+
+ return __arm_ioremap_caller(phys_addr, size, mtype, caller);
+}
+
static void __init mvebu_timer_and_clk_init(void)
{
of_clk_init(NULL);
@@ -83,9 +103,18 @@ static void __init mvebu_timer_and_clk_init(void)
BUG_ON(mvebu_mbus_dt_init(coherency_available()));
coherency_init();
#ifdef CONFIG_CACHE_L2X0
- l2x0_of_init(0, ~0UL);
+ if (coherency_available())
+ l2x0_of_init_coherent(0, ~0UL);
+ else
+ l2x0_of_init(0, ~0UL);
#endif
+ if (of_machine_is_compatible("marvell,armada375") ||
+ of_machine_is_compatible("marvell,armada38x")) {
+ arch_ioremap_caller = armada_pcie_wa_ioremap_caller;
+ pci_ioremap_set_mem_type(MT_MEMORY_RW_SO);
+ }
+
if (of_machine_is_compatible("marvell,armada375"))
hook_fault_code(16 + 6, armada_375_external_abort_wa, SIGBUS, 0,
"imprecise external abort");
--
1.8.3.2
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-03-24 16:17 [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
` (2 preceding siblings ...)
2014-03-24 16:17 ` [PATCH 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround Thomas Petazzoni
@ 2014-04-03 14:07 ` Thomas Petazzoni
2014-04-03 14:15 ` Russell King - ARM Linux
3 siblings, 1 reply; 15+ messages in thread
From: Thomas Petazzoni @ 2014-04-03 14:07 UTC (permalink / raw)
To: linux-arm-kernel
Russell, Will, Catalin,
Any comments about the proposed patches? These are important for the
recent Armada 375/38x platforms and I'm sure will require a bit of
discussion. Moreover, one of the patch affects the L2 cache driver that
Russell is currently working on, so it would be nice if we could get
the discussion going soon.
Thanks a lot!
Thomas
On Mon, 24 Mar 2014 17:17:49 +0100, Thomas Petazzoni wrote:
> Russell, Will, Catalin,
>
> This patch series adresses a problem that affects the newer Marvell
> Armada 375 and 38x SOCs, based on Cortex-A9+PL310, combined with the
> Marvell PCIe hardware unit. When the hardware I/O coherency is
> enabled, the combination of Cortex-A9/PL310/Marvell PCIe hardware unit
> will quickly cause a deadlock when the PCIe bus is stressed.
>
> The workaround for this problem has been suggested by ARM, and
> consists in two things:
>
> (1) Map the PCIe regions as strongly-ordered
>
> (2) Disable the outer cache sync of the PL310 when hardware I/O
> coherency is used, since it is unneeded and causes the deadlock.
>
> The following three patches address the problem in the following way:
>
> * PATCH 1/3 adds the necessary infrastructure to allow PCIe I/O
> regions to be mapped as strongly-ordered.
>
> * PATCH 2/3 extends the l2x0 cache driver to be able to pass the
> information on whether hardware I/O coherency is used or not, and
> in the former case, disable the outer_cache_sync operation.
>
> * PATCH 3/3 uses both of the added infrastructures, as well as the
> existing infrastructure to customize the behavior of ioremap() on a
> per-platform basis, to implement the workaround for the Armada 375
> and 38x SOCs.
>
> Patches 1/3 and 2/3 apply on 3.14-rc1. Patch 3/3 applies on top of the
> '[PATCH 00/10] ARM: mvebu: Hardware coherency support for Armada 375
> and 38x' patch set. But for now, I'm mainly interested in hearing
> about patches 1/3 and 2/3.
>
> Please let me know what you think about the proposed solution.
>
> Thanks!
>
> Thomas
>
> Thomas Petazzoni (3):
> ARM: mm: allow sub-architectures to override PCI I/O memory type
> ARM: mm: add support for HW coherent systems in PL310
> ARM: mvebu: implement L2/PCIe deadlock workaround
>
> arch/arm/include/asm/hardware/cache-l2x0.h | 1 +
> arch/arm/include/asm/io.h | 6 ++++++
> arch/arm/mach-mvebu/board-v7.c | 31 +++++++++++++++++++++++++++++-
> arch/arm/mm/cache-l2x0.c | 22 ++++++++++++++++++++-
> arch/arm/mm/ioremap.c | 9 ++++++++-
> 5 files changed, 66 insertions(+), 3 deletions(-)
>
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-04-03 14:07 ` [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
@ 2014-04-03 14:15 ` Russell King - ARM Linux
2014-04-07 9:10 ` Thomas Petazzoni
2014-04-07 12:13 ` Catalin Marinas
0 siblings, 2 replies; 15+ messages in thread
From: Russell King - ARM Linux @ 2014-04-03 14:15 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Apr 03, 2014 at 04:07:27PM +0200, Thomas Petazzoni wrote:
> Russell, Will, Catalin,
>
> Any comments about the proposed patches? These are important for the
> recent Armada 375/38x platforms and I'm sure will require a bit of
> discussion. Moreover, one of the patch affects the L2 cache driver that
> Russell is currently working on, so it would be nice if we could get
> the discussion going soon.
Will is away for another week and a half.
There is an important point to be made about the L2 cache though - it is
not possible to disable the "sync" at the L2 cache - any register write
automatically invokes a sync before it is actioned, so avoiding the
explicit sync doesn't stop them from happening, it just reduces the
number which occur.
Is it possible that you are hitting one of the PL310 errata?
--
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-04-03 14:15 ` Russell King - ARM Linux
@ 2014-04-07 9:10 ` Thomas Petazzoni
2014-04-07 12:13 ` Catalin Marinas
1 sibling, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-04-07 9:10 UTC (permalink / raw)
To: linux-arm-kernel
Dear Russell King - ARM Linux,
On Thu, 3 Apr 2014 15:15:33 +0100, Russell King - ARM Linux wrote:
> > Any comments about the proposed patches? These are important for the
> > recent Armada 375/38x platforms and I'm sure will require a bit of
> > discussion. Moreover, one of the patch affects the L2 cache driver that
> > Russell is currently working on, so it would be nice if we could get
> > the discussion going soon.
>
> Will is away for another week and a half.
Ok. However I believe Will and Catalin are the best placed to give all
the fine details about this problem.
> There is an important point to be made about the L2 cache though - it is
> not possible to disable the "sync" at the L2 cache - any register write
> automatically invokes a sync before it is actioned, so avoiding the
> explicit sync doesn't stop them from happening, it just reduces the
> number which occur.
>
> Is it possible that you are hitting one of the PL310 errata?
As far as I'm aware, there is no PL310 errata registered for this
problem: at least I asked Will and Catalin about this, and they never
came up with a PL310 errata. My understanding is that the problem here
is not a deficiency of the PL310 itself, but rather an unfortunate
interaction between the PL310 behavior, the hardware I/O coherency and
the PCIe transactions.
The suggested fix to disable explicit outer cache sync, and map PCIe
regions are strongly ordered was proposed by ARM as a solution to the
deadlocks, and it was tested successfully.
I am hoping that Catalin and Will will enter the discussion and give
the details you need to fully understand the problem.
Thanks,
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-04-03 14:15 ` Russell King - ARM Linux
2014-04-07 9:10 ` Thomas Petazzoni
@ 2014-04-07 12:13 ` Catalin Marinas
2014-04-14 16:59 ` Will Deacon
1 sibling, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2014-04-07 12:13 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Apr 03, 2014 at 03:15:33PM +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 03, 2014 at 04:07:27PM +0200, Thomas Petazzoni wrote:
> > Any comments about the proposed patches? These are important for the
> > recent Armada 375/38x platforms and I'm sure will require a bit of
> > discussion. Moreover, one of the patch affects the L2 cache driver that
> > Russell is currently working on, so it would be nice if we could get
> > the discussion going soon.
>
> Will is away for another week and a half.
And when he's back, I'm pretty sure he will be eager to look at the
PL310 code ;)
> There is an important point to be made about the L2 cache though - it is
> not possible to disable the "sync" at the L2 cache - any register write
> automatically invokes a sync before it is actioned, so avoiding the
> explicit sync doesn't stop them from happening, it just reduces the
> number which occur.
I agree, there is no way to disable the "sync". For this particular case
(a system erratum probably, I need to check) you need to make sure you
don't do PL310 and PCIe accesses by the CPU at the same time, otherwise
the system may deadlock.
However, on this Marvell board, the DMA is fully coherent so they don't
need L1 + L2 cache maintenance for such buffers. Due to hardware
coherency, they don't need the outer_sync() either for wmb() etc. So
far, you can treat this as an optimisation.
But the side-effect is that for most run-time operations (apart from
some power management or CPU hotplug) they won't need to touch any PL310
registers at all, and not even the outer_sync() via wmb(). This makes
it (nearly) impossible to hit the system erratum.
I don't remember why the SO memory is still needed for PCIe. I assume to
avoid posted writes but if we avoid PL310 accesses altogether, this may
not be an issue (well, unless the CPU power management code on this
platform needs to flush PL310 explicitly). I'll let you know when I find
out.
--
Catalin
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-03-24 16:17 ` [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310 Thomas Petazzoni
@ 2014-04-08 17:24 ` Catalin Marinas
2014-04-08 18:12 ` Thomas Petazzoni
0 siblings, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2014-04-08 17:24 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Mar 24, 2014 at 04:17:51PM +0000, Thomas Petazzoni wrote:
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -968,7 +968,7 @@ static const struct of_device_id l2x0_ids[] __initconst = {
> {}
> };
>
> -int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> +int __init l2x0_of_init_common(u32 aux_val, u32 aux_mask, bool is_coherent)
> {
> struct device_node *np;
> const struct l2x0_of_data *data;
> @@ -1005,8 +1005,28 @@ int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
>
> of_init = true;
> memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> +
> + /*
> + * PL310 doesn't need an outer cache sync operation when the
> + * system is operating with hardware coherency enabled, as it
> + * is done directly in hardware.
> + */
> + if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> + outer_cache.sync = NULL;
> +
For this particular case, you can add a specific l2x0_of_data structure
with the right compatible string for your platform where
outer_cache.sync is NULL, assuming that dma_alloc_coherent() returns
Cacheable memory (e.g. your platform uses arm_coherent_dma_ops). The
only use of outer_sync() is for non-SMP barriers in relation to Normal
NonCacheable buffers.
Even if the platform has coherent I/O, you still need the range L2 cache
ops to be available for secondary CPU booting. Unless this platform can
be configured in a way similar to the "marvell,aurora-system-cache" case
and you can make all the outer_cache ops NULL.
> l2x0_init(l2x0_base, aux_val, aux_mask);
>
> return 0;
> }
> +
> +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> +{
> + return l2x0_of_init_common(aux_val, aux_mask, false);
> +}
> +
> +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> +{
> + return l2x0_of_init_common(aux_val, aux_mask, true);
> +}
This could be a bit misleading. I would rather have a generic
pl310_data_dma_coherent structure (though even on coherent systems you
could still have drivers that prefer writecombine/NormalNC memory to
avoid thrashing the cache).
--
Catalin
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-04-08 17:24 ` Catalin Marinas
@ 2014-04-08 18:12 ` Thomas Petazzoni
2014-04-09 11:15 ` Catalin Marinas
2014-04-09 14:06 ` Catalin Marinas
0 siblings, 2 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-04-08 18:12 UTC (permalink / raw)
To: linux-arm-kernel
Dear Catalin Marinas,
On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:
> > of_init = true;
> > memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > +
> > + /*
> > + * PL310 doesn't need an outer cache sync operation when the
> > + * system is operating with hardware coherency enabled, as it
> > + * is done directly in hardware.
> > + */
> > + if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > + outer_cache.sync = NULL;
> > +
>
> For this particular case, you can add a specific l2x0_of_data structure
> with the right compatible string for your platform where
> outer_cache.sync is NULL,
In fact, I'm not sure using a separate compatible string is possible,
because there are situations where the hardware platform may be I/O
coherent, and some situations where it is not the case. For example, in
the current kernel, the platform is I/O coherent when CONFIG_SMP is
enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
same hardware platform, so same Device Tree in both cases.
So using a different compatible string doesn't work here: we would have
to adjust the compatible string depending on whether CONFIG_SMP
or !CONFIG_SMP is used. Of course, this is doable at run time before
probing the L2 cache driver, using a small quirk, but that doesn't
sound really nice.
> assuming that dma_alloc_coherent() returns
> Cacheable memory (e.g. your platform uses arm_coherent_dma_ops).
We don't use arm_coherent_dma_ops, we have our own DMA operations, see
arch/arm/mach-mvebu/coherency.c (in mainline, only the support for the
PJ4B based Armada 370/XP is there, but the support for the Cortex-A9
based Armada 375/38x has been posted at
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244916.html).
> The
> only use of outer_sync() is for non-SMP barriers in relation to Normal
> NonCacheable buffers.
>
> Even if the platform has coherent I/O, you still need the range L2 cache
> ops to be available for secondary CPU booting. Unless this platform can
> be configured in a way similar to the "marvell,aurora-system-cache" case
> and you can make all the outer_cache ops NULL.
Hum, I am not sure about that one. Maybe Tawfik or Nadav can comment on
this question?
> > l2x0_init(l2x0_base, aux_val, aux_mask);
> >
> > return 0;
> > }
> > +
> > +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> > +{
> > + return l2x0_of_init_common(aux_val, aux_mask, false);
> > +}
> > +
> > +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> > +{
> > + return l2x0_of_init_common(aux_val, aux_mask, true);
> > +}
>
> This could be a bit misleading. I would rather have a generic
> pl310_data_dma_coherent structure (though even on coherent systems you
> could still have drivers that prefer writecombine/NormalNC memory to
> avoid thrashing the cache).
I'm not sure to follow your suggestion here. What structure type would
pl310_data_dma_coherent be? A "struct l2x0_of_data" like this:
static const struct l2x0_of_data pl310_dma_coherent_data = {
.setup = pl310_of_setup,
.save = pl310_save,
.outer_cache = {
.resume = pl310_resume,
.inv_range = l2x0_inv_range,
.clean_range = l2x0_clean_range,
.flush_range = l2x0_flush_range,
.sync = NULL,
.flush_all = l2x0_flush_all,
.inv_all = l2x0_inv_all,
.disable = l2x0_disable,
},
};
static const struct of_device_id l2x0_ids[] __initconst = {
[...]
{ .compatible = "arm,pl310-cache", .data = (void *)&pl310_data },
{ .compatible = "arm,pl310-cache-dma-coherent", .data = (void *)&pl310_dma_coherent_data },
[...]
};
Is this what you have in mind? This looks ok to me, if we have a
solution for the CONFIG_SMP vs. !CONFIG_SMP problem.
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-04-08 18:12 ` Thomas Petazzoni
@ 2014-04-09 11:15 ` Catalin Marinas
2014-04-09 14:06 ` Catalin Marinas
1 sibling, 0 replies; 15+ messages in thread
From: Catalin Marinas @ 2014-04-09 11:15 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Apr 08, 2014 at 07:12:12PM +0100, Thomas Petazzoni wrote:
> On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:
> > > of_init = true;
> > > memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > > +
> > > + /*
> > > + * PL310 doesn't need an outer cache sync operation when the
> > > + * system is operating with hardware coherency enabled, as it
> > > + * is done directly in hardware.
> > > + */
> > > + if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > > + outer_cache.sync = NULL;
> > > +
> >
> > For this particular case, you can add a specific l2x0_of_data structure
> > with the right compatible string for your platform where
> > outer_cache.sync is NULL,
>
> In fact, I'm not sure using a separate compatible string is possible,
> because there are situations where the hardware platform may be I/O
> coherent, and some situations where it is not the case. For example, in
> the current kernel, the platform is I/O coherent when CONFIG_SMP is
> enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> same hardware platform, so same Device Tree in both cases.
>
> So using a different compatible string doesn't work here: we would have
> to adjust the compatible string depending on whether CONFIG_SMP
> or !CONFIG_SMP is used. Of course, this is doable at run time before
> probing the L2 cache driver, using a small quirk, but that doesn't
> sound really nice.
You could use #ifdef CONFIG_SMP around the compatible string but I
agree, it doesn't look nice.
> > assuming that dma_alloc_coherent() returns
> > Cacheable memory (e.g. your platform uses arm_coherent_dma_ops).
>
> We don't use arm_coherent_dma_ops, we have our own DMA operations, see
> arch/arm/mach-mvebu/coherency.c (in mainline, only the support for the
> PJ4B based Armada 370/XP is there, but the support for the Cortex-A9
> based Armada 375/38x has been posted at
> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/244916.html).
Slightly confusing. So for dma_alloc you can use Cacheable memory, but
for the from_device cases, you need some mvebu_hwcc_sync_io_barrier()
(and no cache maintenance). Doesn't it mean that barriers like rmb()
need such barrier as well (in combination with dma_alloc'ed buffers)?
> > > l2x0_init(l2x0_base, aux_val, aux_mask);
> > >
> > > return 0;
> > > }
> > > +
> > > +int __init l2x0_of_init(u32 aux_val, u32 aux_mask)
> > > +{
> > > + return l2x0_of_init_common(aux_val, aux_mask, false);
> > > +}
> > > +
> > > +int __init l2x0_of_init_coherent(u32 aux_val, u32 aux_mask)
> > > +{
> > > + return l2x0_of_init_common(aux_val, aux_mask, true);
> > > +}
> >
> > This could be a bit misleading. I would rather have a generic
> > pl310_data_dma_coherent structure (though even on coherent systems you
> > could still have drivers that prefer writecombine/NormalNC memory to
> > avoid thrashing the cache).
>
> I'm not sure to follow your suggestion here. What structure type would
> pl310_data_dma_coherent be? A "struct l2x0_of_data" like this:
>
> static const struct l2x0_of_data pl310_dma_coherent_data = {
> .setup = pl310_of_setup,
> .save = pl310_save,
> .outer_cache = {
> .resume = pl310_resume,
> .inv_range = l2x0_inv_range,
> .clean_range = l2x0_clean_range,
> .flush_range = l2x0_flush_range,
> .sync = NULL,
> .flush_all = l2x0_flush_all,
> .inv_all = l2x0_inv_all,
> .disable = l2x0_disable,
> },
> };
>
> static const struct of_device_id l2x0_ids[] __initconst = {
> [...]
> { .compatible = "arm,pl310-cache", .data = (void *)&pl310_data },
> { .compatible = "arm,pl310-cache-dma-coherent", .data = (void *)&pl310_dma_coherent_data },
> [...]
> };
>
> Is this what you have in mind? This looks ok to me, if we have a
> solution for the CONFIG_SMP vs. !CONFIG_SMP problem.
I don't have a nice way. Basically for this board, even if the coherency
was fully set up by firmware (which BTW should be the case for arm64, we
don't take mach-* directories ;)), it depends on the kernel build, I
guess this is because of the Shareable vs NonShareable mappings. The
best I can come up with is a conditionally compiled "marvell,..."
compatible string but your original approach is probably nicer (and as I
said, with the assumption that no driver uses writecombine/dmacoherent
memory for DMA).
--
Catalin
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-04-08 18:12 ` Thomas Petazzoni
2014-04-09 11:15 ` Catalin Marinas
@ 2014-04-09 14:06 ` Catalin Marinas
2014-05-06 10:07 ` Thomas Petazzoni
1 sibling, 1 reply; 15+ messages in thread
From: Catalin Marinas @ 2014-04-09 14:06 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Apr 08, 2014 at 07:12:12PM +0100, Thomas Petazzoni wrote:
> On Tue, 8 Apr 2014 18:24:25 +0100, Catalin Marinas wrote:
>
> > > of_init = true;
> > > memcpy(&outer_cache, &data->outer_cache, sizeof(outer_cache));
> > > +
> > > + /*
> > > + * PL310 doesn't need an outer cache sync operation when the
> > > + * system is operating with hardware coherency enabled, as it
> > > + * is done directly in hardware.
> > > + */
> > > + if (of_device_is_compatible(np, "arm,pl310-cache") && is_coherent)
> > > + outer_cache.sync = NULL;
> > > +
> >
> > For this particular case, you can add a specific l2x0_of_data structure
> > with the right compatible string for your platform where
> > outer_cache.sync is NULL,
>
> In fact, I'm not sure using a separate compatible string is possible,
> because there are situations where the hardware platform may be I/O
> coherent, and some situations where it is not the case. For example, in
> the current kernel, the platform is I/O coherent when CONFIG_SMP is
> enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> same hardware platform, so same Device Tree in both cases.
I think Russell has a better solution in his L2 cache cleanup series.
Patch 18/44 introduces a .fixup function which takes the outer_cache_fns
pointer and that's a place where your code can check whether coherency
is present or not (and turn .sync into NULL).
--
Catalin
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-04-07 12:13 ` Catalin Marinas
@ 2014-04-14 16:59 ` Will Deacon
2014-04-14 18:39 ` Thomas Petazzoni
0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2014-04-14 16:59 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Apr 07, 2014 at 01:13:04PM +0100, Catalin Marinas wrote:
> On Thu, Apr 03, 2014 at 03:15:33PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Apr 03, 2014 at 04:07:27PM +0200, Thomas Petazzoni wrote:
> > > Any comments about the proposed patches? These are important for the
> > > recent Armada 375/38x platforms and I'm sure will require a bit of
> > > discussion. Moreover, one of the patch affects the L2 cache driver that
> > > Russell is currently working on, so it would be nice if we could get
> > > the discussion going soon.
> >
> > Will is away for another week and a half.
>
> And when he's back, I'm pretty sure he will be eager to look at the
> PL310 code ;)
Lucky for me, it looks like rmk is rewriting it ;)
Is there anything you want me to do here?
Will
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock
2014-04-14 16:59 ` Will Deacon
@ 2014-04-14 18:39 ` Thomas Petazzoni
0 siblings, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-04-14 18:39 UTC (permalink / raw)
To: linux-arm-kernel
Dear Will Deacon,
On Mon, 14 Apr 2014 17:59:56 +0100, Will Deacon wrote:
> > > Will is away for another week and a half.
> >
> > And when he's back, I'm pretty sure he will be eager to look at the
> > PL310 code ;)
>
> Lucky for me, it looks like rmk is rewriting it ;)
>
> Is there anything you want me to do here?
Yes. I would actually been interested in a summary/conclusion of the
discussions, because I'm not sure what I'm supposed to do now. So there
are two parts here:
* Mapping PCIe I/O regions as strongly ordered.
* Disabling outer_cache sync in the PL310 driver.
Some discussion has been going on with Catalin, I've answered some
questions by providing more details to Catalin, but in the end, I don't
know if the patches I provided are acceptable or not, and if not, in
what way they should be changed.
Thanks!
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310
2014-04-09 14:06 ` Catalin Marinas
@ 2014-05-06 10:07 ` Thomas Petazzoni
0 siblings, 0 replies; 15+ messages in thread
From: Thomas Petazzoni @ 2014-05-06 10:07 UTC (permalink / raw)
To: linux-arm-kernel
Dear Catalin Marinas,
On Wed, 9 Apr 2014 15:06:25 +0100, Catalin Marinas wrote:
> > In fact, I'm not sure using a separate compatible string is possible,
> > because there are situations where the hardware platform may be I/O
> > coherent, and some situations where it is not the case. For example, in
> > the current kernel, the platform is I/O coherent when CONFIG_SMP is
> > enabled, but not I/O coherent when CONFIG_SMP is disabled. And it's the
> > same hardware platform, so same Device Tree in both cases.
>
> I think Russell has a better solution in his L2 cache cleanup series.
> Patch 18/44 introduces a .fixup function which takes the outer_cache_fns
> pointer and that's a place where your code can check whether coherency
> is present or not (and turn .sync into NULL).
I had a look at the .fixup proposal, but I don't see how a .fixup
function, which is completely internal to the L2 cache driver, could
guess whether the system is coherent or not.
Thomas
--
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2014-05-06 10:07 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-24 16:17 [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 1/3] ARM: mm: allow sub-architectures to override PCI I/O memory type Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 2/3] ARM: mm: add support for HW coherent systems in PL310 Thomas Petazzoni
2014-04-08 17:24 ` Catalin Marinas
2014-04-08 18:12 ` Thomas Petazzoni
2014-04-09 11:15 ` Catalin Marinas
2014-04-09 14:06 ` Catalin Marinas
2014-05-06 10:07 ` Thomas Petazzoni
2014-03-24 16:17 ` [PATCH 3/3] ARM: mvebu: implement L2/PCIe deadlock workaround Thomas Petazzoni
2014-04-03 14:07 ` [PATCH 0/3] ARM: implement workaround for Cortex-A9/PL310/PCIe deadlock Thomas Petazzoni
2014-04-03 14:15 ` Russell King - ARM Linux
2014-04-07 9:10 ` Thomas Petazzoni
2014-04-07 12:13 ` Catalin Marinas
2014-04-14 16:59 ` Will Deacon
2014-04-14 18:39 ` Thomas Petazzoni
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).