* [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache
@ 2015-08-11 17:05 Gregory CLEMENT
2015-08-11 17:26 ` Catalin Marinas
0 siblings, 1 reply; 4+ messages in thread
From: Gregory CLEMENT @ 2015-08-11 17:05 UTC (permalink / raw)
To: linux-arm-kernel
From: Nadav Haklai <nadavh@marvell.com>
When a PL310 cache is used in a system that provides hardware
coherency, the entire outer cache operations are useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.
This commit extends a previous commit:
98ea2dba65932ffc456b6d7b11b8a0624e2f7b95 which added the io-coherent
support for the PL310 cache by also disabling the outer cache flush
range operation.
In the current kernel implementation, the outer cache flush range
operation is triggered by the dma_alloc function.
This operation can be take place during runtime and in some
circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
SoCs.
[gregory.clement at free-electrons.com]: fix title s/mm/l2c/, add the
stable flag
Signed-off-by: Nadav Haklai <nadavh@marvell.com>
Reviewed-by: Ofer Heifetz <oferh@marvell.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Fixes: 98ea2dba6593 ("ARM: 8076/1: mm: add support for HW coherent
systems in PL310 cache")
Cc: <stable@vger.kernel.org>
---
arch/arm/mm/cache-l2x0.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 71b3d3309024..5c3148675597 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1266,9 +1266,9 @@ static const struct l2c_init_data of_l2c310_data __initconst = {
};
/*
- * This is a variant of the of_l2c310_data with .sync set to
- * NULL. Outer sync operations are not needed when the system is I/O
- * coherent, and potentially harmful in certain situations (PCIe/PL310
+ * This is a variant of the of_l2c310_data with .sync and .flush_range set to
+ * NULL. Outer sync and flush range operations are not needed when the system
+ * is I/O coherent, and potentially harmful in certain situations (PCIe/PL310
* deadlock on Armada 375/38x due to hardware I/O coherency). The
* other operations are kept because they are infrequent (therefore do
* not cause the deadlock in practice) and needed for secondary CPU
@@ -1287,7 +1287,6 @@ static const struct l2c_init_data of_l2c310_coherent_data __initconst = {
.outer_cache = {
.inv_range = l2c210_inv_range,
.clean_range = l2c210_clean_range,
- .flush_range = l2c210_flush_range,
.flush_all = l2c210_flush_all,
.disable = l2c310_disable,
.resume = l2c310_resume,
--
2.1.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache
2015-08-11 17:05 [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache Gregory CLEMENT
@ 2015-08-11 17:26 ` Catalin Marinas
2015-08-12 8:14 ` Gregory CLEMENT
0 siblings, 1 reply; 4+ messages in thread
From: Catalin Marinas @ 2015-08-11 17:26 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Aug 11, 2015 at 07:05:43PM +0200, Gregory CLEMENT wrote:
> From: Nadav Haklai <nadavh@marvell.com>
>
> When a PL310 cache is used in a system that provides hardware
> coherency, the entire outer cache operations are useless, and can be
> skipped. Moreover, on some systems, it is harmful as it causes
> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> controller and the Cortex-A9.
>
> This commit extends a previous commit:
> 98ea2dba65932ffc456b6d7b11b8a0624e2f7b95 which added the io-coherent
> support for the PL310 cache by also disabling the outer cache flush
> range operation.
>
> In the current kernel implementation, the outer cache flush range
> operation is triggered by the dma_alloc function.
> This operation can be take place during runtime and in some
> circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
> SoCs.
While this may work around the issue for your specific SoC, I think a
better fix is in DMA alloc code to avoid flushing caches for coherent
devices. This would be the __dma_clear_buffer() implementation which
isn't aware of whether the device is coherent or not.
--
Catalin
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache
2015-08-11 17:26 ` Catalin Marinas
@ 2015-08-12 8:14 ` Gregory CLEMENT
2015-08-12 16:57 ` Catalin Marinas
0 siblings, 1 reply; 4+ messages in thread
From: Gregory CLEMENT @ 2015-08-12 8:14 UTC (permalink / raw)
To: linux-arm-kernel
Hi Catalin,
On 11/08/2015 19:26, Catalin Marinas wrote:
> On Tue, Aug 11, 2015 at 07:05:43PM +0200, Gregory CLEMENT wrote:
>> From: Nadav Haklai <nadavh@marvell.com>
>>
>> When a PL310 cache is used in a system that provides hardware
>> coherency, the entire outer cache operations are useless, and can be
>> skipped. Moreover, on some systems, it is harmful as it causes
>> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
>> controller and the Cortex-A9.
>>
>> This commit extends a previous commit:
>> 98ea2dba65932ffc456b6d7b11b8a0624e2f7b95 which added the io-coherent
>> support for the PL310 cache by also disabling the outer cache flush
>> range operation.
>>
>> In the current kernel implementation, the outer cache flush range
>> operation is triggered by the dma_alloc function.
>> This operation can be take place during runtime and in some
>> circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
>> SoCs.
>
> While this may work around the issue for your specific SoC, I think a
> better fix is in DMA alloc code to avoid flushing caches for coherent
> devices. This would be the __dma_clear_buffer() implementation which
> isn't aware of whether the device is coherent or not.
Indeed, the other use of the outer cache flush range is done pretty
early in the boot and should not be a problem.
What do you think of the following patch?
I am not totally happy with it because I added a variable to a struct
dedicated for the outer cache functions. But using a function at the
l2cc driver level would be too much. At the opposite we could directly
set a boolean from the l2cc driver. From my point of view it would
violate the layer but I might be wrong.
Thanks,
Gregory
arch/arm/include/asm/outercache.h | 9 +++++++++
arch/arm/mm/cache-l2x0.c | 1 +
arch/arm/mm/dma-mapping.c | 6 ++++--
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
index 563b92fc2f41..7f7bbfcf1d32 100644
--- a/arch/arm/include/asm/outercache.h
+++ b/arch/arm/include/asm/outercache.h
@@ -35,6 +35,7 @@ struct outer_cache_fns {
void (*sync)(void);
#endif
void (*resume)(void);
+ bool is_coherent;
/* This is an ARM L2C thing */
void (*write_sec)(unsigned long, unsigned);
@@ -56,6 +57,14 @@ static inline void outer_inv_range(phys_addr_t start, phys_addr_t end)
}
/**
+ * outer_is_coherent - tell if the outer cache is io coherent
+ */
+static inline bool outer_is_coherent(void)
+{
+ return outer_cache.is_coherent;
+}
+
+/**
* outer_clean_range - clean dirty outer cache lines
* @start: starting physical address, inclusive
* @end: end physical address, exclusive
diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
index 71b3d3309024..0071e276adc0 100644
--- a/arch/arm/mm/cache-l2x0.c
+++ b/arch/arm/mm/cache-l2x0.c
@@ -1291,6 +1291,7 @@ static const struct l2c_init_data of_l2c310_coherent_data __initconst = {
.flush_all = l2c210_flush_all,
.disable = l2c310_disable,
.resume = l2c310_resume,
+ .is_coherent = true,
},
};
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 1ced8a0f7a52..4d87df2c16f9 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -241,12 +241,14 @@ static void __dma_clear_buffer(struct page *page, size_t size)
page++;
size -= PAGE_SIZE;
}
- outer_flush_range(base, end);
+ if (!outer_is_coherent())
+ outer_flush_range(base, end);
} else {
void *ptr = page_address(page);
memset(ptr, 0, size);
dmac_flush_range(ptr, ptr + size);
- outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ if (!outer_is_coherent())
+ outer_flush_range(__pa(ptr), __pa(ptr) + size);
}
}
--
Gregory Clement, Free Electrons
Kernel, drivers, real-time and embedded Linux
development, consulting, training and support.
http://free-electrons.com
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache
2015-08-12 8:14 ` Gregory CLEMENT
@ 2015-08-12 16:57 ` Catalin Marinas
0 siblings, 0 replies; 4+ messages in thread
From: Catalin Marinas @ 2015-08-12 16:57 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Aug 12, 2015 at 10:14:25AM +0200, Gregory CLEMENT wrote:
> On 11/08/2015 19:26, Catalin Marinas wrote:
> > On Tue, Aug 11, 2015 at 07:05:43PM +0200, Gregory CLEMENT wrote:
> >> From: Nadav Haklai <nadavh@marvell.com>
> >>
> >> When a PL310 cache is used in a system that provides hardware
> >> coherency, the entire outer cache operations are useless, and can be
> >> skipped. Moreover, on some systems, it is harmful as it causes
> >> deadlocks between the Marvell coherency mechanism, the Marvell PCIe
> >> controller and the Cortex-A9.
> >>
> >> This commit extends a previous commit:
> >> 98ea2dba65932ffc456b6d7b11b8a0624e2f7b95 which added the io-coherent
> >> support for the PL310 cache by also disabling the outer cache flush
> >> range operation.
> >>
> >> In the current kernel implementation, the outer cache flush range
> >> operation is triggered by the dma_alloc function.
> >> This operation can be take place during runtime and in some
> >> circumstances may lead to the PCIe/PL310 deadlock on Armada 375/38x
> >> SoCs.
> >
> > While this may work around the issue for your specific SoC, I think a
> > better fix is in DMA alloc code to avoid flushing caches for coherent
> > devices. This would be the __dma_clear_buffer() implementation which
> > isn't aware of whether the device is coherent or not.
>
> Indeed, the other use of the outer cache flush range is done pretty
> early in the boot and should not be a problem.
>
> What do you think of the following patch?
[...]
> arch/arm/include/asm/outercache.h | 9 +++++++++
> arch/arm/mm/cache-l2x0.c | 1 +
> arch/arm/mm/dma-mapping.c | 6 ++++--
> 3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/outercache.h b/arch/arm/include/asm/outercache.h
> index 563b92fc2f41..7f7bbfcf1d32 100644
> --- a/arch/arm/include/asm/outercache.h
> +++ b/arch/arm/include/asm/outercache.h
> @@ -35,6 +35,7 @@ struct outer_cache_fns {
> void (*sync)(void);
> #endif
> void (*resume)(void);
> + bool is_coherent;
>
> /* This is an ARM L2C thing */
> void (*write_sec)(unsigned long, unsigned);
> @@ -56,6 +57,14 @@ static inline void outer_inv_range(phys_addr_t start, phys_addr_t end)
> }
>
> /**
> + * outer_is_coherent - tell if the outer cache is io coherent
> + */
> +static inline bool outer_is_coherent(void)
> +{
> + return outer_cache.is_coherent;
> +}
> +
> +/**
> * outer_clean_range - clean dirty outer cache lines
> * @start: starting physical address, inclusive
> * @end: end physical address, exclusive
> diff --git a/arch/arm/mm/cache-l2x0.c b/arch/arm/mm/cache-l2x0.c
> index 71b3d3309024..0071e276adc0 100644
> --- a/arch/arm/mm/cache-l2x0.c
> +++ b/arch/arm/mm/cache-l2x0.c
> @@ -1291,6 +1291,7 @@ static const struct l2c_init_data of_l2c310_coherent_data __initconst = {
> .flush_all = l2c210_flush_all,
> .disable = l2c310_disable,
> .resume = l2c310_resume,
> + .is_coherent = true,
> },
> };
I don't really think we need these. If you need to check a device, we
already have is_device_dma_coherent(). Just pass bool argument to
__dma_clear_buffer() and fix the calling places. The __dma_alloc()
function also takes a "bool coherent" but in this case
__alloc_simple_buffer() is only ever used on coherent devices, so you
may not even need to check is_device_dma_coherent() (however, there are
some patches to add CMA support for coherent devices, I don't know
whether they are queued yet).
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 1ced8a0f7a52..4d87df2c16f9 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -241,12 +241,14 @@ static void __dma_clear_buffer(struct page *page, size_t size)
> page++;
> size -= PAGE_SIZE;
> }
> - outer_flush_range(base, end);
> + if (!outer_is_coherent())
> + outer_flush_range(base, end);
> } else {
> void *ptr = page_address(page);
> memset(ptr, 0, size);
> dmac_flush_range(ptr, ptr + size);
> - outer_flush_range(__pa(ptr), __pa(ptr) + size);
> + if (!outer_is_coherent())
> + outer_flush_range(__pa(ptr), __pa(ptr) + size);
> }
> }
This would make sense if we have a device which is outer coherent but
inner non-coherent, though I doubt this is the case you are trying to
fix.
--
Catalin
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-08-12 16:57 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-11 17:05 [PATCH] ARM: l2c: fix support for HW coherent systems in PL310 cache Gregory CLEMENT
2015-08-11 17:26 ` Catalin Marinas
2015-08-12 8:14 ` Gregory CLEMENT
2015-08-12 16:57 ` Catalin Marinas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).