[PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
@ 2014-07-02 16:21 Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 1/3] ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric Thomas Petazzoni
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

Jason, Andrew, Gregory, Sebastian,

I believe you have seen the on-going discussion with Russell on how to
ensure that the relevant requirements for hardware I/O coherency are
meant in non-SMP situations. Since it appears it will take quite a bit
of time to find a proper solution to this problem, this patch series
proposes to simply disable hardware I/O coherency in problematic
configurations.

It has the consequence of completely disabling hardware I/O coherency
on Armada 370 (this processor is single-core, so it is never executed
in a SMP situation, as is_smp() returns false), and it disables
hardware I/O coherency for Armada XP in !CONFIG_SMP
configurations. For Armada 375 and 38x there is no change since they
were already using hardware I/O coherency only in SMP-enabled
situations.

The first patch is a preparation needed to not use the coherency
fabric at all on Armada 370/XP by making some assembly code accept
this situation.

The second patch actually implements the disabling by reworking the
coherency_type() function.

The third patch is a somewhat unrelated fix, just added in this patch
series because it touches the same coherency.c file.

Since the fact of having I/O coherency enabled when it should not can
cause infrequent but real data corruptions, I'd like to see the first
two patches backported all the way up to when I/O coherency support
was added, i.e in v3.8. I'm especially interested in seeing these
patches reach the longterm 3.10 kernel version, which many people are
using on Armada 370 systems today. As mentionned in the patch
comments, the patches will most likely not easily apply on older
kernel versions, but I'll provide the appropriate backports when
requested.

Some independent testing/review of this patch series would definitely
be welcome.

Thanks,

Thomas

Thomas Petazzoni (3):
  ARM: mvebu: make the coherency_ll.S functions work with no coherency
    fabric
  ARM: mvebu: disable I/O coherency on non-SMP situations on Armada
    370/XP
  ARM: mvebu: add missing of_node_put() call in coherency.c

 arch/arm/mach-mvebu/coherency.c    | 39 ++++++++++++++++++++++++--------------
 arch/arm/mach-mvebu/coherency_ll.S | 21 ++++++++++++++++++--
 2 files changed, 44 insertions(+), 16 deletions(-)

-- 
2.0.0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/3] ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric
  2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
@ 2014-07-02 16:21 ` Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 2/3] ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/XP Thomas Petazzoni
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

The ll_add_cpu_to_smp_group(), ll_enable_coherency() and
ll_disable_coherency() are used on Armada XP to control the coherency
fabric. However, they make the assumption that the coherency fabric is
always available, which is currently a correct assumption but will no
longer be true with a followup commit that disables the usage of the
coherency fabric when the conditions are not met to use it.

Therefore, this commit modifies those functions so that they check the
return value of ll_get_coherency_base(), and if the return value is 0,
they simply return without configuring anything in the coherency
fabric.

The ll_get_coherency_base() function is also modified to properly
return 0 when the function is called with the MMU disabled. In this
case, it normally returns the physical address of the coherency
fabric, but we now check if the virtual address is 0, and if that's
case, return a physical address of 0 to indicate that the coherency
fabric is not enabled.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: <stable@vger.kernel.org> # v3.8+
---
Technically speaking, this patch is not needed up to 3.8, since the
code was completely different back then. However, the next commit has
to be backported up to 3.8, and relies on this patch.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency_ll.S | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-mvebu/coherency_ll.S b/arch/arm/mach-mvebu/coherency_ll.S
index 510c29e..48fc0d8 100644
--- a/arch/arm/mach-mvebu/coherency_ll.S
+++ b/arch/arm/mach-mvebu/coherency_ll.S
@@ -24,7 +24,10 @@
 #include <asm/cp15.h>
 
 	.text
-/* Returns the coherency base address in r1 (r0 is untouched) */
+/*
+ * Returns the coherency base address in r1 (r0 is untouched), or 0 if
+ * the coherency fabric is not enabled.
+ */
 ENTRY(ll_get_coherency_base)
 	mrc	p15, 0, r1, c1, c0, 0
 	tst	r1, #CR_M @ Check MMU bit enabled
@@ -32,8 +35,13 @@ ENTRY(ll_get_coherency_base)
 
 	/*
 	 * MMU is disabled, use the physical address of the coherency
-	 * base address.
+	 * base address. However, if the coherency fabric isn't mapped
+	 * (i.e its virtual address is zero), it means coherency is
+	 * not enabled, so we return 0.
 	 */
+	ldr	r1, =coherency_base
+	cmp	r1, #0
+	beq	2f
 	adr	r1, 3f
 	ldr	r3, [r1]
 	ldr	r1, [r1, r3]
@@ -85,6 +93,9 @@ ENTRY(ll_add_cpu_to_smp_group)
 	 */
 	mov 	r0, lr
 	bl	ll_get_coherency_base
+	/* Bail out if the coherency is not enabled */
+	cmp	r1, #0
+	moveq	pc, r0
 	bl	ll_get_coherency_cpumask
 	mov 	lr, r0
 	add	r0, r1, #ARMADA_XP_CFB_CFG_REG_OFFSET
@@ -107,6 +118,9 @@ ENTRY(ll_enable_coherency)
 	 */
 	mov r0, lr
 	bl	ll_get_coherency_base
+	/* Bail out if the coherency is not enabled */
+	cmp	r1, #0
+	moveq	pc, r0
 	bl	ll_get_coherency_cpumask
 	mov lr, r0
 	add	r0, r1, #ARMADA_XP_CFB_CTL_REG_OFFSET
@@ -131,6 +145,9 @@ ENTRY(ll_disable_coherency)
 	 */
 	mov 	r0, lr
 	bl	ll_get_coherency_base
+	/* Bail out if the coherency is not enabled */
+	cmp	r1, #0
+	moveq	pc, r0
 	bl	ll_get_coherency_cpumask
 	mov 	lr, r0
 	add	r0, r1, #ARMADA_XP_CFB_CTL_REG_OFFSET
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/3] ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/XP
  2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 1/3] ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric Thomas Petazzoni
@ 2014-07-02 16:21 ` Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 3/3] ARM: mvebu: add missing of_node_put() call in coherency.c Thomas Petazzoni
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

Enabling the hardware I/O coherency on Armada 370 and Armada XP
requires a certain number of conditions:

 - On Armada 370, the cache policy must be set to write-allocate.
 - On Armada XP, the cache policy must be set to write-allocate, the
   pages must be mapped with the shareable attribute, and the SMP bit
   must be set

Currently, on Armada XP, when CONFIG_SMP is enabled, those conditions
are met. However, when Armada XP is used in a !CONFIG_SMP kernel, none
of these conditions are met. With Armada 370, the situation is worse:
since the processor is single core, regardless of whether CONFIG_SMP
or !CONFIG_SMP is used, the cache policy will be set to write-back by
the kernel and not write-allocate.

Since solving this problem turns out to be quite complicated, and we
don't want to let users with a mainline kernel known to have
infrequent but existing data corruptions, this commit proposes to
simply disable hardware I/O coherency in situations where it is known
not to work.

And basically, the is_smp() function of the kernel tells us whether it
is OK to enable hardware I/O coherency or not, so this commit slightly
refactors the coherency_type() function to return
COHERENCY_FABRIC_TYPE_NONE when is_smp() is false, or the appropriate
type of the coherency fabric in the other case.

Thanks to this, the I/O coherency fabric will no longer be used at all
in !CONFIG_SMP configurations. It will continue to be used in
CONFIG_SMP configurations on Armada XP, Armada 375 and Armada 38x
(which are multiple cores processors), but will no longer be used on
Armada 370 (which is a single core processor).

In the process, it simplifies the implementation of the
coherency_type() function, and adds a missing call to of_node_put().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Fixes: e60304f8cb7bb545e79fe62d9b9762460c254ec2 ("arm: mvebu: Add hardware I/O Coherency support")
Cc: <stable@vger.kernel.org> # v3.8+
---
Obviously, there have been numerous changes between 3.8 and 3.16 on
the coherency code, so this patch will hardly apply on older versions
of the kernel, but we will provide backport versions when requested.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency.c | 37 +++++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 477202f..2260a40 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -357,25 +357,34 @@ static int coherency_type(void)
 {
 	struct device_node *np;
 	const struct of_device_id *match;
+	int type;

-	np = of_find_matching_node_and_match(NULL, of_coherency_table, &match);
-	if (np) {
-		int type = (int) match->data;
+	/*
+	 * The coherency fabric is needed:
+	 * - For coherency between processors on Armada XP, so only
+	 *   when SMP is enabled.
+	 * - For coherency between the processor and I/O devices, but
+	 *   this coherency requires many pre-requisites (write
+	 *   allocate cache policy, shareable pages, SMP bit set) that
+	 *   are only meant in SMP situations.
+	 *
+	 * Note that this means that on Armada 370, there is currently
+	 * no way to use hardware I/O coherency, because even when
+	 * CONFIG_SMP is enabled, is_smp() returns false due to the
+	 * Armada 370 being a single-core processor.
+	 */
+	if (!is_smp())
+		return COHERENCY_FABRIC_TYPE_NONE;

-		/* Armada 370/XP coherency works in both UP and SMP */
-		if (type == COHERENCY_FABRIC_TYPE_ARMADA_370_XP)
-			return type;
+	np = of_find_matching_node_and_match(NULL, of_coherency_table, &match);
+	if (!np)
+		return COHERENCY_FABRIC_TYPE_NONE;

-		/* Armada 375 coherency works only on SMP */
-		else if (type == COHERENCY_FABRIC_TYPE_ARMADA_375 && is_smp())
-			return type;
+	type = (int) match->data;

-		/* Armada 380 coherency works only on SMP */
-		else if (type == COHERENCY_FABRIC_TYPE_ARMADA_380 && is_smp())
-			return type;
-	}
+	of_node_put(np);

-	return COHERENCY_FABRIC_TYPE_NONE;
+	return type;
 }

 int coherency_available(void)
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/3] ARM: mvebu: add missing of_node_put() call in coherency.c
  2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 1/3] ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric Thomas Petazzoni
  2014-07-02 16:21 ` [PATCH 2/3] ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/XP Thomas Petazzoni
@ 2014-07-02 16:21 ` Thomas Petazzoni
  2014-07-02 16:27 ` [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Andrew Lunn
  2014-07-02 16:41 ` Russell King - ARM Linux
  4 siblings, 0 replies; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

There is a missing of_node_put() to decrement the device_node
reference counter after a of_find_matching_node() in coherency_init().

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
While this patch technically fixes a leak on a reference count, it
doesn't cause any visible issue, so I don't think it's worth have it
in the stable process. However it should be in 3.16-rc.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
---
 arch/arm/mach-mvebu/coherency.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/mach-mvebu/coherency.c b/arch/arm/mach-mvebu/coherency.c
index 2260a40..a9655e7 100644
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -405,6 +405,8 @@ int __init coherency_init(void)
 		 type == COHERENCY_FABRIC_TYPE_ARMADA_380)
 		armada_375_380_coherency_init(np);
 
+	of_node_put(np);
+
 	return 0;
 }
 
-- 
2.0.0

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
                   ` (2 preceding siblings ...)
  2014-07-02 16:21 ` [PATCH 3/3] ARM: mvebu: add missing of_node_put() call in coherency.c Thomas Petazzoni
@ 2014-07-02 16:27 ` Andrew Lunn
  2014-07-02 16:41 ` Russell King - ARM Linux
  4 siblings, 0 replies; 26+ messages in thread
From: Andrew Lunn @ 2014-07-02 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

> Some independent testing/review of this patch series would definitely
> be welcome.

Hi Thomas

I've not used these platforms much, and have no idea how to trigger
and then detect corruption. I guess Simon Guinot is your best bet for
testing.

	Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
                   ` (3 preceding siblings ...)
  2014-07-02 16:27 ` [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Andrew Lunn
@ 2014-07-02 16:41 ` Russell King - ARM Linux
  2014-07-02 17:18   ` Thomas Petazzoni
  2014-07-17  8:24   ` Thomas Petazzoni
  4 siblings, 2 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-02 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 02, 2014 at 06:21:07PM +0200, Thomas Petazzoni wrote:
> Jason, Andrew, Gregory, Sebastian,
> 
> I believe you have seen the on-going discussion with Russell on how to
> ensure that the relevant requirements for hardware I/O coherency are
> meant in non-SMP situations. Since it appears it will take quite a bit
> of time to find a proper solution to this problem, this patch series
> proposes to simply disable hardware I/O coherency in problematic
> configurations.

It is unfortunate, but I do feel that there's been something of an
excessive amount of pressure here - it seems to me that the I/O
coherency as merged and enabled, without thinking about the consequences.
When it was then realised that some fundamental core changes were
needed, it's somehow been turned into my fault that this stuff doesn't
work, and my problem to solve.

While it /is/ my problem to solve (or rather, ensure that it does get
solved in a way which is compliant with the architecture) I'm also
juggling several other things which leaves me not enough time to look
at this right now.

It seems that people think that I have oodles of spare time to be able
to jump onto their problem at a moments notice.  I don't.

I've already said that device tree and single zImage makes solving your
problem /much/ harder than it otherwise was.  Right now, I don't have
any solution to it in mind that would both satisfy the requirements of
the architecture _and_ allow the coherency stuff to operate as Marvell
wants it to.

Before device tree, we had the atags and the machine ID, with the
machine_desc discovered in the early boot code.  This would have
allowed us to add a hook there which identified your coherent platform
and adjusted the page tables appropriately to ensure that the coherency
stuff worked as intended.

We don't have that anymore (not even in non-device tree mode), and
device tree doesn't offer any replacement for it.

As I've frequently said, device tree is great for some things, but it
makes solving other problems insanely more difficult.  This is one of
those which is insanely more difficult because it just doesn't fit
the device tree model.

While it's regrettable to have to disable the coherency support, I think
that's the best way to proceed at the moment until we're able to find
some kind of solution.

The most important first step in that is working out how, in the early
assembly code, we can identify these Armada SoCs.  That's a task I
can't undertake because I know next to nothing about the SoCs you're
dealing with.  Yes, I know that Marvell released a TRM a short while
back, which is great, but not everyone has time to go around reading
every TRM which every silicon vendor releases into the public domain.

If it /is/ possible to detect the Armada SoCs in the early assembly,
then we can start to solve this.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 16:41 ` Russell King - ARM Linux
@ 2014-07-02 17:18   ` Thomas Petazzoni
  2014-07-02 17:58     ` Russell King - ARM Linux
  2014-07-17  8:24   ` Thomas Petazzoni
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 17:18 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Wed, 2 Jul 2014 17:41:47 +0100, Russell King - ARM Linux wrote:

> > I believe you have seen the on-going discussion with Russell on how to
> > ensure that the relevant requirements for hardware I/O coherency are
> > meant in non-SMP situations. Since it appears it will take quite a bit
> > of time to find a proper solution to this problem, this patch series
> > proposes to simply disable hardware I/O coherency in problematic
> > configurations.
> 
> It is unfortunate, but I do feel that there's been something of an
> excessive amount of pressure here - it seems to me that the I/O
> coherency as merged and enabled, without thinking about the consequences.
> When it was then realised that some fundamental core changes were
> needed, it's somehow been turned into my fault that this stuff doesn't
> work, and my problem to solve.

Oh, really, not all. This patch series is absolutely not an attempt at
putting any pressure on you, or on anybody else. Really, really, really
not.

We (me, Gregory, Ezequiel, and other Marvell folks) completely
overlooked the core requirements of hardware I/O coherency when we
initially merged the support for this feature. And only recently,
thanks to the work of Simon Guinot and other folks at LaCie, we
realized our mistake. So, I started working with you on a solution
acceptable upstream to meet the requirement for I/O coherency, and it
clearly turns out that the solution for that is not going to be simple
and easy.

So, this patch series is not an attempt to put any sort of pressure on
you. It is merely a recognition of something we (me, Gregory, Ezequiel
and other Marvell folks) did wrong, and the realization that since the
proper fix to this issue is going to take some time, it is better not
to let users with a known-broken kernel.

Actually, what prompted the posting of this patch series is the fact
that I received a private e-mail from a user of the mainline Linux
kernel on Armada 370 who was worried about our discussion around I/O
coherency and data corruption. This patch series is intended to
temporarily revert back to a situation where I/O coherency is not used
in situations where we know it isn't safe, just to ensure that users
don't fall into troubles.

And then it gives us all the time we need to work out together the
appropriate solution to ultimately re-enable I/O coherency on these
problematic configuration.

So really, Russell, trust me about the intention of this patch series:
there is no intent to put pressure on anyone, only a temporary solution
to make the kernel sane today, and give us time to work out the
long-term solution. I would even rather say that it actually *reduces*
the pressure on solving the initial problem :-)

> While it /is/ my problem to solve (or rather, ensure that it does get
> solved in a way which is compliant with the architecture) I'm also
> juggling several other things which leaves me not enough time to look
> at this right now.
> 
> It seems that people think that I have oodles of spare time to be able
> to jump onto their problem at a moments notice.  I don't.

See above. I'm definitely not throwing the problem at you. I believe
that despite my lack of deep knowledge of the ARM core code base, I've
made several attempts at proposing some solutions, and trying to work
out with you some solutions that might be appropriate. So I am clearly
not trying to put all the work on you.

The insistent e-mails from the past days/weeks on getting an answer
where not calls for you to do the coding work, but only calls for you
to give me a little bit of guidance in the direction to follow to make
progress with this problem. Nothing more.

> I've already said that device tree and single zImage makes solving your
> problem /much/ harder than it otherwise was.  Right now, I don't have
> any solution to it in mind that would both satisfy the requirements of
> the architecture _and_ allow the coherency stuff to operate as Marvell
> wants it to.
> 
> Before device tree, we had the atags and the machine ID, with the
> machine_desc discovered in the early boot code.  This would have
> allowed us to add a hook there which identified your coherent platform
> and adjusted the page tables appropriately to ensure that the coherency
> stuff worked as intended.
> 
> We don't have that anymore (not even in non-device tree mode), and
> device tree doesn't offer any replacement for it.
> 
> As I've frequently said, device tree is great for some things, but it
> makes solving other problems insanely more difficult.  This is one of
> those which is insanely more difficult because it just doesn't fit
> the device tree model.

Yes, I do understand. In the mean time, I've asked Marvell internally
to see if there is a CP15-based way of differentiating their I/O
coherency capable Cortex-A9 from the usual Cortex-A9.

> While it's regrettable to have to disable the coherency support, I think
> that's the best way to proceed at the moment until we're able to find
> some kind of solution.

Sure, see above :)

> The most important first step in that is working out how, in the early
> assembly code, we can identify these Armada SoCs.  That's a task I
> can't undertake because I know next to nothing about the SoCs you're
> dealing with.  Yes, I know that Marvell released a TRM a short while
> back, which is great, but not everyone has time to go around reading
> every TRM which every silicon vendor releases into the public domain.

Absolutely.

> If it /is/ possible to detect the Armada SoCs in the early assembly,
> then we can start to solve this.

I think there are confusions about several problems here:

 1) The Armada 370 and Armada XP are perfectly detectable in the early
    assembly, since they have different values in the main
    identification register (they use PJ4B and PJ4B-MP cores). But for
    the Armada 370/XP, there is still the open question on whether the
    TTB flags should match the PMD flags in terms of caching policy
    attribute and shareability attribute. That's *completely* unrelated
    to detecting the SoC in the early assembly, and having the answer
    to this question would allow us to potentially re-enable I/O
    coherency on Armada 370 (both CONFIG_SMP and !CONFIG_SMP) and Armada
    XP (for !CONFIG_SMP)

 2) The Armada 375 and 38x are Cortex-A9, so it's only for those SoCs
    that we have the potential issue of differentiating those
    I/O-coherency capable A9 from the non-I/O-coherency capable A9 in
    the early assembly code.

If we could first make progress about problem (1), it would be great.
Then we can have a look about problem (2).

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 17:18   ` Thomas Petazzoni
@ 2014-07-02 17:58     ` Russell King - ARM Linux
  2014-07-02 20:28       ` Thomas Petazzoni
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-02 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 02, 2014 at 07:18:32PM +0200, Thomas Petazzoni wrote:
>  1) The Armada 370 and Armada XP are perfectly detectable in the early
>     assembly, since they have different values in the main
>     identification register (they use PJ4B and PJ4B-MP cores).

You say that, but Dove is PJ4B too (I forget whether it's PJ4B or not.)
The only way to be sure is to compare the ID numbers.  The AP510 in
Dove I have here has an ID register value of 0x560f5815.  This ties up
with the PJ4B entry in proc-v7.S:

        .type   __v7_pj4b_proc_info, #object
__v7_pj4b_proc_info:
        .long   0x560f5800
        .long   0xff0fff00
        __v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
        .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info

How does that compare to the ID you see on Armada 370 and XP?  (I've
no idea what IDs you get there...)  If the ID is different, then we
should be able to solve this pretty easily for these CPUs by creating
a new proc_info record as I think I have previously suggested.

With a separate record, we can set the S bit if we need to, and we can
also set the write-allocate mode too, now that the patch I sent you
is in mainline (which is something I definitely indicated along with
that patch.)

I've not really said anything new in this email which I haven't said
before, with the exception of stating what the ID is for the Dove SoC
I have.  That's the only ID I know, and I assume that those working on
mvebu have a better idea what ID numbers are found across all the
families.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 17:58     ` Russell King - ARM Linux
@ 2014-07-02 20:28       ` Thomas Petazzoni
  2014-07-02 21:33         ` Russell King - ARM Linux
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 20:28 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Wed, 2 Jul 2014 18:58:57 +0100, Russell King - ARM Linux wrote:

> You say that, but Dove is PJ4B too (I forget whether it's PJ4B or not.)
> The only way to be sure is to compare the ID numbers.  The AP510 in
> Dove I have here has an ID register value of 0x560f5815.  This ties up
> with the PJ4B entry in proc-v7.S:
> 
>         .type   __v7_pj4b_proc_info, #object
> __v7_pj4b_proc_info:
>         .long   0x560f5800
>         .long   0xff0fff00
>         __v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
>         .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
> 
> How does that compare to the ID you see on Armada 370 and XP?  (I've
> no idea what IDs you get there...)  If the ID is different, then we
> should be able to solve this pretty easily for these CPUs by creating
> a new proc_info record as I think I have previously suggested.

The ID for Armada 370 is 0x560f5811, but the last digit is the "Product
Revision Number", so I'm not sure it can be used to distinguish Dove
from Armada 370. However, I believe we could simply run Dove in a
Write-Allocate cache policy.

> With a separate record, we can set the S bit if we need to, and we can
> also set the write-allocate mode too, now that the patch I sent you
> is in mainline (which is something I definitely indicated along with
> that patch.)
> 
> I've not really said anything new in this email which I haven't said
> before, with the exception of stating what the ID is for the Dove SoC
> I have.  That's the only ID I know, and I assume that those working on
> mvebu have a better idea what ID numbers are found across all the
> families.

Russell, please again what I've asked numerous times. I already wrote
the patches that creates specific proc_info structures for Armada 370
and Armada XP to set the cache policy and S bit, as you suggested. But
this is only setting the PMD flags, but doesn't touch the TTB flags,
which remain only defined by ALT_UP/ALT_SMP conditional. And I remember
Catalin saying that having PMD flags varying from the TTB flags could
be a problem.

That is the question I'm asking to you since several e-mails that you
never answered, and that is completely unrelated to having separate
proc_info structure, because the TTB flags are *not* defined in the
proc_info structure.

Sorry to be a bit harsh here but you seem to assume that I'm an idiot
who didn't read what you said, and you simply repeat again and again
that your patch already solves the problem by allowing the proc_info
structure to define the PMD flags. But you never answered my question
about the TTB flags, which is the question I've been asking since
several e-mails already.

See the two attached patches for what I've already done for Armada 370
and Armada XP using the proc_info structures. Writing those patches
lead me to the problem of the TTB flags, which remains unanswered.

Thanks for your help!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ARM-mm-split-PJ4B-and-PJ4B-MP.patch
Type: text/x-patch
Size: 1231 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140702/c8f38af7/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-ARM-mm-set-appropriate-mm_mmuflags-for-PJ4B-variants.patch
Type: text/x-patch
Size: 1222 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20140702/c8f38af7/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 20:28       ` Thomas Petazzoni
@ 2014-07-02 21:33         ` Russell King - ARM Linux
  2014-07-02 21:50           ` Thomas Petazzoni
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-02 21:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 02, 2014 at 10:28:17PM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> On Wed, 2 Jul 2014 18:58:57 +0100, Russell King - ARM Linux wrote:
> 
> > You say that, but Dove is PJ4B too (I forget whether it's PJ4B or not.)
> > The only way to be sure is to compare the ID numbers.  The AP510 in
> > Dove I have here has an ID register value of 0x560f5815.  This ties up
> > with the PJ4B entry in proc-v7.S:
> > 
> >         .type   __v7_pj4b_proc_info, #object
> > __v7_pj4b_proc_info:
> >         .long   0x560f5800
> >         .long   0xff0fff00
> >         __v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> >         .size   __v7_pj4b_proc_info, . - __v7_pj4b_proc_info
> > 
> > How does that compare to the ID you see on Armada 370 and XP?  (I've
> > no idea what IDs you get there...)  If the ID is different, then we
> > should be able to solve this pretty easily for these CPUs by creating
> > a new proc_info record as I think I have previously suggested.
> 
> The ID for Armada 370 is 0x560f5811, but the last digit is the "Product
> Revision Number", so I'm not sure it can be used to distinguish Dove
> from Armada 370. However, I believe we could simply run Dove in a
> Write-Allocate cache policy.
> 
> > With a separate record, we can set the S bit if we need to, and we can
> > also set the write-allocate mode too, now that the patch I sent you
> > is in mainline (which is something I definitely indicated along with
> > that patch.)
> > 
> > I've not really said anything new in this email which I haven't said
> > before, with the exception of stating what the ID is for the Dove SoC
> > I have.  That's the only ID I know, and I assume that those working on
> > mvebu have a better idea what ID numbers are found across all the
> > families.
> 
> Russell, please again what I've asked numerous times. I already wrote
> the patches that creates specific proc_info structures for Armada 370
> and Armada XP to set the cache policy and S bit, as you suggested. But
> this is only setting the PMD flags, but doesn't touch the TTB flags,
> which remain only defined by ALT_UP/ALT_SMP conditional. And I remember
> Catalin saying that having PMD flags varying from the TTB flags could
> be a problem.

That's where having a separate processor struct comes in (which we
already have for the pj4b) and having a separate switch_mm which knows
to set the TTB flags to write allocate.

> Sorry to be a bit harsh here but you seem to assume that I'm an idiot
> who didn't read what you said, and you simply repeat again and again
> that your patch already solves the problem by allowing the proc_info
> structure to define the PMD flags. But you never answered my question
> about the TTB flags, which is the question I've been asking since
> several e-mails already.

You're assuming that I remember all the details each time... I know
that you have sent me several times all the details about what each
Armada device requires, and each time I've briefly read it and thought
"yes, I must get to that" and then never have done.  That's why we're
not making progress - I haven't had the time to read and digest your
messages properly.  For example, if you asked me right now what
Armada XP requires, whether it's SMP or not, I really couldn't tell
you.

> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 6e1b2da..310834e 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -483,14 +483,14 @@ __v7_ca9mp_proc_info:
>  __v7_pj4b_proc_info:
>  	.long	0x560f5810
>  	.long	0xff0ffff0
> -	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> +	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA
>  	.size	__v7_pj4b_proc_info, . - __v7_pj4b_proc_info
>  
>  	.type   __v7_pj4b_mp_proc_info, #object
>  __v7_pj4b_mp_proc_info:
>  	.long	0x560f5840
>  	.long	0xff0ffff0
> -	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions
> +	__v7_proc __v7_pj4b_setup, proc_fns = pj4b_processor_functions, mm_mmuflags = PMD_SECT_WBWA|PMD_SECT_S
>  	.size	__v7_pj4b_mp_proc_info, . - __v7_pj4b_mp_proc_info
>  #endif

Right, so you're using pj4b_processor_functions here.

#ifdef CONFIG_CPU_PJ4B
        globl_equ       cpu_pj4b_switch_mm,     cpu_v7_switch_mm
        globl_equ       cpu_pj4b_set_pte_ext,   cpu_v7_set_pte_ext
        globl_equ       cpu_pj4b_proc_init,     cpu_v7_proc_init

So we need to delete the first (switch_mm) so we can replace it.  I
think we may need a second set of processor functions to deal with
the MP version as well.  Most of these would need to be aliased
to the cpu_pj4b versions, except for the switch_mm one too.

You may wish to consider changing switch_mm to look something like the
following.  I'm assuming that for pj4b you just need WBWA for both
the inner and the outer, but not the shared bit - you'll need to change
that if it needs something else:

ENTRY(cpu_pj4b_switch_mm)
	orr	r0, r0, #TTB_IRGN_WBWA | TTB_RGN_OC_WBWA
	b	1f
ENTRY(cpu_pj4b_mp_switch_mm)
	orr	r0, r0, #TTB_FLAGS_SMP
	b	1f

ENTRY(cpu_v7_switch_mm)
#ifdef CONFIG_MMU
        ALT_SMP(orr     r0, r0, #TTB_FLAGS_SMP)
        ALT_UP(orr      r0, r0, #TTB_FLAGS_UP)
1:      mmid    r1, r1                          @ get mm->context.id
        mov     r2, #0
#ifdef CONFIG_ARM_ERRATA_430973
        mcr     p15, 0, r2, c7, c5, 6           @ flush BTAC/BTB
#endif
...
ENDPROC(cpu_v7_switch_mm)
ENDPROC(cpu_pj4b_switch_mm)
ENDPROC(cpu_pj4b_mp_switch_mm)

That gets rid of the one in the switch_mm path.  The setup
function is much more icky.

I think __v7_setup needs splitting up - especially due to all the
errata in there.  I think we ought to add a whole load of processor
proc_info's - for ARM Cortex-A8, Cortex-A9, Cortex-A15 etc.  Each
one calls a separate setup function which runs only the appropriate
errata fixups for that core (which eliminates a load of testing in
__v7_setup).

This also means we can split the v7_ttb_setup macro and change it
to be per CPU.  (I'm not entirely convinced right now that TTB1
isn't initialised with an undefined value - I can't see where r8
is setup amongst the mess that this has turned into.

I'm not sure what this means for the 3level v7 code - that's something
I've never looked at except a brief glance when it went in.  The
v7_ttb_setup there looks rather complicated and might make the changes
I outlined above much harder.  That also isn't something I can play
around with because I've never been able to get Linux running on the
Versatile Express with the CPU card that would support that code.  We
may need to enlist Catalin's help to modify that code.

That's about the best guidance I can give this evening, and I hope
it helps you.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 21:33         ` Russell King - ARM Linux
@ 2014-07-02 21:50           ` Thomas Petazzoni
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-02 21:50 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Wed, 2 Jul 2014 22:33:40 +0100, Russell King - ARM Linux wrote:

> That's where having a separate processor struct comes in (which we
> already have for the pj4b) and having a separate switch_mm which knows
> to set the TTB flags to write allocate.

Ah, okay.

> You're assuming that I remember all the details each time... I know
> that you have sent me several times all the details about what each
> Armada device requires, and each time I've briefly read it and thought
> "yes, I must get to that" and then never have done.  That's why we're
> not making progress - I haven't had the time to read and digest your
> messages properly.  For example, if you asked me right now what
> Armada XP requires, whether it's SMP or not, I really couldn't tell
> you.

That's surely something I understand perfectly. It took me a while to
get in my head all the requirements of the various SoCs, so I certainly
understand that someone not working on a daily with only those specific
SoCs can easily lose track of what each SoC requires.

Maybe I should write up a summary document/web-page of the requirements
of each SoC, with their identifier values, all progressively add all
the relevant details as the discussion progresses.

> Right, so you're using pj4b_processor_functions here.
> 
> #ifdef CONFIG_CPU_PJ4B
>         globl_equ       cpu_pj4b_switch_mm,     cpu_v7_switch_mm
>         globl_equ       cpu_pj4b_set_pte_ext,   cpu_v7_set_pte_ext
>         globl_equ       cpu_pj4b_proc_init,     cpu_v7_proc_init
> 
> So we need to delete the first (switch_mm) so we can replace it.  I
> think we may need a second set of processor functions to deal with
> the MP version as well.  Most of these would need to be aliased
> to the cpu_pj4b versions, except for the switch_mm one too.
> 
> You may wish to consider changing switch_mm to look something like the
> following.  I'm assuming that for pj4b you just need WBWA for both
> the inner and the outer, but not the shared bit - you'll need to change
> that if it needs something else:
> 
> ENTRY(cpu_pj4b_switch_mm)
> 	orr	r0, r0, #TTB_IRGN_WBWA | TTB_RGN_OC_WBWA
> 	b	1f
> ENTRY(cpu_pj4b_mp_switch_mm)
> 	orr	r0, r0, #TTB_FLAGS_SMP
> 	b	1f
> 	
> ENTRY(cpu_v7_switch_mm)
> #ifdef CONFIG_MMU
>         ALT_SMP(orr     r0, r0, #TTB_FLAGS_SMP)
>         ALT_UP(orr      r0, r0, #TTB_FLAGS_UP)
> 1:      mmid    r1, r1                          @ get mm->context.id
>         mov     r2, #0
> #ifdef CONFIG_ARM_ERRATA_430973
>         mcr     p15, 0, r2, c7, c5, 6           @ flush BTAC/BTB
> #endif
> ...
> ENDPROC(cpu_v7_switch_mm)
> ENDPROC(cpu_pj4b_switch_mm)
> ENDPROC(cpu_pj4b_mp_switch_mm)
> 
> That gets rid of the one in the switch_mm path.  The setup
> function is much more icky.
> 
> I think __v7_setup needs splitting up - especially due to all the
> errata in there.  I think we ought to add a whole load of processor
> proc_info's - for ARM Cortex-A8, Cortex-A9, Cortex-A15 etc.  Each
> one calls a separate setup function which runs only the appropriate
> errata fixups for that core (which eliminates a load of testing in
> __v7_setup).
> 
> This also means we can split the v7_ttb_setup macro and change it
> to be per CPU.  (I'm not entirely convinced right now that TTB1
> isn't initialised with an undefined value - I can't see where r8
> is setup amongst the mess that this has turned into.
> 
> I'm not sure what this means for the 3level v7 code - that's something
> I've never looked at except a brief glance when it went in.  The
> v7_ttb_setup there looks rather complicated and might make the changes
> I outlined above much harder.  That also isn't something I can play
> around with because I've never been able to get Linux running on the
> Versatile Express with the CPU card that would support that code.  We
> may need to enlist Catalin's help to modify that code.
> 
> That's about the best guidance I can give this evening, and I hope
> it helps you.

Wow, thanks a lot for all those details. It'll also take me a little
while to digest these informations, and see what I can implement based
on that.

Thanks again!

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-02 16:41 ` Russell King - ARM Linux
  2014-07-02 17:18   ` Thomas Petazzoni
@ 2014-07-17  8:24   ` Thomas Petazzoni
  2014-07-17  8:33     ` Russell King - ARM Linux
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-17  8:24 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

Jumping back to this topic.

On Wed, 2 Jul 2014 17:41:47 +0100, Russell King - ARM Linux wrote:

> The most important first step in that is working out how, in the early
> assembly code, we can identify these Armada SoCs.  That's a task I
> can't undertake because I know next to nothing about the SoCs you're
> dealing with.  Yes, I know that Marvell released a TRM a short while
> back, which is great, but not everyone has time to go around reading
> every TRM which every silicon vendor releases into the public domain.
> 
> If it /is/ possible to detect the Armada SoCs in the early assembly,
> then we can start to solve this.

Instead of having the absolute requirement to detect the Armada SoC in
the early assembly code, wouldn't it be possible to re-adjust the page
tables afterwards?

If I understand correctly, we are already changing the page tables
anyway, to switch certain pages to be mapped uncached, to do DMA
coherent allocations, no? So wouldn't it be possible to keep the early
assembly code as it is today, and then later, in C code, once we have
identified the platform on which we're running, modify the page tables
to switch to the appropriate page attributes (write allocate cache
policy, shareable attribute, etc.) ?

Remember that this special page attribute configuration is only needed
to provide hardware I/O coherency, i.e when we start doing DMA
transfers. It is therefore possible to do the change of the page
attributes in for example ->init_early(), which happens before any DMA
transfer has taken place.

What would be the complexity of achieving this? Would there be a way to
handle the TTBR register value as well?

Thanks,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17  8:24   ` Thomas Petazzoni
@ 2014-07-17  8:33     ` Russell King - ARM Linux
  2014-07-17  8:45       ` Thomas Petazzoni
                         ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17  8:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 10:24:25AM +0200, Thomas Petazzoni wrote:
> If I understand correctly, we are already changing the page tables
> anyway, to switch certain pages to be mapped uncached, to do DMA
> coherent allocations, no?

I've no idea, I never looked at that code.  I hope that Marek has
considered the requirements of the architecture when creating that
code...

> So wouldn't it be possible to keep the early
> assembly code as it is today, and then later, in C code, once we have
> identified the platform on which we're running, modify the page tables
> to switch to the appropriate page attributes (write allocate cache
> policy, shareable attribute, etc.) ?

No - the problem is that we're running from the page table in question
with global mappings, and we need to switch all these mappings, including
the ones we're currently using to execute from.

We can't even create a new page table and switch to it because the
mappings in question are global mappings.

The only way to do that safely from an architectural point of view would
be to turn the MMU off, and drop back to assembly code to change the
page tables, and re-enable the MMU.  For something as obscure as Marvell's
coherency stuff, that's not something I want to see in core code.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17  8:33     ` Russell King - ARM Linux
@ 2014-07-17  8:45       ` Thomas Petazzoni
  2014-07-17 10:27         ` Russell King - ARM Linux
  2014-07-17 10:16       ` Russell King - ARM Linux
  2014-07-17 12:39       ` Arnd Bergmann
  2 siblings, 1 reply; 26+ messages in thread
From: Thomas Petazzoni @ 2014-07-17  8:45 UTC (permalink / raw)
  To: linux-arm-kernel

Russell,

On Thu, 17 Jul 2014 09:33:42 +0100, Russell King - ARM Linux wrote:

> > So wouldn't it be possible to keep the early
> > assembly code as it is today, and then later, in C code, once we have
> > identified the platform on which we're running, modify the page tables
> > to switch to the appropriate page attributes (write allocate cache
> > policy, shareable attribute, etc.) ?
> 
> No - the problem is that we're running from the page table in question
> with global mappings, and we need to switch all these mappings, including
> the ones we're currently using to execute from.
> 
> We can't even create a new page table and switch to it because the
> mappings in question are global mappings.
> 
> The only way to do that safely from an architectural point of view would
> be to turn the MMU off, and drop back to assembly code to change the
> page tables, and re-enable the MMU.  For something as obscure as Marvell's
> coherency stuff, that's not something I want to see in core code.

That's indeed tricky. But then, if we don't find solutions to support
in mainline the features of those SoCs, why are we asking all those SoC
vendors to push their code mainline?

As far as I can see, the conclusion of this discussion seems to be that
there is no solution that looks acceptable to you to support the
hardware I/O coherency functionality in mainline. I am trying hard to
get Marvell to stop using their evil vendor tree as much as possible,
but the message we're sending here is basically exactly the opposite:
we're telling them that there's no way to properly support hardware
I/O coherency in mainline and that the only solution for them is to keep
their evil vendor tree. Is this really what we want?

Best regards,

Thomas
-- 
Thomas Petazzoni, CTO, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17  8:33     ` Russell King - ARM Linux
  2014-07-17  8:45       ` Thomas Petazzoni
@ 2014-07-17 10:16       ` Russell King - ARM Linux
  2014-07-17 12:39       ` Arnd Bergmann
  2 siblings, 0 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17 10:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 09:33:42AM +0100, Russell King - ARM Linux wrote:
> On Thu, Jul 17, 2014 at 10:24:25AM +0200, Thomas Petazzoni wrote:
> > If I understand correctly, we are already changing the page tables
> > anyway, to switch certain pages to be mapped uncached, to do DMA
> > coherent allocations, no?
> 
> I've no idea, I never looked at that code.  I hope that Marek has
> considered the requirements of the architecture when creating that
> code...

On this, it appears that (confirmed by Will) the DMA code is indeed buggy
in that it doesn't take account of the possibility of mismatched aliases.

This was raised before in this thread:

http://archive.arm.linux.org.uk/lurker/thread/20120922.052207.ff853126.en.html

but it was claimed that because it's done very early, it's safe.  That's
not really good enough - what the code relies upon is the hope that the
CPU will not speculatively prefetch from the area being modified.  While
that's unlikely, it's not impossible - and when if it were to happen
mid-update, then we could end up with the TLB containing both a section
mapping _and_ a page mapping.

I suspect that the only reason we haven't seen issues is that we haven't
had seen such aggressive speculation yet.

The code in principle is doing the right thing by clearing the section
mappings first.  What has been forgotten is that if speculative prefetches
have already happened, the TLB may well be populated, and so it needs a
TLB flush immediately after clearing the section mappings with pmd_clear().

Will Deacon agrees with me on this... so, CMA is buggy in this respect.

The reason this can't be done for coherency becomes obvious - in order to
make this change, we would need to clear the section mapping, flush it
from the TLB, and then create the new section mapping.  If the section
mapping we're modifying in that way happens to be the one which maps the
code performing that update, or the one which contains the page table,
than kaboom...

That's why I said that the only alternative is to turn the MMU off.
There are really only two choices here: either detect the platform early
in assembly where we can avoid this issue completely, or turn the MMU
off, update the page tables from assembly code, and then turn the MMU
back on and resume executing C code.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17  8:45       ` Thomas Petazzoni
@ 2014-07-17 10:27         ` Russell King - ARM Linux
  0 siblings, 0 replies; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17 10:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 10:45:12AM +0200, Thomas Petazzoni wrote:
> Russell,
> 
> On Thu, 17 Jul 2014 09:33:42 +0100, Russell King - ARM Linux wrote:
> 
> > > So wouldn't it be possible to keep the early
> > > assembly code as it is today, and then later, in C code, once we have
> > > identified the platform on which we're running, modify the page tables
> > > to switch to the appropriate page attributes (write allocate cache
> > > policy, shareable attribute, etc.) ?
> > 
> > No - the problem is that we're running from the page table in question
> > with global mappings, and we need to switch all these mappings, including
> > the ones we're currently using to execute from.
> > 
> > We can't even create a new page table and switch to it because the
> > mappings in question are global mappings.
> > 
> > The only way to do that safely from an architectural point of view would
> > be to turn the MMU off, and drop back to assembly code to change the
> > page tables, and re-enable the MMU.  For something as obscure as Marvell's
> > coherency stuff, that's not something I want to see in core code.
> 
> That's indeed tricky. But then, if we don't find solutions to support
> in mainline the features of those SoCs, why are we asking all those SoC
> vendors to push their code mainline?

As I said to you, this would not have been a problem if it weren't for
us moving to device tree, and single zImage.  By doing that, we've made
various problems such as the one you are experiencing several orders of
magnitude more difficult to solve.

Let me illustrate why this is the case.  With the old ATAGs + machine ID
solution, the assembly code used to look up the machine ID in the table
of built-in machines.  Once located, we'd have access to any information
stored in there.

We could have added an entry to that to indicate what level of Marvell
coherency workarounds were required, which could have been read by the
assembly page table setup code to set the page table attributes correctly.

This would have been /really/ easy to do, and would be simple to implement.

Instead, with DT, we have a massive problem instead, because we can't
parse the DT from assembly code to work out which platform we're running
on, which means the early assembly code can't adjust itself according to
the content of DT.

I don't have an answer for this other than those which I've already
stated... and I again feel like I'm being blamed for this issue.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17  8:33     ` Russell King - ARM Linux
  2014-07-17  8:45       ` Thomas Petazzoni
  2014-07-17 10:16       ` Russell King - ARM Linux
@ 2014-07-17 12:39       ` Arnd Bergmann
  2014-07-17 13:12         ` Russell King - ARM Linux
  2 siblings, 1 reply; 26+ messages in thread
From: Arnd Bergmann @ 2014-07-17 12:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 17 July 2014 09:33:42 Russell King - ARM Linux wrote:
> On Thu, Jul 17, 2014 at 10:24:25AM +0200, Thomas Petazzoni wrote:
> > If I understand correctly, we are already changing the page tables
> > anyway, to switch certain pages to be mapped uncached, to do DMA
> > coherent allocations, no?
> 
> I've no idea, I never looked at that code.  I hope that Marek has
> considered the requirements of the architecture when creating that
> code...
> 
> > So wouldn't it be possible to keep the early
> > assembly code as it is today, and then later, in C code, once we have
> > identified the platform on which we're running, modify the page tables
> > to switch to the appropriate page attributes (write allocate cache
> > policy, shareable attribute, etc.) ?
> 
> No - the problem is that we're running from the page table in question
> with global mappings, and we need to switch all these mappings, including
> the ones we're currently using to execute from.
> 
> We can't even create a new page table and switch to it because the
> mappings in question are global mappings.
> 
> The only way to do that safely from an architectural point of view would
> be to turn the MMU off, and drop back to assembly code to change the
> page tables, and re-enable the MMU.  For something as obscure as Marvell's
> coherency stuff, that's not something I want to see in core code.

Is this different from what we do in the LPAE version of
early_paging_init()? That code already adjusts all the page
table entries on a per-platform setting and should be very
easy to extend for a modified procinfo->__cpu_mm_mmu_flags,
and possibly able to extend for traditional (non-LPAE)
page tables without a lot of duplication.

At the moment it only handles a platform-specific undetectable
PHYS_OFFSET for mach-keystone, which to me sounds equally
hacky as the platform-specific undetectable pmdprot value
for mvebu.

Is there anything I'm missing why one works and the other doesn't?

	Arnd

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 12:39       ` Arnd Bergmann
@ 2014-07-17 13:12         ` Russell King - ARM Linux
  2014-07-17 14:27           ` Will Deacon
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17 13:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 02:39:42PM +0200, Arnd Bergmann wrote:
> On Thursday 17 July 2014 09:33:42 Russell King - ARM Linux wrote:
> > No - the problem is that we're running from the page table in question
> > with global mappings, and we need to switch all these mappings, including
> > the ones we're currently using to execute from.
> > 
> > We can't even create a new page table and switch to it because the
> > mappings in question are global mappings.
> > 
> > The only way to do that safely from an architectural point of view would
> > be to turn the MMU off, and drop back to assembly code to change the
> > page tables, and re-enable the MMU.  For something as obscure as Marvell's
> > coherency stuff, that's not something I want to see in core code.
> 
> Is this different from what we do in the LPAE version of
> early_paging_init()? That code already adjusts all the page
> table entries on a per-platform setting and should be very
> easy to extend for a modified procinfo->__cpu_mm_mmu_flags,
> and possibly able to extend for traditional (non-LPAE)
> page tables without a lot of duplication.

I'm going to have to profess no knowledge of how the LPAE stuff works.
LPAE is something I can't run, and something that I was not involved
in its creation.  Therefore, I know very little about it and zero
experience thereof.

>From a glance over that function, I think that is unsafe for all the
reasons I've stated.  However, I'll pass this over to Will Deacon
to comment further on.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 13:12         ` Russell King - ARM Linux
@ 2014-07-17 14:27           ` Will Deacon
  2014-07-17 14:40             ` Arnd Bergmann
  2014-07-28 17:46             ` Will Deacon
  0 siblings, 2 replies; 26+ messages in thread
From: Will Deacon @ 2014-07-17 14:27 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 02:12:24PM +0100, Russell King - ARM Linux wrote:
> On Thu, Jul 17, 2014 at 02:39:42PM +0200, Arnd Bergmann wrote:
> > On Thursday 17 July 2014 09:33:42 Russell King - ARM Linux wrote:
> > > No - the problem is that we're running from the page table in question
> > > with global mappings, and we need to switch all these mappings, including
> > > the ones we're currently using to execute from.
> > > 
> > > We can't even create a new page table and switch to it because the
> > > mappings in question are global mappings.
> > > 
> > > The only way to do that safely from an architectural point of view would
> > > be to turn the MMU off, and drop back to assembly code to change the
> > > page tables, and re-enable the MMU.  For something as obscure as Marvell's
> > > coherency stuff, that's not something I want to see in core code.
> > 
> > Is this different from what we do in the LPAE version of
> > early_paging_init()? That code already adjusts all the page
> > table entries on a per-platform setting and should be very
> > easy to extend for a modified procinfo->__cpu_mm_mmu_flags,
> > and possibly able to extend for traditional (non-LPAE)
> > page tables without a lot of duplication.
> 
> I'm going to have to profess no knowledge of how the LPAE stuff works.
> LPAE is something I can't run, and something that I was not involved
> in its creation.  Therefore, I know very little about it and zero
> experience thereof.
> 
> From a glance over that function, I think that is unsafe for all the
> reasons I've stated.  However, I'll pass this over to Will Deacon
> to comment further on.

Just an FYI, but I've had to check internally to clarify the behaviour
around TLB conflict aborts. We're changing live page tables here, but the
*only* thing to differ is the output address, which I would hope is ok but
need to check to be sure.

The ARM ARM recommends always going via an invalid mapping, but that's not
always the most practical thing to do.

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 14:27           ` Will Deacon
@ 2014-07-17 14:40             ` Arnd Bergmann
  2014-07-17 15:34               ` Russell King - ARM Linux
  2014-07-28 17:46             ` Will Deacon
  1 sibling, 1 reply; 26+ messages in thread
From: Arnd Bergmann @ 2014-07-17 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 17 July 2014 15:27:20 Will Deacon wrote:
> On Thu, Jul 17, 2014 at 02:12:24PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Jul 17, 2014 at 02:39:42PM +0200, Arnd Bergmann wrote:
> > > On Thursday 17 July 2014 09:33:42 Russell King - ARM Linux wrote:
> > > > No - the problem is that we're running from the page table in question
> > > > with global mappings, and we need to switch all these mappings, including
> > > > the ones we're currently using to execute from.
> > > > 
> > > > We can't even create a new page table and switch to it because the
> > > > mappings in question are global mappings.
> > > > 
> > > > The only way to do that safely from an architectural point of view would
> > > > be to turn the MMU off, and drop back to assembly code to change the
> > > > page tables, and re-enable the MMU.  For something as obscure as Marvell's
> > > > coherency stuff, that's not something I want to see in core code.
> > > 
> > > Is this different from what we do in the LPAE version of
> > > early_paging_init()? That code already adjusts all the page
> > > table entries on a per-platform setting and should be very
> > > easy to extend for a modified procinfo->__cpu_mm_mmu_flags,
> > > and possibly able to extend for traditional (non-LPAE)
> > > page tables without a lot of duplication.
> > 
> > I'm going to have to profess no knowledge of how the LPAE stuff works.
> > LPAE is something I can't run, and something that I was not involved
> > in its creation.  Therefore, I know very little about it and zero
> > experience thereof.
> > 
> > From a glance over that function, I think that is unsafe for all the
> > reasons I've stated.  However, I'll pass this over to Will Deacon
> > to comment further on.
> 
> Just an FYI, but I've had to check internally to clarify the behaviour
> around TLB conflict aborts. We're changing live page tables here, but the
> *only* thing to differ is the output address, which I would hope is ok but
> need to check to be sure.

The two cases are slightly different:

- The existing keystone code changes the output address but not the flags
- The hypothetical mvebu code needs to change the flags but not the address.

	Arnd

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 14:40             ` Arnd Bergmann
@ 2014-07-17 15:34               ` Russell King - ARM Linux
  2014-07-17 15:53                 ` Arnd Bergmann
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17 15:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 04:40:21PM +0200, Arnd Bergmann wrote:
> On Thursday 17 July 2014 15:27:20 Will Deacon wrote:
> > Just an FYI, but I've had to check internally to clarify the behaviour
> > around TLB conflict aborts. We're changing live page tables here, but the
> > *only* thing to differ is the output address, which I would hope is ok but
> > need to check to be sure.
> 
> The two cases are slightly different:
> 
> - The existing keystone code changes the output address but not the flags
> - The hypothetical mvebu code needs to change the flags but not the address.

Don't base too much confidence in the keystone code:

        map_start = init_mm.start_code;
        map_end   = init_mm.brk;

        phys = __pa(map_start) & PMD_MASK;
        do {
...
                phys += PMD_SIZE;
        } while (phys < map_end);

Consider what happens when map_end is not PMD aligned - it misses the
PMD covering the end of the kernel BSS, which will be quite a common
case.  Obviously something that needs fixing...

-- 
FTTC broadband for 0.8mile line: currently@9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 15:34               ` Russell King - ARM Linux
@ 2014-07-17 15:53                 ` Arnd Bergmann
  2014-07-17 16:04                   ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Arnd Bergmann @ 2014-07-17 15:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 17 July 2014 16:34:01 Russell King - ARM Linux wrote:
> On Thu, Jul 17, 2014 at 04:40:21PM +0200, Arnd Bergmann wrote:
> > On Thursday 17 July 2014 15:27:20 Will Deacon wrote:
> > > Just an FYI, but I've had to check internally to clarify the behaviour
> > > around TLB conflict aborts. We're changing live page tables here, but the
> > > *only* thing to differ is the output address, which I would hope is ok but
> > > need to check to be sure.
> > 
> > The two cases are slightly different:
> > 
> > - The existing keystone code changes the output address but not the flags
> > - The hypothetical mvebu code needs to change the flags but not the address.
> 
> Don't base too much confidence in the keystone code:

What I meant was that verifying the correctness of one of the two above
with the architecture team doesn't mean that the other one is okay too.

>         map_start = init_mm.start_code;
>         map_end   = init_mm.brk;
> 
>         phys = __pa(map_start) & PMD_MASK;
>         do {
> ...
>                 phys += PMD_SIZE;
>         } while (phys < map_end);
> 
> Consider what happens when map_end is not PMD aligned - it misses the
> PMD covering the end of the kernel BSS, which will be quite a common
> case.  Obviously something that needs fixing...

Ouch.

	Arnd

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 15:53                 ` Arnd Bergmann
@ 2014-07-17 16:04                   ` Santosh Shilimkar
  2014-07-17 16:10                     ` Russell King - ARM Linux
  0 siblings, 1 reply; 26+ messages in thread
From: Santosh Shilimkar @ 2014-07-17 16:04 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 17 July 2014 11:53 AM, Arnd Bergmann wrote:
> On Thursday 17 July 2014 16:34:01 Russell King - ARM Linux wrote:
>> On Thu, Jul 17, 2014 at 04:40:21PM +0200, Arnd Bergmann wrote:
>>> On Thursday 17 July 2014 15:27:20 Will Deacon wrote:
>>>> Just an FYI, but I've had to check internally to clarify the behaviour
>>>> around TLB conflict aborts. We're changing live page tables here, but the
>>>> *only* thing to differ is the output address, which I would hope is ok but
>>>> need to check to be sure.
>>>
>>> The two cases are slightly different:
>>>
>>> - The existing keystone code changes the output address but not the flags
>>> - The hypothetical mvebu code needs to change the flags but not the address.
>>
>> Don't base too much confidence in the keystone code:
> 
> What I meant was that verifying the correctness of one of the two above
> with the architecture team doesn't mean that the other one is okay too.
> 
>>         map_start = init_mm.start_code;
>>         map_end   = init_mm.brk;
>>
>>         phys = __pa(map_start) & PMD_MASK;
>>         do {
>> ...
>>                 phys += PMD_SIZE;
>>         } while (phys < map_end);
>>
>> Consider what happens when map_end is not PMD aligned - it misses the
>> PMD covering the end of the kernel BSS, which will be quite a common
>> case.  Obviously something that needs fixing...
> 
> Ouch.
> 
Thanks for spotting it RMK. Will have a loot at it.

regards,
Santosh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 16:04                   ` Santosh Shilimkar
@ 2014-07-17 16:10                     ` Russell King - ARM Linux
  2014-07-17 16:18                       ` Santosh Shilimkar
  0 siblings, 1 reply; 26+ messages in thread
From: Russell King - ARM Linux @ 2014-07-17 16:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 12:04:26PM -0400, Santosh Shilimkar wrote:
> Thanks for spotting it RMK. Will have a loot at it.

I already have this for it - which includes a lot more comments on the
code (some in anticipation of the reply from the architecture people.)
The first hunk alone should fix the spotted problem.

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index ab14b79b03f0..917aabd6b2dc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1406,8 +1406,8 @@ void __init early_paging_init(const struct machine_desc *mdesc,
 		return;
 
 	/* remap kernel code and data */
-	map_start = init_mm.start_code;
-	map_end   = init_mm.brk;
+	map_start = init_mm.start_code & PMD_MASK;
+	map_end   = ALIGN(init_mm.brk, PMD_SIZE);
 
 	/* get a handle on things... */
 	pgd0 = pgd_offset_k(0);
@@ -1434,23 +1434,60 @@ void __init early_paging_init(const struct machine_desc *mdesc,
 	dsb(ishst);
 	isb();
 
-	/* remap level 1 table */
+	/*
+	 * WARNING: This code is not architecturally compliant: we modify
+	 * the mappings in-place, indeed while they are in use by this
+	 * very same code.
+	 *
+	 * Even modifying the mappings in a separate page table does
+	 * not resolve this.
+	 *
+	 * The architecture strongly recommends that when a mapping is
+	 * changed, that it is changed by first going via an invalid
+	 * mapping and back to the new mapping.  This is to ensure
+	 * that no TLB conflicts (caused by the TLB having more than
+	 * one TLB entry match a translation) can occur.
+	 */
+
+	/*
+	 * Remap level 1 table.  This changes the physical addresses
+	 * used to refer to the level 2 page tables to the high
+	 * physical address alias, leaving everything else the same.
+	 */
 	for (i = 0; i < PTRS_PER_PGD; pud0++, i++) {
 		set_pud(pud0,
 			__pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
 		pmd0 += PTRS_PER_PMD;
 	}
 
-	/* remap pmds for kernel mapping */
-	phys = __pa(map_start) & PMD_MASK;
+	/*
+	 * Remap the level 2 table, pointing the mappings at the high
+	 * physical address alias of these pages.
+	 */
+	phys = __pa(map_start);
 	do {
 		*pmdk++ = __pmd(phys | pmdprot);
 		phys += PMD_SIZE;
 	} while (phys < map_end);
 
+	/*
+	 * Ensure that the above updates are flushed out of the cache.
+	 * This is not strictly correct; on a system where the caches
+	 * are coherent with each other, but the MMU page table walks
+	 * may not be coherent, flush_cache_all() may be a no-op, and
+	 * this will fail.
+	 */
 	flush_cache_all();
+
+	/*
+	 * Re-write the TTBR values to point them at the high physical
+	 * alias of the page tables.  We expect __va() will work on
+	 * cpu_get_pgd(), which returns the value of TTBR0.
+	 */
 	cpu_switch_mm(pgd0, &init_mm);
 	cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
+
+	/* Finally flush any stale TLB values. */
 	local_flush_bp_all();
 	local_flush_tlb_all();
 }


-- 
FTTC broadband for 0.8mile line: currently@9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 16:10                     ` Russell King - ARM Linux
@ 2014-07-17 16:18                       ` Santosh Shilimkar
  0 siblings, 0 replies; 26+ messages in thread
From: Santosh Shilimkar @ 2014-07-17 16:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 17 July 2014 12:10 PM, Russell King - ARM Linux wrote:
> On Thu, Jul 17, 2014 at 12:04:26PM -0400, Santosh Shilimkar wrote:
>> Thanks for spotting it RMK. Will have a loot at it.
> 
> I already have this for it - which includes a lot more comments on the
> code (some in anticipation of the reply from the architecture people.)
> The first hunk alone should fix the spotted problem.
> 
Cool. The alignment fix was easy but I was more worried about the
other part. Thanks for the those additional comments. 

> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index ab14b79b03f0..917aabd6b2dc 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1406,8 +1406,8 @@ void __init early_paging_init(const struct machine_desc *mdesc,
>  		return;
>  
>  	/* remap kernel code and data */
> -	map_start = init_mm.start_code;
> -	map_end   = init_mm.brk;
> +	map_start = init_mm.start_code & PMD_MASK;
> +	map_end   = ALIGN(init_mm.brk, PMD_SIZE);
>  
>  	/* get a handle on things... */
>  	pgd0 = pgd_offset_k(0);
> @@ -1434,23 +1434,60 @@ void __init early_paging_init(const struct machine_desc *mdesc,
>  	dsb(ishst);
>  	isb();
>  
> -	/* remap level 1 table */
> +	/*
> +	 * WARNING: This code is not architecturally compliant: we modify
> +	 * the mappings in-place, indeed while they are in use by this
> +	 * very same code.
> +	 *
> +	 * Even modifying the mappings in a separate page table does
> +	 * not resolve this.
> +	 *
> +	 * The architecture strongly recommends that when a mapping is
> +	 * changed, that it is changed by first going via an invalid
> +	 * mapping and back to the new mapping.  This is to ensure
> +	 * that no TLB conflicts (caused by the TLB having more than
> +	 * one TLB entry match a translation) can occur.
> +	 */
> +
> +	/*
> +	 * Remap level 1 table.  This changes the physical addresses
> +	 * used to refer to the level 2 page tables to the high
> +	 * physical address alias, leaving everything else the same.
> +	 */
>  	for (i = 0; i < PTRS_PER_PGD; pud0++, i++) {
>  		set_pud(pud0,
>  			__pud(__pa(pmd0) | PMD_TYPE_TABLE | L_PGD_SWAPPER));
>  		pmd0 += PTRS_PER_PMD;
>  	}
>  
> -	/* remap pmds for kernel mapping */
> -	phys = __pa(map_start) & PMD_MASK;
> +	/*
> +	 * Remap the level 2 table, pointing the mappings at the high
> +	 * physical address alias of these pages.
> +	 */
> +	phys = __pa(map_start);
>  	do {
>  		*pmdk++ = __pmd(phys | pmdprot);
>  		phys += PMD_SIZE;
>  	} while (phys < map_end);
>  
> +	/*
> +	 * Ensure that the above updates are flushed out of the cache.
> +	 * This is not strictly correct; on a system where the caches
> +	 * are coherent with each other, but the MMU page table walks
> +	 * may not be coherent, flush_cache_all() may be a no-op, and
> +	 * this will fail.
> +	 */
>  	flush_cache_all();
> +
> +	/*
> +	 * Re-write the TTBR values to point them at the high physical
> +	 * alias of the page tables.  We expect __va() will work on
> +	 * cpu_get_pgd(), which returns the value of TTBR0.
> +	 */
>  	cpu_switch_mm(pgd0, &init_mm);
>  	cpu_set_ttbr(1, __pa(pgd0) + TTBR1_OFFSET);
> +
> +	/* Finally flush any stale TLB values. */
>  	local_flush_bp_all();
>  	local_flush_tlb_all();
>  }
> 
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP
  2014-07-17 14:27           ` Will Deacon
  2014-07-17 14:40             ` Arnd Bergmann
@ 2014-07-28 17:46             ` Will Deacon
  1 sibling, 0 replies; 26+ messages in thread
From: Will Deacon @ 2014-07-28 17:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jul 17, 2014 at 03:27:20PM +0100, Will Deacon wrote:
> On Thu, Jul 17, 2014 at 02:12:24PM +0100, Russell King - ARM Linux wrote:
> > On Thu, Jul 17, 2014 at 02:39:42PM +0200, Arnd Bergmann wrote:
> > > On Thursday 17 July 2014 09:33:42 Russell King - ARM Linux wrote:
> > > > No - the problem is that we're running from the page table in question
> > > > with global mappings, and we need to switch all these mappings, including
> > > > the ones we're currently using to execute from.
> > > > 
> > > > We can't even create a new page table and switch to it because the
> > > > mappings in question are global mappings.
> > > > 
> > > > The only way to do that safely from an architectural point of view would
> > > > be to turn the MMU off, and drop back to assembly code to change the
> > > > page tables, and re-enable the MMU.  For something as obscure as Marvell's
> > > > coherency stuff, that's not something I want to see in core code.
> > > 
> > > Is this different from what we do in the LPAE version of
> > > early_paging_init()? That code already adjusts all the page
> > > table entries on a per-platform setting and should be very
> > > easy to extend for a modified procinfo->__cpu_mm_mmu_flags,
> > > and possibly able to extend for traditional (non-LPAE)
> > > page tables without a lot of duplication.
> > 
> > I'm going to have to profess no knowledge of how the LPAE stuff works.
> > LPAE is something I can't run, and something that I was not involved
> > in its creation.  Therefore, I know very little about it and zero
> > experience thereof.
> > 
> > From a glance over that function, I think that is unsafe for all the
> > reasons I've stated.  However, I'll pass this over to Will Deacon
> > to comment further on.
> 
> Just an FYI, but I've had to check internally to clarify the behaviour
> around TLB conflict aborts. We're changing live page tables here, but the
> *only* thing to differ is the output address, which I would hope is ok but
> need to check to be sure.
> 
> The ARM ARM recommends always going via an invalid mapping, but that's not
> always the most practical thing to do.

Ok, the answer from the architects is that we really shouldn't be doing
this. If we have two mappings for the same VA, then we run the risk of a
TLB conflict abort on implementations that can detect it. If we don't get
the abort, we can still have other things silently go wrong like exclusive
accesses, memory-ordering but, more importantly, it places us square in
UNPREDICTABLE territory, which isn't a friendly place to be.

So, we should fix the code we currently have for keystone so that it avoids
this possibility and avoid duplicating it elsewhere before we've done that.

I think Russell had a patch adding a comment, but I'd go further and add a
WARN_ON(1) because this is only going to be a source of subtle,
hard-to-debug issues that we'll really struggle to get to the bottom of.

Will

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-07-28 17:46 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-02 16:21 [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Thomas Petazzoni
2014-07-02 16:21 ` [PATCH 1/3] ARM: mvebu: make the coherency_ll.S functions work with no coherency fabric Thomas Petazzoni
2014-07-02 16:21 ` [PATCH 2/3] ARM: mvebu: disable I/O coherency on non-SMP situations on Armada 370/XP Thomas Petazzoni
2014-07-02 16:21 ` [PATCH 3/3] ARM: mvebu: add missing of_node_put() call in coherency.c Thomas Petazzoni
2014-07-02 16:27 ` [PATCH 0/3] ARM: mvebu: disable I/O coherency on !SMP Andrew Lunn
2014-07-02 16:41 ` Russell King - ARM Linux
2014-07-02 17:18   ` Thomas Petazzoni
2014-07-02 17:58     ` Russell King - ARM Linux
2014-07-02 20:28       ` Thomas Petazzoni
2014-07-02 21:33         ` Russell King - ARM Linux
2014-07-02 21:50           ` Thomas Petazzoni
2014-07-17  8:24   ` Thomas Petazzoni
2014-07-17  8:33     ` Russell King - ARM Linux
2014-07-17  8:45       ` Thomas Petazzoni
2014-07-17 10:27         ` Russell King - ARM Linux
2014-07-17 10:16       ` Russell King - ARM Linux
2014-07-17 12:39       ` Arnd Bergmann
2014-07-17 13:12         ` Russell King - ARM Linux
2014-07-17 14:27           ` Will Deacon
2014-07-17 14:40             ` Arnd Bergmann
2014-07-17 15:34               ` Russell King - ARM Linux
2014-07-17 15:53                 ` Arnd Bergmann
2014-07-17 16:04                   ` Santosh Shilimkar
2014-07-17 16:10                     ` Russell King - ARM Linux
2014-07-17 16:18                       ` Santosh Shilimkar
2014-07-28 17:46             ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).