* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
@ 2014-05-23 1:27 Marc Carino
2014-05-23 2:11 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Marc Carino @ 2014-05-23 1:27 UTC (permalink / raw)
To: linux-arm-kernel
The zImage loader will relocate the image if it determines that
decompression will overwrite its current location. Since the act
of relocation is basically a form of code self-modification, we
need to ensure that the CPU fetches the updated instruction stream.
Since cache maintenance is skipped during the relocation phase (the
MMUs and caches are off), we need to execute both a data sync
and instruction sync barrier prior to jumping to the relocated code.
Skipping the barriers can result in execution of stale prefetched
code, leading to hangs or an UNDEFINED INSTRUCTION exception.
Signed-off-by: Marc Carino <marc.ceeeee@gmail.com>
---
arch/arm/boot/compressed/head.S | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 066b034..df067ca 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -400,8 +400,16 @@ dtb_check_done:
add sp, sp, r6
#endif
+ /*
+ * Perform cache maintenance if caches were enabled earlier.
+ * Otherwise, execute data and instruction barriers prior to
+ * jumping to the newly-written code.
+ */
tst r4, #1
bleq cache_clean_flush
+ movne r0, #0
+ mcrne p15, 0, r0, c7, c10, 4 @ DSB
+ mcrne p15, 0, r0, c7, c5, 4 @ ISB
adr r0, BSYM(restart)
add r0, r0, r6
--
1.8.1.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-05-23 1:27 [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code Marc Carino
@ 2014-05-23 2:11 ` Nicolas Pitre
2014-05-23 2:56 ` Marc Carino
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2014-05-23 2:11 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 22 May 2014, Marc Carino wrote:
> The zImage loader will relocate the image if it determines that
> decompression will overwrite its current location. Since the act
> of relocation is basically a form of code self-modification, we
> need to ensure that the CPU fetches the updated instruction stream.
>
> Since cache maintenance is skipped during the relocation phase (the
> MMUs and caches are off),
Beware... This statement isn't true in most cases.
> we need to execute both a data sync
> and instruction sync barrier prior to jumping to the relocated code.
> Skipping the barriers can result in execution of stale prefetched
> code, leading to hangs or an UNDEFINED INSTRUCTION exception.
Are you sure of that? When the MMU is off, memory access is strongly
ordered. I'm wondering if instruction prefetching applies in that case.
Also, is this a problem that you actually experienced in practice?
> @@ -400,8 +400,16 @@ dtb_check_done:
> add sp, sp, r6
> #endif
>
> + /*
> + * Perform cache maintenance if caches were enabled earlier.
> + * Otherwise, execute data and instruction barriers prior to
> + * jumping to the newly-written code.
> + */
> tst r4, #1
> bleq cache_clean_flush
> + movne r0, #0
> + mcrne p15, 0, r0, c7, c10, 4 @ DSB
> + mcrne p15, 0, r0, c7, c5, 4 @ ISB
Are you sure those CP15 accesses are fine on all the CPU variants that
may execute this code? Officially we still support back to ARMv4
implementations.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-05-23 2:11 ` Nicolas Pitre
@ 2014-05-23 2:56 ` Marc Carino
2014-05-23 3:15 ` Nicolas Pitre
2014-06-02 9:53 ` Dave Martin
0 siblings, 2 replies; 8+ messages in thread
From: Marc Carino @ 2014-05-23 2:56 UTC (permalink / raw)
To: linux-arm-kernel
Hi Nicolas,
> Beware... This statement isn't true in most cases.
Ah, yes - I do have to reword that bit. (Per the booting document, the
I-cache can either be on or off upon entry, correct?)
> Are you sure of that? When the MMU is off, memory access is strongly
> ordered. I'm wondering if instruction prefetching applies in that
> case.
I'm unable to find a statement in the ARM architecture reference manual
(ARM ARM) supporting the need for the DSB, so upon second glance I think
I'll eliminate it.
Anyway, you are correct. While the MMU is off, data accesses are
assigned the strongly-ordered memory type. However for instruction
fetches, memory is assigned the Normal memory type (B3.2.1). Further,
the architecture allows for prefetching of instructions within the
current and subsequent 4K page from the currently-executing program
counter (B3.2.3).
Furthermore, I've encountered the "self-modifying code" example within
the ARM ARM, which implies that a pipeline flush is needed after
performing stores memory that is a part of the instruction stream.
Another interesting tidbit is that there id no specification of
coherency order between data and instructions for _any_ memory type
(A3.9.3). It is expected that the programmer to insert an ISB to ensure
instruction fetches are synced-up with the most-recent view of memory.
> Also, is this a problem that you actually experienced in practice?
Yes.
> Are you sure those CP15 accesses are fine on all the CPU variants
> that may execute this code? Officially we still support back to
> ARMv4 implementations.
No. I suppose I would have to leverage a lookup table scheme as is done
with the cache maintenance operations in order for this patch to be
portable?
Thank you,
Marc
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-05-23 2:56 ` Marc Carino
@ 2014-05-23 3:15 ` Nicolas Pitre
2014-05-27 17:15 ` Marc Carino
2014-06-02 9:53 ` Dave Martin
1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2014-05-23 3:15 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 22 May 2014, Marc Carino wrote:
> Hi Nicolas,
>
> > Beware... This statement isn't true in most cases.
>
> Ah, yes - I do have to reword that bit. (Per the booting document, the
> I-cache can either be on or off upon entry, correct?)
Ah, yes.
> > Are you sure of that? When the MMU is off, memory access is strongly
> > ordered. I'm wondering if instruction prefetching applies in that
> > case.
>
> I'm unable to find a statement in the ARM architecture reference manual
> (ARM ARM) supporting the need for the DSB, so upon second glance I think
> I'll eliminate it.
>
> Anyway, you are correct. While the MMU is off, data accesses are
> assigned the strongly-ordered memory type. However for instruction
> fetches, memory is assigned the Normal memory type (B3.2.1). Further,
> the architecture allows for prefetching of instructions within the
> current and subsequent 4K page from the currently-executing program
> counter (B3.2.3).
>
> Furthermore, I've encountered the "self-modifying code" example within
> the ARM ARM, which implies that a pipeline flush is needed after
> performing stores memory that is a part of the instruction stream.
I assume that invalidating the i-cache would be a good thing to do in
all cases.
> Another interesting tidbit is that there id no specification of
> coherency order between data and instructions for _any_ memory type
> (A3.9.3). It is expected that the programmer to insert an ISB to ensure
> instruction fetches are synced-up with the most-recent view of memory.
>
> > Also, is this a problem that you actually experienced in practice?
>
> Yes.
Good to know.
I'm curious though: at what address do you load your zImage, and what
address does RAM start?
> > Are you sure those CP15 accesses are fine on all the CPU variants
> > that may execute this code? Officially we still support back to
> > ARMv4 implementations.
>
> No. I suppose I would have to leverage a lookup table scheme as is done
> with the cache maintenance operations in order for this patch to be
> portable?
I would think so.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-05-23 3:15 ` Nicolas Pitre
@ 2014-05-27 17:15 ` Marc Carino
0 siblings, 0 replies; 8+ messages in thread
From: Marc Carino @ 2014-05-27 17:15 UTC (permalink / raw)
To: linux-arm-kernel
Hello Nicolas,
> I assume that invalidating the i-cache would be a good thing to do in
> all cases.
Great! It looks like I'll be able to augment the existing cache function
lookup table with an additional I-cache invalidate operation.
> I'm curious though: at what address do you load your zImage, and what
> address does RAM start?
On our platform, we load the kernel zImage at address PA:0x8000. The
start of physical memory is at PA:0x0.
Thank you,
Marc
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-05-23 2:56 ` Marc Carino
2014-05-23 3:15 ` Nicolas Pitre
@ 2014-06-02 9:53 ` Dave Martin
2014-07-18 17:32 ` Marc C
1 sibling, 1 reply; 8+ messages in thread
From: Dave Martin @ 2014-06-02 9:53 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, May 22, 2014 at 07:56:50PM -0700, Marc Carino wrote:
> Hi Nicolas,
>
> > Beware... This statement isn't true in most cases.
>
> Ah, yes - I do have to reword that bit. (Per the booting document, the
> I-cache can either be on or off upon entry, correct?)
>
> > Are you sure of that? When the MMU is off, memory access is strongly
> > ordered. I'm wondering if instruction prefetching applies in that
> > case.
>
> I'm unable to find a statement in the ARM architecture reference manual
> (ARM ARM) supporting the need for the DSB, so upon second glance I think
> I'll eliminate it.
I believe you won't find a statement in the ARM ARM supporting the view
that you _don't_ need a DSB either :)
I believe you need it. DSB is the only operation that can serialise
program order against the D-side for abitrary code.
Strongly-ordered memory on the D-side guarantees that D-side operations
cannot occur out of order with respect to this CPU, but there is still
no guarantee about how that relates to the I-side, even with ISB.
> Anyway, you are correct. While the MMU is off, data accesses are
> assigned the strongly-ordered memory type. However for instruction
> fetches, memory is assigned the Normal memory type (B3.2.1). Further,
> the architecture allows for prefetching of instructions within the
> current and subsequent 4K page from the currently-executing program
> counter (B3.2.3).
>
> Furthermore, I've encountered the "self-modifying code" example within
> the ARM ARM, which implies that a pipeline flush is needed after
> performing stores memory that is a part of the instruction stream.
>
> Another interesting tidbit is that there id no specification of
> coherency order between data and instructions for _any_ memory type
> (A3.9.3). It is expected that the programmer to insert an ISB to ensure
> instruction fetches are synced-up with the most-recent view of memory.
>
> > Also, is this a problem that you actually experienced in practice?
>
> Yes.
>
> > Are you sure those CP15 accesses are fine on all the CPU variants
> > that may execute this code? Officially we still support back to
> > ARMv4 implementations.
>
> No. I suppose I would have to leverage a lookup table scheme as is done
> with the cache maintenance operations in order for this patch to be
> portable?
How is what we're trying to achieve here different from the operation
that needs to be done in between decompressing the kernel and jumping
to it? Can't we just call cache_clean_flush?
Except that a possibly-redundant flush of the D-cache will be done,
this should do exactly the right thing.
This code is hairy to get right ... it would be better not to fragment
it unnecessarily.
Cheers
---Dave
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-06-02 9:53 ` Dave Martin
@ 2014-07-18 17:32 ` Marc C
2014-07-18 18:08 ` Nicolas Pitre
0 siblings, 1 reply; 8+ messages in thread
From: Marc C @ 2014-07-18 17:32 UTC (permalink / raw)
To: linux-arm-kernel
Hi Dave,
My apologies for not addressing this prior to submitting my v2 of the patch.
> How is what we're trying to achieve here different from the operation
> that needs to be done in between decompressing the kernel and jumping
> to it? Can't we just call cache_clean_flush?
>
> Except that a possibly-redundant flush of the D-cache will be done,
> this should do exactly the right thing.
I agree - I don't see why we can't just perform a cache_clean_flush
prior to jumping to the relocated code. It seems like a simpler and less
error-prone solution.
-Marc
On 6/2/14, 2:53 AM, Dave Martin wrote:
> On Thu, May 22, 2014 at 07:56:50PM -0700, Marc Carino wrote:
>> Hi Nicolas,
>>
>>> Beware... This statement isn't true in most cases.
>>
>> Ah, yes - I do have to reword that bit. (Per the booting document, the
>> I-cache can either be on or off upon entry, correct?)
>>
>>> Are you sure of that? When the MMU is off, memory access is strongly
>>> ordered. I'm wondering if instruction prefetching applies in that
>>> case.
>>
>> I'm unable to find a statement in the ARM architecture reference manual
>> (ARM ARM) supporting the need for the DSB, so upon second glance I think
>> I'll eliminate it.
>
> I believe you won't find a statement in the ARM ARM supporting the view
> that you _don't_ need a DSB either :)
>
> I believe you need it. DSB is the only operation that can serialise
> program order against the D-side for abitrary code.
>
> Strongly-ordered memory on the D-side guarantees that D-side operations
> cannot occur out of order with respect to this CPU, but there is still
> no guarantee about how that relates to the I-side, even with ISB.
>
>> Anyway, you are correct. While the MMU is off, data accesses are
>> assigned the strongly-ordered memory type. However for instruction
>> fetches, memory is assigned the Normal memory type (B3.2.1). Further,
>> the architecture allows for prefetching of instructions within the
>> current and subsequent 4K page from the currently-executing program
>> counter (B3.2.3).
>>
>> Furthermore, I've encountered the "self-modifying code" example within
>> the ARM ARM, which implies that a pipeline flush is needed after
>> performing stores memory that is a part of the instruction stream.
>>
>> Another interesting tidbit is that there id no specification of
>> coherency order between data and instructions for _any_ memory type
>> (A3.9.3). It is expected that the programmer to insert an ISB to ensure
>> instruction fetches are synced-up with the most-recent view of memory.
>>
>>> Also, is this a problem that you actually experienced in practice?
>>
>> Yes.
>>
>>> Are you sure those CP15 accesses are fine on all the CPU variants
>>> that may execute this code? Officially we still support back to
>>> ARMv4 implementations.
>>
>> No. I suppose I would have to leverage a lookup table scheme as is done
>> with the cache maintenance operations in order for this patch to be
>> portable?
>
> How is what we're trying to achieve here different from the operation
> that needs to be done in between decompressing the kernel and jumping
> to it? Can't we just call cache_clean_flush?
>
> Except that a possibly-redundant flush of the D-cache will be done,
> this should do exactly the right thing.
>
> This code is hairy to get right ... it would be better not to fragment
> it unnecessarily.
>
> Cheers
> ---Dave
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code
2014-07-18 17:32 ` Marc C
@ 2014-07-18 18:08 ` Nicolas Pitre
0 siblings, 0 replies; 8+ messages in thread
From: Nicolas Pitre @ 2014-07-18 18:08 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, 18 Jul 2014, Marc C wrote:
> Hi Dave,
>
> My apologies for not addressing this prior to submitting my v2 of the patch.
>
> > How is what we're trying to achieve here different from the operation
> > that needs to be done in between decompressing the kernel and jumping
> > to it? Can't we just call cache_clean_flush?
> >
> > Except that a possibly-redundant flush of the D-cache will be done,
> > this should do exactly the right thing.
>
> I agree - I don't see why we can't just perform a cache_clean_flush
> prior to jumping to the relocated code. It seems like a simpler and less
> error-prone solution.
Agreed.
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-07-18 18:08 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-23 1:27 [PATCH] ARM: zImage: add DSB and ISB barriers after relocating code Marc Carino
2014-05-23 2:11 ` Nicolas Pitre
2014-05-23 2:56 ` Marc Carino
2014-05-23 3:15 ` Nicolas Pitre
2014-05-27 17:15 ` Marc Carino
2014-06-02 9:53 ` Dave Martin
2014-07-18 17:32 ` Marc C
2014-07-18 18:08 ` Nicolas Pitre
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox