* [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
@ 2023-09-07 14:33 Zhizhou Zhang
2023-09-08 12:58 ` Linus Walleij
0 siblings, 1 reply; 7+ messages in thread
From: Zhizhou Zhang @ 2023-09-07 14:33 UTC (permalink / raw)
To: linux, rmk+kernel, rppt, linus.walleij, akpm, vishal.moola, arnd,
wangkefeng.wang, willy
Cc: linux-arm-kernel, linux-kernel, Zhizhou Zhang
From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
flush_cache_all() saves registers to the stack at function entry.
If it is called after the cache has been disabled, that data is written
directly to memory. The subsequent cache clean operation then overwrites
the registers saved by flush_cache_all(), including lr, with stale cache
contents. Calling flush_cache_all() before turning off the cache fixes
the problem.
Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
---
arch/arm/mm/mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 674ed71573a8..03fb0fe926f3 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1675,6 +1675,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
/* Run the patch stub to update the constants */
fixup_pv_table(&__pv_table_begin,
(&__pv_table_end - &__pv_table_begin) << 2);
+ flush_cache_all();
/*
* We changing not only the virtual to physical mapping, but also
@@ -1690,7 +1691,6 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
asm volatile("mcr p15, 0, %0, c2, c0, 2"
: : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
- flush_cache_all();
/*
* Fixup the page tables - this must be in the idmap region as
--
2.34.1
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-07 14:33 [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y Zhizhou Zhang
@ 2023-09-08 12:58 ` Linus Walleij
2023-09-08 13:50 ` Russell King (Oracle)
0 siblings, 1 reply; 7+ messages in thread
From: Linus Walleij @ 2023-09-08 12:58 UTC (permalink / raw)
To: Zhizhou Zhang
Cc: linux, rmk+kernel, rppt, akpm, vishal.moola, arnd,
wangkefeng.wang, willy, linux-arm-kernel, linux-kernel,
Zhizhou Zhang
Hi Zhizhou,
wow a great patch! I'm surprised no-one has been hit by this before.
I guess we were lucky.
On Thu, Sep 7, 2023 at 4:33 PM Zhizhou Zhang <zhizhou.zh@gmail.com> wrote:
> From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
>
> flush_cache_all() saves registers to the stack at function entry.
> If it is called after the cache has been disabled, that data is written
> directly to memory. The subsequent cache clean operation then overwrites
> the registers saved by flush_cache_all(), including lr, with stale cache
> contents. Calling flush_cache_all() before turning off the cache fixes
> the problem.
>
> Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
I would also add
Cc: stable@vger.kernel.org
Then please put this into Russell's patch tracker once review
is complete.
Yours,
Linus Walleij
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-08 12:58 ` Linus Walleij
@ 2023-09-08 13:50 ` Russell King (Oracle)
2023-09-08 21:00 ` Linus Walleij
0 siblings, 1 reply; 7+ messages in thread
From: Russell King (Oracle) @ 2023-09-08 13:50 UTC (permalink / raw)
To: Linus Walleij
Cc: Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang,
willy, linux-arm-kernel, linux-kernel, Zhizhou Zhang
On Fri, Sep 08, 2023 at 02:58:49PM +0200, Linus Walleij wrote:
> Hi Zhizhou,
>
> wow a great patch! I'm surprised no-one has been hit by this before.
> I guess we were lucky.
>
> On Thu, Sep 7, 2023 at 4:33 PM Zhizhou Zhang <zhizhou.zh@gmail.com> wrote:
>
> > From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
> >
> > flush_cache_all() saves registers to the stack at function entry.
> > If it is called after the cache has been disabled, that data is written
> > directly to memory. The subsequent cache clean operation then overwrites
> > the registers saved by flush_cache_all(), including lr, with stale cache
> > contents. Calling flush_cache_all() before turning off the cache fixes
> > the problem.
> >
> > Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
>
> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
>
> I would also add
> Cc: stable@vger.kernel.org
>
> Then please put this into Russell's patch tracker once review
> is complete.
However, it makes a total nonsense of the comment, which explains
precisely why the flush_cache_all() is where it is. Moving it before
that comment means that the comment is now ridiculous.
So, please don't put it in the patch system.
The patch certainly needs to be tested on TI Keystone which is the
primary user of this code.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-08 13:50 ` Russell King (Oracle)
@ 2023-09-08 21:00 ` Linus Walleij
2023-09-09 8:23 ` Zhi-zhou Zhang
2023-09-11 13:04 ` Nishanth Menon
0 siblings, 2 replies; 7+ messages in thread
From: Linus Walleij @ 2023-09-08 21:00 UTC (permalink / raw)
To: Russell King (Oracle), Andrew Davis, Nishanth Menon,
Zhizhou Zhang
Cc: rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy,
linux-arm-kernel, linux-kernel, Zhizhou Zhang
On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
> However, it makes a total nonsense of the comment, which explains
> precisely why the flush_cache_all() is where it is. Moving it before
> that comment means that the comment is now ridiculous.
Zhizhou, can you look over the comment placement?
> So, please don't put it in the patch system.
>
> The patch certainly needs to be tested on TI Keystone which is the
> primary user of this code.
Added Andrew Davis and Nishanth Menon to the thread:
can you folks review and test this for Keystone?
Yours,
Linus Walleij
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-08 21:00 ` Linus Walleij
@ 2023-09-09 8:23 ` Zhi-zhou Zhang
2023-10-02 14:17 ` Andrew Davis
2023-09-11 13:04 ` Nishanth Menon
1 sibling, 1 reply; 7+ messages in thread
From: Zhi-zhou Zhang @ 2023-09-09 8:23 UTC (permalink / raw)
To: Linus Walleij
Cc: Russell King (Oracle), Andrew Davis, Nishanth Menon,
Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang,
willy, linux-arm-kernel, linux-kernel
On Fri, Sep 08, 2023 at 11:00:31PM +0200, Linus Walleij wrote:
> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
>
> > However, it makes a total nonsense of the comment, which explains
> > precisely why the flush_cache_all() is where it is. Moving it before
> > that comment means that the comment is now ridiculous.
>
> Zhizhou, can you look over the comment placement?
Linus, I found the bug on a Cortex-A55 CPU with high-address memory.
Since lr is also corrupted, when flush_cache_all() returns, the
program continues at the instruction after fixup_pv_table(). So
the cache disable and flush_cache_all() are executed a second time.
This time lr is correct, so the kernel may boot as usual.
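
The ordering problem described above can be sketched with a toy model. This is only an illustration under stated assumptions, not kernel code: the names toy_cache, toy_store, toy_clean and toy_saved_lr are hypothetical, the model has a single cache line covering a single stack slot, and it assumes that stores made while the cache is disabled go straight to RAM while a previously dirty line stays in the cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical): one stack slot in RAM, one cache line over it. */
struct toy_cache {
	bool enabled;        /* models the data cache enable bit */
	bool dirty;          /* line holds data not yet written to RAM */
	unsigned long line;  /* cached copy of the stack slot */
};

static unsigned long ram_slot;   /* the stack slot as seen in RAM */

/* A store lands in the cache when it is enabled, otherwise in RAM. */
static void toy_store(struct toy_cache *c, unsigned long val)
{
	if (c->enabled) {
		c->line = val;
		c->dirty = true;
	} else {
		ram_slot = val;
	}
}

/* Cleaning writes a dirty line back to RAM, whatever RAM holds now. */
static void toy_clean(struct toy_cache *c)
{
	if (c->dirty) {
		ram_slot = c->line;
		c->dirty = false;
	}
}

/*
 * Returns the "lr" value a flush routine would reload from its stack
 * slot. clean_first selects whether the slot is cleaned before the
 * cache is disabled.
 */
unsigned long toy_saved_lr(bool clean_first)
{
	struct toy_cache c = { .enabled = true, .dirty = false, .line = 0 };
	const unsigned long stale = 0x1111, lr = 0x2222;

	ram_slot = 0;
	toy_store(&c, stale);   /* earlier code dirties the stack line */
	if (clean_first)
		toy_clean(&c);  /* write it back while the cache is on */
	c.enabled = false;      /* cache is now disabled */
	toy_store(&c, lr);      /* flush routine pushes lr: bypasses cache */
	toy_clean(&c);          /* flush routine cleans the cache */
	assert(!c.dirty);
	return ram_slot;        /* flush routine pops lr from RAM */
}
```

In this model, toy_saved_lr(false) returns the stale value rather than the pushed lr, while toy_saved_lr(true) returns the pushed lr, which mirrors the difference between cleaning after and before the cache is disabled.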
I read the comment carefully, but I am not sure how "to ensure nothing is
prefetched into the caches" affects the system. My patch does not
prevent instruction prefetch, though. On my board everything looks
good.
So I came up with a new fix: keep flush_cache_all() where it is and
add a clean of the stack's cache lines before disabling the cache.
The code is below. The fix is a bit ugly: it assumes that the stack
grows towards lower addresses and that flush_cache_all() will never
use more than 32 bytes of stack. Compared with moving
flush_cache_all() before the cache is disabled, which one do you prefer?
Thanks!
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 03fb0fe926f3..83a54c61a86b 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1640,6 +1640,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
unsigned long pa_pgd;
unsigned int cr, ttbcr;
long long offset;
+ void *stack;
if (!mdesc->pv_fixup)
return;
@@ -1675,7 +1676,14 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
/* Run the patch stub to update the constants */
fixup_pv_table(&__pv_table_begin,
(&__pv_table_end - &__pv_table_begin) << 2);
- flush_cache_all();
+
+ /*
+ * clean stack in cacheline that undering memory will be changed in
+ * the following flush_cache_all(). assuming 32 bytes is enough for
+ * flush_cache_all().
+ */
+ stack = (void *) (current_stack_pointer - 32);
+ __cpuc_flush_dcache_area(stack, 32);
/*
* We changing not only the virtual to physical mapping, but also
@@ -1691,6 +1699,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
asm volatile("mcr p15, 0, %0, c2, c0, 2"
: : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
+ flush_cache_all();
/*
* Fixup the page tables - this must be in the idmap region as
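
The 32-byte assumption in the patch above can be made concrete with a small sketch. It assumes that cache maintenance by address operates on whole cache lines, so a window below the stack pointer is effectively rounded out to line boundaries. The helper lines_touched and struct line_span are hypothetical names for illustration, and the 64-byte line size used in the note below is only an example value, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: the cache lines covered by a byte window. */
struct line_span {
	uintptr_t first;   /* address of the first line touched */
	uintptr_t last;    /* address of the last line touched */
	unsigned int count;
};

/*
 * Compute which cache lines the window [sp - window, sp) touches,
 * assuming a downward-growing stack and a power-of-two line size.
 */
struct line_span lines_touched(uintptr_t sp, uintptr_t window,
			       uintptr_t line_size)
{
	struct line_span s;
	uintptr_t start = sp - window;   /* lowest byte in the window */
	uintptr_t end = sp;              /* one past the highest byte */

	assert(window > 0);
	assert(line_size && (line_size & (line_size - 1)) == 0);

	s.first = start & ~(line_size - 1);
	s.last = (end - 1) & ~(line_size - 1);
	s.count = (unsigned int)((s.last - s.first) / line_size) + 1;
	return s;
}
```

With a 64-byte line, a 32-byte window below sp = 0x1040 stays inside one line, while the same window below sp = 0x1010 straddles two. Either way the clean only helps if the callee's register saves fit inside the chosen window, which is the assumption the patch author flags as fragile.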
>
> > So, please don't put it in the patch system.
> >
> > The patch certainly needs to be tested on TI Keystone which is the
> > primary user of this code.
>
> Added Andrew Davis and Nishanth Menon to the thread:
> can you folks review and test this for Keystone?
>
> Yours,
> Linus Walleij
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-08 21:00 ` Linus Walleij
2023-09-09 8:23 ` Zhi-zhou Zhang
@ 2023-09-11 13:04 ` Nishanth Menon
1 sibling, 0 replies; 7+ messages in thread
From: Nishanth Menon @ 2023-09-11 13:04 UTC (permalink / raw)
To: Linus Walleij
Cc: Russell King (Oracle), Andrew Davis, Zhizhou Zhang, rppt, akpm,
vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel,
linux-kernel, Zhizhou Zhang
On 23:00-20230908, Linus Walleij wrote:
> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
>
> > However, it makes a total nonsense of the comment, which explains
> > precisely why the flush_cache_all() is where it is. Moving it before
> > that comment means that the comment is now ridiculous.
>
> Zhizhou, can you look over the comment placement?
>
> > So, please don't put it in the patch system.
> >
> > The patch certainly needs to be tested on TI Keystone which is the
> > primary user of this code.
>
> Added Andrew Davis and Nishanth Menon to the thread:
> can you folks review and test this for Keystone?
next-20230911 alone (boots fine):
https://gist.github.com/nmenon/c097b4a7ce3971964a5a56a34b018c4d
With
https://lore.kernel.org/all/20230907143302.4940-1-zhizhou.zh@gmail.com/
applied on top (fails to boot):
https://gist.github.com/nmenon/308cfeb84098f41d340cd0e61845a507
--
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
2023-09-09 8:23 ` Zhi-zhou Zhang
@ 2023-10-02 14:17 ` Andrew Davis
0 siblings, 0 replies; 7+ messages in thread
From: Andrew Davis @ 2023-10-02 14:17 UTC (permalink / raw)
To: Linus Walleij, Russell King (Oracle), Nishanth Menon,
Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang,
willy, linux-arm-kernel, linux-kernel
On 9/9/23 3:23 AM, Zhi-zhou Zhang wrote:
> On Fri, Sep 08, 2023 at 11:00:31PM +0200, Linus Walleij wrote:
>> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
>> <linux@armlinux.org.uk> wrote:
>>
>>> However, it makes a total nonsense of the comment, which explains
>>> precisely why the flush_cache_all() is where it is. Moving it before
>>> that comment means that the comment is now ridiculous.
>>
>> Zhizhou, can you look over the comment placement?
>
> Linus, I found the bug on a Cortex-A55 CPU with high-address memory.
> Since lr is also corrupted, when flush_cache_all() returns, the
> program continues at the instruction after fixup_pv_table(). So
> the cache disable and flush_cache_all() are executed a second time.
> This time lr is correct, so the kernel may boot as usual.
>
> I read the comment carefully, but I am not sure how "to ensure nothing is
> prefetched into the caches" affects the system. My patch does not
> prevent instruction prefetch, though. On my board everything looks
> good.
>
> So I came up with a new fix: keep flush_cache_all() where it is and
> add a clean of the stack's cache lines before disabling the cache.
> The code is below. The fix is a bit ugly: it assumes that the stack
> grows towards lower addresses and that flush_cache_all() will never
> use more than 32 bytes of stack. Compared with moving
> flush_cache_all() before the cache is disabled, which one do you prefer?
> Thanks!
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 03fb0fe926f3..83a54c61a86b 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1640,6 +1640,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
> unsigned long pa_pgd;
> unsigned int cr, ttbcr;
> long long offset;
> + void *stack;
>
> if (!mdesc->pv_fixup)
> return;
> @@ -1675,7 +1676,14 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
> /* Run the patch stub to update the constants */
> fixup_pv_table(&__pv_table_begin,
> (&__pv_table_end - &__pv_table_begin) << 2);
> - flush_cache_all();
> +
> + /*
> + * clean stack in cacheline that undering memory will be changed in
> + * the following flush_cache_all(). assuming 32 bytes is enough for
> + * flush_cache_all().
Adding this extra clean here seems reasonable, but this comment needs
to be fixed to give the exact reasoning and to warn others not to dirty
the stack after this point. Maybe something like:
/*
 * The stack is currently in cacheable memory. After caching is disabled,
 * writes to the stack will bypass the cached stack. If this now-stale
 * cached stack is then evicted, it will overwrite the updated stack in
 * memory. Clean the stack's cache line and then ensure no writes to the
 * stack are made between here and disabling the cache below.
 */
Andrew
> + */
> + stack = (void *) (current_stack_pointer - 32);
> + __cpuc_flush_dcache_area(stack, 32);
>
> /*
> * We changing not only the virtual to physical mapping, but also
> @@ -1691,6 +1699,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
> asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
> asm volatile("mcr p15, 0, %0, c2, c0, 2"
> : : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
> + flush_cache_all();
>
> /*
> * Fixup the page tables - this must be in the idmap region as
>
>>
>>> So, please don't put it in the patch system.
>>>
>>> The patch certainly needs to be tested on TI Keystone which is the
>>> primary user of this code.
>>
>> Added Andrew Davis and Nishanth Menon to the thread:
>> can you folks review and test this for Keystone?
>>
>> Yours,
>> Linus Walleij