* [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Zhizhou Zhang @ 2023-09-07 14:33 UTC
To: linux, rmk+kernel, rppt, linus.walleij, akpm, vishal.moola, arnd, wangkefeng.wang, willy
Cc: linux-arm-kernel, linux-kernel, Zhizhou Zhang

From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>

flush_cache_all() saves registers to the stack at function entry.
If it is called after the cache has been disabled, that data is written
directly to memory. The subsequent cache clean then overwrites the
registers saved by flush_cache_all(), including lr. Calling
flush_cache_all() before turning off the cache fixes the problem.

Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
---
 arch/arm/mm/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 674ed71573a8..03fb0fe926f3 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1675,6 +1675,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
 	/* Run the patch stub to update the constants */
 	fixup_pv_table(&__pv_table_begin,
 		(&__pv_table_end - &__pv_table_begin) << 2);
+	flush_cache_all();
 
 	/*
 	 * We changing not only the virtual to physical mapping, but also
@@ -1690,7 +1691,6 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
 	asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
 	asm volatile("mcr p15, 0, %0, c2, c0, 2"
 		: : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
-	flush_cache_all();
 
 	/*
 	 * Fixup the page tables - this must be in the idmap region as
-- 
2.34.1

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Linus Walleij @ 2023-09-08 12:58 UTC
To: Zhizhou Zhang
Cc: linux, rmk+kernel, rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel, Zhizhou Zhang

Hi Zhizhou,

wow a great patch! I'm surprised no-one has been hit by this before.
I guess we were lucky.

On Thu, Sep 7, 2023 at 4:33 PM Zhizhou Zhang <zhizhou.zh@gmail.com> wrote:

> From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
>
> flush_cache_all() saves registers to the stack at function entry.
> If it is called after the cache has been disabled, that data is written
> directly to memory. The subsequent cache clean then overwrites the
> registers saved by flush_cache_all(), including lr. Calling
> flush_cache_all() before turning off the cache fixes the problem.
>
> Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>

Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

I would also add
Cc: stable@vger.kernel.org

Then please put this into Russell's patch tracker once review
is complete.

Yours,
Linus Walleij

* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Russell King (Oracle) @ 2023-09-08 13:50 UTC
To: Linus Walleij
Cc: Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel, Zhizhou Zhang

On Fri, Sep 08, 2023 at 02:58:49PM +0200, Linus Walleij wrote:
> Hi Zhizhou,
>
> wow a great patch! I'm surprised no-one has been hit by this before.
> I guess we were lucky.
>
> On Thu, Sep 7, 2023 at 4:33 PM Zhizhou Zhang <zhizhou.zh@gmail.com> wrote:
>
> > From: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
> >
> > flush_cache_all() saves registers to the stack at function entry.
> > If it is called after the cache has been disabled, that data is written
> > directly to memory. The subsequent cache clean then overwrites the
> > registers saved by flush_cache_all(), including lr. Calling
> > flush_cache_all() before turning off the cache fixes the problem.
> >
> > Signed-off-by: Zhizhou Zhang <zhizhouzhang@asrmicro.com>
>
> Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
>
> I would also add
> Cc: stable@vger.kernel.org
>
> Then please put this into Russell's patch tracker once review
> is complete.

However, it makes a total nonsense of the comment, which explains
precisely why the flush_cache_all() is where it is. Moving it before
that comment means that the comment is now ridiculous.

So, please don't put it in the patch system.

The patch certainly needs to be tested on TI Keystone which is the
primary user of this code.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Linus Walleij @ 2023-09-08 21:00 UTC
To: Russell King (Oracle), Andrew Davis, Nishanth Menon, Zhizhou Zhang
Cc: rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel, Zhizhou Zhang

On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:

> However, it makes a total nonsense of the comment, which explains
> precisely why the flush_cache_all() is where it is. Moving it before
> that comment means that the comment is now ridiculous.

Zhizhou, can you look over the comment placement?

> So, please don't put it in the patch system.
>
> The patch certainly needs to be tested on TI Keystone which is the
> primary user of this code.

Added Andrew Davis and Nishanth Menon to the thread:
can you folks review and test this for Keystone?

Yours,
Linus Walleij

* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Zhi-zhou Zhang @ 2023-09-09 8:23 UTC
To: Linus Walleij
Cc: Russell King (Oracle), Andrew Davis, Nishanth Menon, Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel

On Fri, Sep 08, 2023 at 11:00:31PM +0200, Linus Walleij wrote:
> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
>
> > However, it makes a total nonsense of the comment, which explains
> > precisely why the flush_cache_all() is where it is. Moving it before
> > that comment means that the comment is now ridiculous.
>
> Zhizhou, can you look over the comment placement?

Linus, I found the bug on a Cortex-A55 CPU with high-address memory.
Since lr is also corrupted, when flush_cache_all() finishes, the
program continues at the next instruction after fixup_pv_table(). So
the cache disabling and flush_cache_all() are executed a second time.
This time lr is correct, so the kernel may boot up as usual.

I read the comment carefully, but I am not sure how "to ensure nothing
is prefetched into the caches" affects the system. My patch doesn't
prevent instruction prefetch, though. But on my board everything looks
good.

So I came up with a new fix: keep flush_cache_all() where it is and
add a flush of the stack's cache before disabling the cache. The code
is below. The fix is a bit ugly: it assumes that the stack grows
towards lower addresses and that flush_cache_all() will not use more
than 32 bytes of stack in the future. Compared with moving
flush_cache_all() before disabling the cache, which one do you prefer?
Thanks!

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 03fb0fe926f3..83a54c61a86b 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1640,6 +1640,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
 	unsigned long pa_pgd;
 	unsigned int cr, ttbcr;
 	long long offset;
+	void *stack;
 
 	if (!mdesc->pv_fixup)
 		return;
@@ -1675,7 +1676,14 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
 	/* Run the patch stub to update the constants */
 	fixup_pv_table(&__pv_table_begin,
 		(&__pv_table_end - &__pv_table_begin) << 2);
-	flush_cache_all();
+
+	/*
+	 * clean stack in cacheline that undering memory will be changed in
+	 * the following flush_cache_all(). assuming 32 bytes is enough for
+	 * flush_cache_all().
+	 */
+	stack = (void *) (current_stack_pointer - 32);
+	__cpuc_flush_dcache_area(stack, 32);
 
 	/*
 	 * We changing not only the virtual to physical mapping, but also
@@ -1691,6 +1699,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
 	asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
 	asm volatile("mcr p15, 0, %0, c2, c0, 2"
 		: : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
+	flush_cache_all();
 
 	/*
 	 * Fixup the page tables - this must be in the idmap region as

>
> > So, please don't put it in the patch system.
> >
> > The patch certainly needs to be tested on TI Keystone which is the
> > primary user of this code.
>
> Added Andrew Davis and Nishanth Menon to the thread:
> can you folks review and test this for Keystone?
>
> Yours,
> Linus Walleij

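To illustrate the 32-byte assumption in the patch above: cache maintenance works on whole cache lines, so how many lines the window just below the stack pointer touches depends on where sp sits relative to a line boundary. The following is a minimal stand-alone sketch in userspace C, not kernel code. `lines_covered()` and `stack_window_lines()` are hypothetical helpers invented for this illustration, and the 64-byte line size in the usage note is an assumption; the real value depends on the CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): number of cache lines touched by
 * the byte range [start, start + len), for a power-of-two line size.
 */
static unsigned int lines_covered(uintptr_t start, uintptr_t len,
				  uintptr_t line_size)
{
	assert(len > 0);
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

	/* Base address of the line holding the first byte of the range. */
	uintptr_t first = start & ~(line_size - 1);
	/* Base address of the line holding the last byte of the range. */
	uintptr_t last = (start + len - 1) & ~(line_size - 1);

	return (unsigned int)((last - first) / line_size) + 1;
}

/*
 * The window the proposed patch cleans: the 32 bytes just below sp,
 * assuming (as the patch does) that the stack grows towards lower addresses.
 */
static unsigned int stack_window_lines(uintptr_t sp, uintptr_t line_size)
{
	return lines_covered(sp - 32, 32, line_size);
}
```

With 64-byte lines, `stack_window_lines(0x1040, 64)` yields 1, while `stack_window_lines(0x1010, 64)` yields 2 because the window straddles the line boundary at 0x1000.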
* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Andrew Davis @ 2023-10-02 14:17 UTC
To: Linus Walleij, Russell King (Oracle), Nishanth Menon, Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel

On 9/9/23 3:23 AM, Zhi-zhou Zhang wrote:
> On Fri, Sep 08, 2023 at 11:00:31PM +0200, Linus Walleij wrote:
>> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
>> <linux@armlinux.org.uk> wrote:
>>
>>> However, it makes a total nonsense of the comment, which explains
>>> precisely why the flush_cache_all() is where it is. Moving it before
>>> that comment means that the comment is now ridiculous.
>>
>> Zhizhou, can you look over the comment placement?
>
> Linus, I found the bug on a Cortex-A55 CPU with high-address memory.
> Since lr is also corrupted, when flush_cache_all() finishes, the
> program continues at the next instruction after fixup_pv_table(). So
> the cache disabling and flush_cache_all() are executed a second time.
> This time lr is correct, so the kernel may boot up as usual.
>
> I read the comment carefully, but I am not sure how "to ensure nothing
> is prefetched into the caches" affects the system. My patch doesn't
> prevent instruction prefetch, though. But on my board everything looks
> good.
>
> So I came up with a new fix: keep flush_cache_all() where it is and
> add a flush of the stack's cache before disabling the cache. The code
> is below. The fix is a bit ugly: it assumes that the stack grows
> towards lower addresses and that flush_cache_all() will not use more
> than 32 bytes of stack in the future. Compared with moving
> flush_cache_all() before disabling the cache, which one do you prefer?
> Thanks!
>
> diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> index 03fb0fe926f3..83a54c61a86b 100644
> --- a/arch/arm/mm/mmu.c
> +++ b/arch/arm/mm/mmu.c
> @@ -1640,6 +1640,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
>  	unsigned long pa_pgd;
>  	unsigned int cr, ttbcr;
>  	long long offset;
> +	void *stack;
>  
>  	if (!mdesc->pv_fixup)
>  		return;
> @@ -1675,7 +1676,14 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
>  	/* Run the patch stub to update the constants */
>  	fixup_pv_table(&__pv_table_begin,
>  		(&__pv_table_end - &__pv_table_begin) << 2);
> -	flush_cache_all();
> +
> +	/*
> +	 * clean stack in cacheline that undering memory will be changed in
> +	 * the following flush_cache_all(). assuming 32 bytes is enough for
> +	 * flush_cache_all().

Adding this extra clean here seems reasonable, but this comment needs to
be fixed to give the exact reasoning and to warn others not to dirty the
stack after this point. Maybe something like:

/*
 * The stack is currently in cacheable memory, after caching is disabled
 * writes to the stack will bypass the cached stack. If this now stale
 * cached stack is then evicted it will overwrite the updated stack in
 * memory. Clean the stack's cache-line and then ensure no writes to the
 * stack are made between here and disabling the cache below.
 */

Andrew

> +	 */
> +	stack = (void *) (current_stack_pointer - 32);
> +	__cpuc_flush_dcache_area(stack, 32);
>  
>  	/*
>  	 * We changing not only the virtual to physical mapping, but also
> @@ -1691,6 +1699,7 @@ static void __init early_paging_init(const struct machine_desc *mdesc)
>  	asm("mrc p15, 0, %0, c2, c0, 2" : "=r" (ttbcr));
>  	asm volatile("mcr p15, 0, %0, c2, c0, 2"
>  		: : "r" (ttbcr & ~(3 << 8 | 3 << 10)));
> +	flush_cache_all();
>  
>  	/*
>  	 * Fixup the page tables - this must be in the idmap region as
>
>>
>>> So, please don't put it in the patch system.
>>>
>>> The patch certainly needs to be tested on TI Keystone which is the
>>> primary user of this code.
>>
>> Added Andrew Davis and Nishanth Menon to the thread:
>> can you folks review and test this for Keystone?
>>
>> Yours,
>> Linus Walleij

* Re: [PATCH] ARM: mm: fix stack corruption when CONFIG_ARM_PV_FIXUP=y
From: Nishanth Menon @ 2023-09-11 13:04 UTC
To: Linus Walleij
Cc: Russell King (Oracle), Andrew Davis, Zhizhou Zhang, rppt, akpm, vishal.moola, arnd, wangkefeng.wang, willy, linux-arm-kernel, linux-kernel, Zhizhou Zhang

On 23:00-20230908, Linus Walleij wrote:
> On Fri, Sep 8, 2023 at 3:50 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
>
> > However, it makes a total nonsense of the comment, which explains
> > precisely why the flush_cache_all() is where it is. Moving it before
> > that comment means that the comment is now ridiculous.
>
> Zhizhou, can you look over the comment placement?
>
> > So, please don't put it in the patch system.
> >
> > The patch certainly needs to be tested on TI Keystone which is the
> > primary user of this code.
>
> Added Andrew Davis and Nishanth Menon to the thread:
> can you folks review and test this for Keystone?

next-20230911 alone: (boots fine):
https://gist.github.com/nmenon/c097b4a7ce3971964a5a56a34b018c4d

With https://lore.kernel.org/all/20230907143302.4940-1-zhizhou.zh@gmail.com/
applied on top (fails to boot):
https://gist.github.com/nmenon/308cfeb84098f41d340cd0e61845a507

-- 
Regards,
Nishanth Menon
Key (0xDDB5849D1736249D) / Fingerprint: F8A2 8693 54EB 8232 17A3 1A34 DDB5 849D 1736 249D

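The failure mode discussed in this thread (a store that bypasses the cache being overwritten later by a stale dirty line) can be sketched with a toy model. This is only an illustration under strong simplifying assumptions: one memory word shadowed by one write-back cache line, in userspace C. All names (`toy_line`, `toy_store`, `toy_load`, `toy_flush`, `toy_sequence`) are hypothetical, and real ARM cache behaviour is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one memory word shadowed by one write-back cache line. */
struct toy_line {
	uint32_t mem;     /* value held in memory */
	uint32_t cached;  /* value held in the cache line */
	bool valid;       /* cache line holds a copy */
	bool dirty;       /* cached copy is newer than memory */
	bool cache_on;    /* whether stores and loads go through the cache */
};

static void toy_store(struct toy_line *l, uint32_t v)
{
	if (l->cache_on) {
		l->cached = v;
		l->valid = true;
		l->dirty = true;
	} else {
		l->mem = v;  /* bypasses the cache; a stale line stays put */
	}
}

static uint32_t toy_load(const struct toy_line *l)
{
	return (l->cache_on && l->valid) ? l->cached : l->mem;
}

/* Clean and invalidate: write back a dirty line, then drop it. */
static void toy_flush(struct toy_line *l)
{
	if (l->valid && l->dirty)
		l->mem = l->cached;
	l->valid = false;
	l->dirty = false;
}

/* Returns the value read back from the "saved lr" stack slot. */
static uint32_t toy_sequence(bool clean_stack_first)
{
	struct toy_line slot = { .cache_on = true };

	toy_store(&slot, 0x1111);  /* older stack contents, dirty in cache */
	if (clean_stack_first)
		toy_flush(&slot);  /* the extra stack clean before disabling */
	slot.cache_on = false;     /* caches disabled */
	toy_store(&slot, 0x2222);  /* function entry saves lr, uncached */
	toy_flush(&slot);          /* function body cleans the caches */
	return toy_load(&slot);    /* function exit restores lr */
}
```

In this model `toy_sequence(false)` returns 0x1111, the stale value written back over the saved register, while `toy_sequence(true)` returns 0x2222 because the line was already clean when the uncached store happened.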