* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Jon Hunter @ 2022-01-05 11:08 UTC
To: Geert Uytterhoeven, Ard Biesheuvel
Cc: Marek Szyprowski, Linux ARM, Russell King, Nicolas Pitre,
Arnd Bergmann, Kees Cook, Keith Packard, Linus Walleij,
Nick Desaulniers, Tony Lindgren, Krzysztof Kozlowski,
Linux Samsung SOC, Linux-Renesas, linux-tegra@vger.kernel.org
Hi Ard,
On 28/12/2021 14:39, Geert Uytterhoeven wrote:
...
>> As I don't have access to this hardware, I am going to have to rely on
>> someone who does to debug this further. The only alternative is
>> marking CONFIG_VMAP_STACK broken on MACH_EXYNOS but that would be
>> unfortunate.
>
> Wish I had seen this thread before...
>
> I've just bisected a resume after s2ram failure on R-Car Gen2 to the same
> commit a1c510d0adc604bb ("ARM: implement support for vmap'ed stacks")
> in arm/for-next.
>
> Expected output:
>
> PM: suspend entry (deep)
> Filesystems sync: 0.000 seconds
> Freezing user space processes ... (elapsed 0.010 seconds) done.
> OOM killer disabled.
> Freezing remaining freezable tasks ... (elapsed 0.009 seconds) done.
> Disabling non-boot CPUs ...
>
> [system suspended, this is also where it hangs on failure]
>
> Enabling non-boot CPUs ...
> CPU1 is up
> sh-eth ee700000.ethernet eth0: Link is Down
> Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY
> driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=193)
> OOM killer enabled.
> Restarting tasks ... done.
> PM: suspend exit
>
> Both wake-on-LAN and wake-up by gpio-keys fail.
> Nothing interesting in the kernel log, cfr. above.
>
> Disabling CONFIG_VMAP_STACK fixes the issue for me.
>
> Just like arch/arm/mach-exynos/ (and others), arch/arm/mach-shmobile/
> has several *.S files related to secondary CPU bringup.
This is also breaking suspend on our 32-bit Tegra platforms. Reverting
this change on top of -next fixes the problem.
Cheers
Jon
--
nvpublic
* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Ard Biesheuvel @ 2022-01-05 11:12 UTC
To: Jon Hunter
Cc: Geert Uytterhoeven, Marek Szyprowski, Linux ARM, Russell King,
Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
Linus Walleij, Nick Desaulniers, Tony Lindgren,
Krzysztof Kozlowski, Linux Samsung SOC, Linux-Renesas,
linux-tegra@vger.kernel.org
On Wed, 5 Jan 2022 at 12:08, Jon Hunter <jonathanh@nvidia.com> wrote:
>
> ...
>
> This is also breaking suspend on our 32-bit Tegra platforms. Reverting
> this change on top of -next fixes the problem.
>
Thanks for the report.
It would be helpful if you could provide some more context:
- does it happen on an LPAE build too?
- does it only happen on SMP capable systems?
- does it reproduce on such systems when using only a single CPU?
(i.e., pass 'nosmp' on the kernel command line)
- when passing 'no_console_suspend' on the kernel command line, are
any useful diagnostics produced?
- is there any way you could tell whether the crash/hang (assuming
that is what you are observing) occurs on the suspend path or on
resume?
- any other observations that could narrow this down?
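Before drawing conclusions from the 'nosmp' and 'no_console_suspend' experiments, it can be worth confirming that the parameters actually reached the running kernel, which exposes its command line in /proc/cmdline. A minimal sketch in C, assuming a simple whitespace tokenizer (the helper names are hypothetical, and the kernel's own quoting rules are not handled):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return true if `param` appears in `cmdline` as a
 * whole whitespace-separated token, either bare ("nosmp") or with a
 * value ("console=..."). Substrings such as "xnosmp" do not match.
 * Quoted values containing spaces are NOT handled by this sketch. */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t len = strlen(param);
    const char *p = cmdline;

    assert(len > 0);
    while (*p) {
        /* Skip whitespace before the next token. */
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;
        if (*p == '\0')
            break;
        /* Find the end of the current token. */
        const char *end = p;
        while (*end && *end != ' ' && *end != '\t' && *end != '\n')
            end++;
        /* Exact name match, followed by end of token or '='. */
        if ((size_t)(end - p) >= len && strncmp(p, param, len) == 0 &&
            (p + len == end || p[len] == '='))
            return true;
        p = end;
    }
    return false;
}

/* Hypothetical helper: read the running kernel's command line into buf.
 * Returns 0 on success, -1 on failure. */
int read_proc_cmdline(char *buf, size_t size)
{
    FILE *f = fopen("/proc/cmdline", "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}
```

A caller would read /proc/cmdline into a buffer with read_proc_cmdline() and then check cmdline_has_param(buf, "nosmp") and cmdline_has_param(buf, "no_console_suspend") before starting the suspend test.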
Thanks,
Ard.
* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Jon Hunter @ 2022-01-05 11:33 UTC
To: Ard Biesheuvel
Cc: Geert Uytterhoeven, Marek Szyprowski, Linux ARM, Russell King,
Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
Linus Walleij, Nick Desaulniers, Tony Lindgren,
Krzysztof Kozlowski, Linux Samsung SOC, Linux-Renesas,
linux-tegra@vger.kernel.org
On 05/01/2022 11:12, Ard Biesheuvel wrote:
...
> Thanks for the report.
>
> It would be helpful if you could provide some more context:
> - does it happen on an LPAE build too?
> - does it only happen on SMP capable systems?
These are all SMP systems.
> - does it reproduce on such systems when using only a single CPU?
> (i.e., pass 'nosmp' on the kernel command line)
I would need to try this.
> - when passing 'no_console_suspend' on the kernel command line, are
> any useful diagnostics produced?
> - is there any way you could tell whether the crash/hang (assuming
> that is what you are observing) occurs on the suspend path or on
> resume?
> - any other observations that could narrow this down?
I can run the above and let you know what I find.
Cheers
Jon
* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Russell King (Oracle) @ 2022-01-05 13:53 UTC
To: Jon Hunter
Cc: Ard Biesheuvel, Geert Uytterhoeven, Marek Szyprowski, Linux ARM,
Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
Linus Walleij, Nick Desaulniers, Tony Lindgren,
Krzysztof Kozlowski, Linux Samsung SOC, Linux-Renesas,
linux-tegra@vger.kernel.org
On Wed, Jan 05, 2022 at 11:33:48AM +0000, Jon Hunter wrote:
> On 05/01/2022 11:12, Ard Biesheuvel wrote:
> > Thanks for the report.
> >
> > It would be helpful if you could provide some more context:
> > - does it happen on an LPAE build too?
> > - does it only happen on SMP capable systems?
>
> These are all SMP systems.
>
> > - does it reproduce on such systems when using only a single CPU?
> > (i.e., pass 'nosmp' on the kernel command line)
>
> I would need to try this.
Please note that I want an answer on the vmap stack patches by the
end of today (UK time - so about five hours after this email has
been sent) as we have only tonight and tomorrow's linux-next before
the probable opening of the merge window.
The options are:
1. The problem gets fixed today and I merge the fix today so it can
get tested in linux-next over the next few days by the various
build farms and test setups.
2. We postpone the merging of this until the very end of the merge
window to give more time to sort out this mess - but what it
means is keeping it in linux-next and keeping various platforms
broken during that period. However, this is really not fair for
other people, and some would say this isn't even an option.
3. We drop the entire series for this merge window, meaning it gets
dropped from linux-next, and have another go for the next merge
window.
Sorry for being so demanding, but we're far too close to the merge
window to be trying to debug a feature that is clearly causing a
regression for several platforms.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Jon Hunter @ 2022-01-05 16:49 UTC
To: Ard Biesheuvel
Cc: Geert Uytterhoeven, Marek Szyprowski, Linux ARM, Russell King,
Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
Linus Walleij, Nick Desaulniers, Tony Lindgren,
Krzysztof Kozlowski, Linux Samsung SOC, Linux-Renesas,
linux-tegra@vger.kernel.org
On 05/01/2022 11:12, Ard Biesheuvel wrote:
...
> Thanks for the report.
>
> It would be helpful if you could provide some more context:
> - does it happen on an LPAE build too?
With CONFIG_ARM_LPAE enabled, suspend/resume works.
> - does it only happen on SMP capable systems?
> - does it reproduce on such systems when using only a single CPU?
> (i.e., pass 'nosmp' on the kernel command line)
Adding 'nosmp' does not help.
> - when passing 'no_console_suspend' on the kernel command line, are
> any useful diagnostics produced?
Adding 'no_console_suspend' does not produce any interesting logs.
> - is there any way you could tell whether the crash/hang (assuming
> that is what you are observing) occurs on the suspend path or on
> resume?
That is not clear. I see it entering suspend, but it is not clear whether
it fails while entering suspend or while resuming.
Cheers
Jon
* Re: [PATCH v4 7/7] ARM: implement support for vmap'ed stacks
From: Ard Biesheuvel @ 2022-01-05 17:02 UTC
To: Jon Hunter
Cc: Geert Uytterhoeven, Marek Szyprowski, Linux ARM, Russell King,
Nicolas Pitre, Arnd Bergmann, Kees Cook, Keith Packard,
Linus Walleij, Nick Desaulniers, Tony Lindgren,
Krzysztof Kozlowski, Linux Samsung SOC, Linux-Renesas,
linux-tegra@vger.kernel.org
On Wed, 5 Jan 2022 at 17:50, Jon Hunter <jonathanh@nvidia.com> wrote:
>
>
> On 05/01/2022 11:12, Ard Biesheuvel wrote:
>
> ...
>
> > Thanks for the report.
> >
> > It would be helpful if you could provide some more context:
> > - does it happen on an LPAE build too?
>
> With CONFIG_ARM_LPAE enabled, suspend/resume works.
>
> > - does it only happen on SMP capable systems?
> > - does it reproduce on such systems when using only a single CPU?
> > (i.e., pass 'nosmp' on the kernel command line)
>
> Adding 'nosmp' does not help.
>
> > - when passing 'no_console_suspend' on the kernel command line, are
> > any useful diagnostics produced?
>
> Adding 'no_console_suspend' does not produce any interesting logs.
>
> > - is there any way you could tell whether the crash/hang (assuming
> > that is what you are observing) occurs on the suspend path or on
> > resume?
>
> That is not clear. I see it entering suspend, but it is not clear whether
> it fails while entering suspend or while resuming.
>
Thanks a lot for providing this info.
The fact that enabling LPAE makes the issue go away is a fairly strong
hint that one of the CPUs comes up running in an address space whose
copy of the swapper_pg_dir region lacks the stack's vmapping. LPAE
builds map swapper_pg_dir directly, so on those builds it can never go
out of sync.
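To make the copy-versus-direct distinction concrete, here is a hypothetical toy model in C (all names are invented for this sketch; these are not the kernel's page-table structures). A task whose kernel entries are a by-value snapshot of a reference directory misses a mapping added afterwards, while a task that reaches the kernel half through a pointer to the shared table, loosely mirroring the LPAE case described above, always sees it:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy "page directory": slots KERNEL_FIRST..NR_SLOTS-1
 * stand in for the kernel/vmalloc region. Nonzero means "mapped". */
#define NR_SLOTS     8
#define KERNEL_FIRST 4

struct pgdir {
    unsigned long entry[NR_SLOTS];
};

/* Reference directory (playing the role of swapper_pg_dir) to which
 * new kernel mappings, such as a vmap'ed stack, are added. */
static struct pgdir swapper;

/* Copy model: kernel slots are snapshotted when the task is set up. */
struct task_copy {
    struct pgdir pgd;
};

/* Shared model: the kernel half is reached through a pointer. */
struct task_shared {
    unsigned long user[KERNEL_FIRST];
    const struct pgdir *kernel;
};

void task_copy_init(struct task_copy *t)
{
    memset(&t->pgd, 0, sizeof(t->pgd));
    for (int i = KERNEL_FIRST; i < NR_SLOTS; i++)
        t->pgd.entry[i] = swapper.entry[i];   /* snapshot, can go stale */
}

void task_shared_init(struct task_shared *t)
{
    memset(t->user, 0, sizeof(t->user));
    t->kernel = &swapper;                     /* always current */
}

/* Install a new kernel mapping in the reference directory only. */
void swapper_map(int slot, unsigned long val)
{
    assert(slot >= KERNEL_FIRST && slot < NR_SLOTS);
    swapper.entry[slot] = val;
}

bool task_copy_sees(const struct task_copy *t, int slot)
{
    assert(slot >= KERNEL_FIRST && slot < NR_SLOTS);
    return t->pgd.entry[slot] != 0;
}

bool task_shared_sees(const struct task_shared *t, int slot)
{
    assert(slot >= KERNEL_FIRST && slot < NR_SLOTS);
    return t->kernel->entry[slot] != 0;
}
```

The sketch only captures the snapshot-versus-shared distinction; the real ARM page-table handling is considerably more involved.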
Given that vmappings are global, and therefore remain cached in the TLB
across context switches, it is not unlikely that the stack's vmapping is
missing from a task that runs before suspend, but that this does not
cause any issues until after the CPU is completely reset (which takes
the cached TLB entries down with it).
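The TLB part of this reasoning can be sketched as another hypothetical toy model (invented names; it simplifies by treating every cached entry as global). A translation cached while running on a correct page table keeps working for a task whose own table lacks it, and only a CPU reset exposes the gap:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: not kernel code. */
#define NR_PAGES 4

struct cpu {
    bool tlb_global[NR_PAGES];   /* cached global translations */
};

/* Translate `page` for a task whose page table is `pt`. A TLB hit
 * wins without consulting the table; otherwise walk the table and,
 * on success, cache the translation. Returns false on a fault. */
bool translate(struct cpu *c, const bool pt[NR_PAGES], int page)
{
    assert(page >= 0 && page < NR_PAGES);
    if (c->tlb_global[page])
        return true;              /* hit: page table never checked */
    if (pt[page]) {
        c->tlb_global[page] = true;
        return true;
    }
    return false;                 /* translation fault */
}

/* Context switch: global entries are neither flushed nor ASID-tagged,
 * so in this model the TLB is left untouched. */
void context_switch(struct cpu *c)
{
    (void)c;
}

/* CPU reset (e.g. power loss across suspend): all TLB state is lost. */
void cpu_reset(struct cpu *c)
{
    for (int i = 0; i < NR_PAGES; i++)
        c->tlb_global[i] = false;
}
```

Running a task with a complete table, switching to one with a stale table, and then resetting the CPU reproduces the "works until resume" pattern in miniature.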
So in summary, this gives me something to chew on, and hopefully, I
will be able to provide a proper fix shortly.