* Fwd: Kernel 6.5 hangs on shutdown
@ 2023-10-12 9:37 Bagas Sanjaya
2023-10-13 12:05 ` [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown) Linux regression tracking (Thorsten Leemhuis)
2023-10-16 8:46 ` Fwd: Kernel 6.5 hangs on shutdown Linux regression tracking #update (Thorsten Leemhuis)
0 siblings, 2 replies; 6+ messages in thread
From: Bagas Sanjaya @ 2023-10-12 9:37 UTC (permalink / raw)
To: Linux Kernel Mailing List, Linux Regressions
Cc: Linus Torvalds, Thomas Gleixner, Yanjun Yang
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> I use Dell OptiPlex 7050, and kernel hangs when shutting down the computer.
> Similar symptom has been reported on some forums, and all of them are using
> Dell computers:
> https://bbs.archlinux.org/viewtopic.php?pid=2124429
> https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
> https://forum.artixlinux.org/index.php/topic,5997.0.html
>
> Tested with various kernel and this bug seems to be caused by commit: 88afbb21d4b36fee6acaa167641f9f0fc122f01b.
See Bugzilla for the full thread.
Anyway, I'm adding this regression to be tracked by regzbot:
#regzbot introduced: 88afbb21d4b36f https://bugzilla.kernel.org/show_bug.cgi?id=217995
#regzbot title: x86 core fix pull causes shutdown hang on Dell OptiPlex 7050
#regzbot link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
#regzbot link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
#regzbot link: https://forum.artixlinux.org/index.php/topic,5997.0.html
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217995
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply [flat|nested] 6+ messages in thread
* [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown)
2023-10-12 9:37 Fwd: Kernel 6.5 hangs on shutdown Bagas Sanjaya
@ 2023-10-13 12:05 ` Linux regression tracking (Thorsten Leemhuis)
2023-10-13 17:48 ` Linus Torvalds
2023-10-16 8:46 ` Fwd: Kernel 6.5 hangs on shutdown Linux regression tracking #update (Thorsten Leemhuis)
1 sibling, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-10-13 12:05 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Yanjun Yang, Linux Kernel Mailing List, Linux Regressions, Bagas Sanjaya, Borislav Petkov (AMD), Ashok Raj, Ingo Molnar, Dave Hansen, the arch/x86 maintainers
[CCing x86 maintainers]
Hi Thomas!
On 12.10.23 11:37, Bagas Sanjaya wrote:
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
>> I use Dell OptiPlex 7050, and kernel hangs when shutting down the computer.
>> Similar symptom has been reported on some forums, and all of them are using
>> Dell computers:
>> https://bbs.archlinux.org/viewtopic.php?pid=2124429
>> https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
>> https://forum.artixlinux.org/index.php/topic,5997.0.html
Another report: https://bugzilla.redhat.com/show_bug.cgi?id=2241279
From all those links it seems quite a lot of users with Dell machines are affected by this problem.
>> Tested with various kernel and this bug seems to be caused by commit: 88afbb21d4b36fee6acaa167641f9f0fc122f01b.
Thomas, turns out that bisection result was slightly wrong: a recheck confirmed that the regression is actually caused by 45e34c8af58f23 ("x86/smp: Put CPUs into INIT on shutdown if possible") [v6.5-rc1] of yours. See https://bugzilla.kernel.org/show_bug.cgi?id=217995 for details.
Ciao, Thorsten
> Anyway, I'm adding this regression to be tracked by regzbot:
> [...]
#regzbot introduced: 45e34c8af58f
#regzbot link: https://bugzilla.redhat.com/show_bug.cgi?id=2241279
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown)
2023-10-13 12:05 ` [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown) Linux regression tracking (Thorsten Leemhuis)
@ 2023-10-13 17:48 ` Linus Torvalds
2023-10-13 18:28 ` Ashok Raj
2023-10-13 19:40 ` Thomas Gleixner
0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2023-10-13 17:48 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: Thomas Gleixner, Yanjun Yang, Linux Kernel Mailing List, Bagas Sanjaya, Borislav Petkov (AMD), Ashok Raj, Ingo Molnar, Dave Hansen, the arch/x86 maintainers
On Fri, 13 Oct 2023 at 05:05, Linux regression tracking (Thorsten Leemhuis) <regressions@leemhuis.info> wrote:
>
> Thomas, turns out that bisection result was slightly wrong: a recheck
> confirmed that the regression is actually caused by 45e34c8af58f23
> ("x86/smp: Put CPUs into INIT on shutdown if possible") [v6.5-rc1] of
> yours. See https://bugzilla.kernel.org/show_bug.cgi?id=217995 for details.
That commit does look pretty dangerous.
If *anything* is done through SMI after the code does that smp_park_other_cpus_in_init() sequence, I wouldn't be surprised in the least if the machine is hung.
That's made worse since it looks like the shutdown sequence isn't necessarily run on the boot CPU, so the boot CPU itself may be in INIT, and any SMI quite possibly ends up treating that CPU specially.
Who knows what SMI does, but the fact that the affected machines seem to be mainly from one particular manufacturer does tend to imply it's something like that.
And the code does do a fair amount *after* shutting down cpu's. Not just things like calling x86_platform.iommu_shutdown(), but also things like possibly the tboot shutdown sequence (which almost *certainly* is some SMI thing).
I dunno. Thomas - I think the argument for that commit was fairly theoretical, and reverting it seems the obvious thing, unless you have some idea of what might be wrong.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown)
2023-10-13 17:48 ` Linus Torvalds
@ 2023-10-13 18:28 ` Ashok Raj
2023-10-13 19:40 ` Thomas Gleixner
1 sibling, 0 replies; 6+ messages in thread
From: Ashok Raj @ 2023-10-13 18:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux regressions mailing list, Thomas Gleixner, Yanjun Yang, Linux Kernel Mailing List, Bagas Sanjaya, Borislav Petkov (AMD), Ingo Molnar, Dave Hansen, the arch/x86 maintainers, Ashok Raj
Hi
On Fri, Oct 13, 2023 at 10:48:19AM -0700, Linus Torvalds wrote:
> On Fri, 13 Oct 2023 at 05:05, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
> >
> > Thomas, turns out that bisection result was slightly wrong: a recheck
> > confirmed that the regression is actually caused by 45e34c8af58f23
> > ("x86/smp: Put CPUs into INIT on shutdown if possible") [v6.5-rc1] of
> > yours. See https://bugzilla.kernel.org/show_bug.cgi?id=217995 for details.
>
> That commit does look pretty dangerous.
>
> If *anything* is done through SMI after the code does that
> smp_park_other_cpus_in_init() sequence, I wouldn't be surprised in the
> least if the machine is hung.
>
> That's made worse since it looks like the shutdown sequence isn't
> necessarily run on the boot CPU, so the boot CPU itself may be in
> INIT, and any SMI quite possibly ends up treating that CPU specially.
Sending INIT to processor marked as BSP will tank the system.
>
> Who knows what SMI does, but the fact that the affected machines seem
> to be mainly from one particular manufacturer does tend to imply it's
> something like that.
There was a report (probably this same one), and it turns out it was a bug in the BIOS SMI handler.
The client BIOS's were waiting for the lowest APICID to be the SMI rendezvous master. If this is MeteorLake, the BSP wasn't the one with the lowest APIC and it tripped here.
The BIOS change is also being pushed to others for assimilation :) Server BIOS's had this correct for a while now.
>
> And the code does do a fair amount *after* shutting down cpu's. Not
> just things like calling x86_platform.iommu_shutdown(), but also
> things like possibly the tboot shutdown sequence (which almost
> *certainly* is some SMI thing).
>
> I dunno. Thomas - I think the argument for that commit was fairly
> theoretical, and reverting it seems the obvious thing, unless you have
> some idea of what might be wrong.
>
> Linus
--
Cheers,
Ashok
^ permalink raw reply [flat|nested] 6+ messages in thread
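[Editor's sketch, not part of the thread] The BIOS behaviour Ashok describes can be illustrated with a toy model in C. Everything in it is hypothetical: the `toy_cpu` structure, both function names, and in particular the assumption that a CPU parked in INIT never arrives at the SMI rendezvous. It only shows why choosing the master by lowest APIC ID, rather than by BSP status, could stall once the non-boot CPUs are parked; it is not the actual firmware logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one logical CPU; all names here are hypothetical. */
struct toy_cpu {
	unsigned int apicid;  /* APIC ID reported for this CPU */
	bool is_bsp;          /* true for the boot CPU */
	bool parked_in_init;  /* true once the OS has parked it via INIT */
};

/* Index of the CPU with the lowest APIC ID, i.e. the master a handler
 * of the kind described above would wait for. */
size_t toy_lowest_apicid_cpu(const struct toy_cpu *cpus, size_t ncpus)
{
	assert(cpus != NULL && ncpus > 0);

	size_t lowest = 0;
	for (size_t i = 1; i < ncpus; i++) {
		if (cpus[i].apicid < cpus[lowest].apicid)
			lowest = i;
	}
	return lowest;
}

/*
 * Assumption of this model: a CPU parked in INIT never shows up at the
 * rendezvous, so waiting for it as master would never complete.
 */
bool toy_rendezvous_completes(const struct toy_cpu *cpus, size_t ncpus)
{
	size_t master = toy_lowest_apicid_cpu(cpus, ncpus);

	return !cpus[master].parked_in_init;
}
```

In this model the rendezvous completes whenever the BSP also has the lowest APIC ID, and stalls when a parked non-boot CPU has it, which matches the MeteorLake case mentioned in the mail.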
* Re: [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown)
2023-10-13 17:48 ` Linus Torvalds
2023-10-13 18:28 ` Ashok Raj
@ 2023-10-13 19:40 ` Thomas Gleixner
1 sibling, 0 replies; 6+ messages in thread
From: Thomas Gleixner @ 2023-10-13 19:40 UTC (permalink / raw)
To: Linus Torvalds, Linux regressions mailing list
Cc: Yanjun Yang, Linux Kernel Mailing List, Bagas Sanjaya, Borislav Petkov (AMD), Ashok Raj, Ingo Molnar, Dave Hansen, the arch/x86 maintainers
On Fri, Oct 13 2023 at 10:48, Linus Torvalds wrote:
> On Fri, 13 Oct 2023 at 05:05, Linux regression tracking (Thorsten
> Leemhuis) <regressions@leemhuis.info> wrote:
>>
>> Thomas, turns out that bisection result was slightly wrong: a recheck
>> confirmed that the regression is actually caused by 45e34c8af58f23
>> ("x86/smp: Put CPUs into INIT on shutdown if possible") [v6.5-rc1] of
>> yours. See https://bugzilla.kernel.org/show_bug.cgi?id=217995 for details.
>
> That commit does look pretty dangerous.
>
> If *anything* is done through SMI after the code does that
> smp_park_other_cpus_in_init() sequence, I wouldn't be surprised in the
> least if the machine is hung.
>
> That's made worse since it looks like the shutdown sequence isn't
> necessarily run on the boot CPU, so the boot CPU itself may be in
> INIT, and any SMI quite possibly ends up treating that CPU specially.
smp_park_other_cpus_in_init() bails out early when it's not invoked on the boot CPU because sending INIT to the BSP results in a full machine reset. So that's definitely not the problem.
> Who knows what SMI does, but the fact that the affected machines seem
> to be mainly from one particular manufacturer does tend to imply it's
> something like that.
It's mostly DELL machines. The rest seems to be Lenovo and Sony with Alderlake/Raptorlake CPUs - at least that's what I could figure out from the various bug reports.
I don't know which CPUs the DELL machines have, so I can't say it's a pattern.
Bagas, can you please provide the output of /proc/cpuinfo?
> And the code does do a fair amount *after* shutting down cpu's. Not
> just things like calling x86_platform.iommu_shutdown(), but also
> things like possibly the tboot shutdown sequence (which almost
> *certainly* is some SMI thing).
That should not matter, but who the heck knows.
> I dunno. Thomas - I think the argument for that commit was fairly
> theoretical, and reverting it seems the obvious thing, unless you have
> some idea of what might be wrong.
I agree with the revert for now.
The problem is not entirely theoretical in the kexec() case, but yes for shutdown/reboot it's irrelevant.
The reason why I ended up with this is the initial problem of soft offlined CPUs sitting in MWAIT. The kexec() kernel can end up writing to the monitor cache line reliably after it overwrote the original kernel mappings, which results in completely undebuggable chaos or triple faults.
The MWAIT issue is mitigated by writing to the monitor cache lines and forcing the CPUs into HLT. Extensive testing revealed that HLT is not entirely safe either, so we ended up with the INIT trick, which turned out to be very reliable in testing.
Though it's obviously making some BIOSes very unhappy.
Sigh... Did I mention before that I hate computers with a passion?
Thanks,
tglx
^ permalink raw reply [flat|nested] 6+ messages in thread
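[Editor's sketch, not part of the thread] The early bail-out Thomas describes for smp_park_other_cpus_in_init() can be modeled in user-space C as follows. This is a minimal sketch under stated assumptions, not the kernel's code: the `sketch_cpu` structure, `SKETCH_BAD_APICID`, and `sketch_park_others_in_init()` are hypothetical names, CPU index 0 stands in for the boot CPU, and "sending INIT" is reduced to setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for "no valid APIC ID" in this model. */
#define SKETCH_BAD_APICID 0xFFFFFFFFu

struct sketch_cpu {
	unsigned int apicid;  /* APIC ID, or SKETCH_BAD_APICID */
	bool init_sent;       /* set when the model "sends" INIT to this CPU */
};

/*
 * Park every other CPU via INIT, but only when running on the boot CPU
 * (index 0 in this model). Returns false when the caller has to fall
 * back to some other way of stopping the remaining CPUs.
 */
bool sketch_park_others_in_init(struct sketch_cpu *cpus, size_t ncpus,
				size_t this_cpu)
{
	assert(cpus != NULL && this_cpu < ncpus);

	/* INIT to the BSP would reset the machine, so bail out early. */
	if (this_cpu != 0)
		return false;

	for (size_t i = 0; i < ncpus; i++) {
		if (i == this_cpu || cpus[i].apicid == SKETCH_BAD_APICID)
			continue;
		cpus[i].init_sent = true;
	}
	return true;
}
```

Called with `this_cpu` set to 0, the function marks every other CPU with a valid APIC ID and returns true; called from any other index it touches nothing and returns false, so the boot CPU is never sent INIT in this model.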
* Re: Fwd: Kernel 6.5 hangs on shutdown
2023-10-12 9:37 Fwd: Kernel 6.5 hangs on shutdown Bagas Sanjaya
2023-10-13 12:05 ` [regression] some Dell systems hang at shutdown due to "x86/smp: Put CPUs into INIT on shutdown if possible" (was Fwd: Kernel 6.5 hangs on shutdown) Linux regression tracking (Thorsten Leemhuis)
@ 2023-10-16 8:46 ` Linux regression tracking #update (Thorsten Leemhuis)
1 sibling, 0 replies; 6+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-10-16 8:46 UTC (permalink / raw)
To: Linux Kernel Mailing List, Linux Regressions
[TLDR: This mail is primarily relevant for Linux kernel regression tracking. See link in footer if these mails annoy you.]
On 12.10.23 11:37, Bagas Sanjaya wrote:
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
>> I use Dell OptiPlex 7050, and kernel hangs when shutting down the computer.
>> Similar symptom has been reported on some forums, and all of them are using
>> Dell computers:
>> https://bbs.archlinux.org/viewtopic.php?pid=2124429
>> https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
>> https://forum.artixlinux.org/index.php/topic,5997.0.html
>>
>> Tested with various kernel and this bug seems to be caused by commit: 88afbb21d4b36fee6acaa167641f9f0fc122f01b.
>
> See Bugzilla for the full thread.
>
> Anyway, I'm adding this regression to be tracked by regzbot:
>
> #regzbot introduced: 88afbb21d4b36f https://bugzilla.kernel.org/show_bug.cgi?id=217995
> #regzbot title: x86 core fix pull causes shutdown hang on Dell OptiPlex 7050
> #regzbot link: https://bbs.archlinux.org/viewtopic.php?pid=2124429
> #regzbot link: https://www.reddit.com/r/openSUSE/comments/16qq99b/tumbleweed_shutdown_did_not_finish_completely/
> #regzbot link: https://forum.artixlinux.org/index.php/topic,5997.0.html
#regzbot fix: fbe1bf1e5ff1e3b298420d7a8434983ef8d72bd1
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
^ permalink raw reply [flat|nested] 6+ messages in thread