* Re: [PATCH 1/2] x86/CPU: Use correct macros for Cyrix calls on Geode processors
@ 2024-12-24  1:04 Russell Senior
  2024-12-24 12:17 ` Russell Senior
  0 siblings, 1 reply; 5+ messages in thread
From: Russell Senior @ 2024-12-24  1:04 UTC
  To: Matthew Whitehead; +Cc: linux-kernel, tglx, mingo, luto, Jonas Gorski

Hi,

I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time.  I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.

  https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537
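
The search itself is ordinary bisection. As a rough sketch in C (not the
tooling actually used here; first_bad_build, is_bad_fn and hangs_from are
made-up names for illustration), assuming the builds are ordered and every
build after the first bad one is also bad:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true if build number `i` hangs on warm boot. */
typedef bool (*is_bad_fn)(size_t i, void *ctx);

/*
 * Return the index of the first bad build, given that build `good` is known
 * good, build `bad` is known bad, and good < bad. Each iteration halves the
 * window, so roughly log2(bad - good) builds have to be tested.
 */
static inline size_t first_bad_build(size_t good, size_t bad,
				     is_bad_fn is_bad, void *ctx)
{
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* regression is at mid or earlier */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;
}

/* Stand-in for a real boot test: every build from *ctx onward hangs. */
static inline bool hangs_from(size_t i, void *ctx)
{
	return i >= *(const size_t *)ctx;
}
```

In practice each call to the predicate is a build, a flash and a warm
reboot, which is why narrowing the window from preserved builds first
saves a lot of time.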

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared the way it is after a loss of power. That is approximately
as much as I understand about the problem.

Can you suggest a patch to try that would clarify what is going wrong,
or potentially fix the problem in a more appropriate way?

Thanks!

-- 
Russell Senior
russell@personaltelco.net


* Re: [PATCH 1/2] x86/CPU: Use correct macros for Cyrix calls on Geode processors
  2024-12-24  1:04 [PATCH 1/2] x86/CPU: Use correct macros for Cyrix calls on Geode processors Russell Senior
@ 2024-12-24 12:17 ` Russell Senior
  2025-02-25 21:41   ` [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems Ingo Molnar
  2025-02-25 21:52   ` [tip: x86/urgent] " tip-bot2 for Russell Senior
  0 siblings, 2 replies; 5+ messages in thread
From: Russell Senior @ 2024-12-24 12:17 UTC
  To: Matthew Whitehead; +Cc: linux-kernel, tglx, mingo, luto, Jonas Gorski

After more poking/prodding and coaching from Jonas Gorski (cc'd), it
looks like this test patch fixes the problem on my board. Tested
against v6.6.67 and v4.14.113:

--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -153,8 +153,8 @@ static void geode_configure(void)
        u8 ccr3;
        local_irq_save(flags);

-       /* Suspend on halt power saving and enable #SUSP pin */
-       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
+       /* Suspend on halt power saving */
+       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);

        ccr3 = getCx86(CX86_CCR3);
        setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);       /* enable MAPEN */
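
Read bit by bit, the change only drops the top bit of the mask. A minimal
sketch in plain C, outside the kernel, assuming (from the old comment) that
0x08 is suspend-on-halt and 0x80 is the #SUSP pin enable; the macro and
function names are made up for illustration and do not exist in cyrix.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; cyrix.c itself writes the raw values 0x88 and 0x08. */
#define CCR2_SUSP_HLT 0x08u	/* assumed from the comment: suspend on halt */
#define CCR2_USE_SUSP 0x80u	/* assumed from the comment: enable #SUSP pin */

/* Value written back to CCR2 before the fix: OR in both bits. */
static inline uint8_t ccr2_before_fix(uint8_t ccr2)
{
	return (uint8_t)(ccr2 | 0x88u);
}

/* Value written back to CCR2 after the fix: OR in suspend-on-halt only. */
static inline uint8_t ccr2_after_fix(uint8_t ccr2)
{
	return (uint8_t)(ccr2 | 0x08u);
}

/* Small self-check that doubles as a usage example. */
static inline void ccr2_mask_demo(void)
{
	/* 0x88 is exactly the two named bits combined. */
	assert((CCR2_SUSP_HLT | CCR2_USE_SUSP) == 0x88u);

	/* Starting from a cleared register, only the old code sets bit 7. */
	assert(ccr2_before_fix(0x00) == 0x88);
	assert(ccr2_after_fix(0x00) == 0x08);

	/* A read-modify-write with OR never clears a bit that is already set. */
	assert(ccr2_after_fix(0x80) == 0x88);
}
```

So with the patch, the kernel no longer sets bit 7 itself and leaves it
at whatever value it had on entry to geode_configure().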




* [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
  2024-12-24 12:17 ` Russell Senior
@ 2025-02-25 21:41   ` Ingo Molnar
  2025-02-25 23:08     ` Russell Senior
  2025-02-25 21:52   ` [tip: x86/urgent] " tip-bot2 for Russell Senior
  1 sibling, 1 reply; 5+ messages in thread
From: Ingo Molnar @ 2025-02-25 21:41 UTC
  To: Russell Senior; +Cc: Matthew Whitehead, linux-kernel, tglx, luto, Jonas Gorski


* Russell Senior <russell@personaltelco.net> wrote:

> More poking/prodding and coaching from Jonas Gorski (cc'd), it looks
> like this test patch fixes the problem on my board: Tested against
> v6.6.67 and v4.14.113:
> 
> --- a/arch/x86/kernel/cpu/cyrix.c
> +++ b/arch/x86/kernel/cpu/cyrix.c
> @@ -153,8 +153,8 @@ static void geode_configure(void)
>         u8 ccr3;
>         local_irq_save(flags);
> 
> -       /* Suspend on halt power saving and enable #SUSP pin */
> -       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
> +       /* Suspend on halt power saving */
> +       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);
> 
>         ccr3 = getCx86(CX86_CCR3);
>         setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);       /* enable MAPEN */

That's really useful - thank you!

I've constructed a fix patch from your mails, attached below. I added 
your Signed-off-by to the fix, let me know if that's OK with you.

I have applied your fix to the x86 tree. If everything goes fine, it
ought to go upstream during the next merge window in ~4 weeks, with
v6.15.

Thanks,

	Ingo

==========================>
From f5b2656ee7616f595b4e1735893d1371f216619f Mon Sep 17 00:00:00 2001
From: Russell Senior <russell@personaltelco.net>
Date: Tue, 25 Feb 2025 22:31:20 +0100
Subject: [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems

I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time.  I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.

  https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared the way it is after a loss of power. That is approximately
as much as I understand about the problem.

After more poking/prodding and coaching from Jonas Gorski, it looks
like this test patch fixes the problem on my board. Tested against
v6.6.67 and v4.14.113.

Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/cyrix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 9651275aecd1..dfec2c61e354 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -153,8 +153,8 @@ static void geode_configure(void)
 	u8 ccr3;
 	local_irq_save(flags);
 
-	/* Suspend on halt power saving and enable #SUSP pin */
-	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
+	/* Suspend on halt power saving */
+	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);
 
 	ccr3 = getCx86(CX86_CCR3);
 	setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);	/* enable MAPEN */


* [tip: x86/urgent] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
  2024-12-24 12:17 ` Russell Senior
  2025-02-25 21:41   ` [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems Ingo Molnar
@ 2025-02-25 21:52   ` tip-bot2 for Russell Senior
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Russell Senior @ 2025-02-25 21:52 UTC
  To: linux-tip-commits
  Cc: Russell Senior, Ingo Molnar, Matthew Whitehead, Thomas Gleixner,
	x86, linux-kernel

The following commit has been merged into the x86/urgent branch of tip:

Commit-ID:     bebe35bb738b573c32a5033499cd59f20293f2a3
Gitweb:        https://git.kernel.org/tip/bebe35bb738b573c32a5033499cd59f20293f2a3
Author:        Russell Senior <russell@personaltelco.net>
AuthorDate:    Tue, 25 Feb 2025 22:31:20 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 25 Feb 2025 22:44:01 +01:00

x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems

I still have some Soekris net4826 in a Community Wireless Network I
volunteer with. These devices use an AMD SC1100 SoC. I am running
OpenWrt on them, which uses a patched kernel, that naturally has
evolved over time.  I haven't updated the ones in the field in a
number of years (circa 2017), but have one in a test bed, where I have
intermittently tried out test builds.

A few years ago, I noticed some trouble, particularly when "warm
booting", that is, doing a reboot without removing power, and noticed
the device was hanging after the kernel message:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.

If I removed power and then restarted, it would boot fine, continuing
through the message above, thusly:

  [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
  [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
  [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
  [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
  [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
  [...]

In order to continue using modern tools, like ssh, to interact with
the software on these old devices, I need modern builds of the OpenWrt
firmware on the devices. I confirmed that the warm boot hang was still
an issue in modern OpenWrt builds (currently using a patched linux
v6.6.65).

Last night, I decided it was time to get to the bottom of the warm
boot hang, and began bisecting. From preserved builds, I narrowed down
the bisection window from late February to late May 2019. During this
period, the OpenWrt builds were using 4.14.x. I was able to build
using period-correct Ubuntu 18.04.6. After a number of bisection
iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
the commit that introduced the warm boot hang.

  https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537

Looking at the upstream changes in the stable kernel between 4.14.112
and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:

  https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5

So, I tried reverting just that kernel change on top of the breaking
OpenWrt commit, and my warm boot hang went away.

Presumably, the warm boot hang is due to some register not getting
cleared the way it is after a loss of power. That is approximately
as much as I understand about the problem.

After more poking/prodding and coaching from Jonas Gorski, it looks
like this test patch fixes the problem on my board. Tested against
v6.6.67 and v4.14.113.

Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
Signed-off-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
Cc: Matthew Whitehead <tedheadster@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/cpu/cyrix.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/cyrix.c b/arch/x86/kernel/cpu/cyrix.c
index 9651275..dfec2c6 100644
--- a/arch/x86/kernel/cpu/cyrix.c
+++ b/arch/x86/kernel/cpu/cyrix.c
@@ -153,8 +153,8 @@ static void geode_configure(void)
 	u8 ccr3;
 	local_irq_save(flags);
 
-	/* Suspend on halt power saving and enable #SUSP pin */
-	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
+	/* Suspend on halt power saving */
+	setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);
 
 	ccr3 = getCx86(CX86_CCR3);
 	setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);	/* enable MAPEN */


* Re: [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
  2025-02-25 21:41   ` [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems Ingo Molnar
@ 2025-02-25 23:08     ` Russell Senior
  0 siblings, 0 replies; 5+ messages in thread
From: Russell Senior @ 2025-02-25 23:08 UTC
  To: Ingo Molnar; +Cc: Matthew Whitehead, linux-kernel, tglx, luto, Jonas Gorski

On Tue, Feb 25, 2025 at 1:42 PM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Russell Senior <russell@personaltelco.net> wrote:
>
> > More poking/prodding and coaching from Jonas Gorski (cc'd), it looks
> > like this test patch fixes the problem on my board: Tested against
> > v6.6.67 and v4.14.113:
> >
> > --- a/arch/x86/kernel/cpu/cyrix.c
> > +++ b/arch/x86/kernel/cpu/cyrix.c
> > @@ -153,8 +153,8 @@ static void geode_configure(void)
> >         u8 ccr3;
> >         local_irq_save(flags);
> >
> > -       /* Suspend on halt power saving and enable #SUSP pin */
> > -       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x88);
> > +       /* Suspend on halt power saving */
> > +       setCx86(CX86_CCR2, getCx86(CX86_CCR2) | 0x08);
> >
> >         ccr3 = getCx86(CX86_CCR3);
> >         setCx86(CX86_CCR3, (ccr3 & 0x0f) | 0x10);       /* enable MAPEN */
>
> That's really useful - thank you!
>
> I've constructed a fix patch from your mails, attached below. I added
> your Signed-off-by to the fix, let me know if that's OK with you.

That's OK with me.

>
> I have applied your fix to the x86 tree, if everything goes fine it
> ought to go upstream during the next merge window in ~4 weeks, with
> v6.15.
>
> Thanks,
>
>         Ingo
>

Thank you!

-- 
Russell Senior
russell@personaltelco.net


Thread overview: 5+ messages
2024-12-24  1:04 [PATCH 1/2] x86/CPU: Use correct macros for Cyrix calls on Geode processors Russell Senior
2024-12-24 12:17 ` Russell Senior
2025-02-25 21:41   ` [PATCH] x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems Ingo Molnar
2025-02-25 23:08     ` Russell Senior
2025-02-25 21:52   ` [tip: x86/urgent] " tip-bot2 for Russell Senior