From: Tony Lindgren <tony@atomide.com>
To: "Andrew F. Davis" <afd@ti.com>
Cc: Tero Kristo <t-kristo@ti.com>, Keerthy <j-keerthy@ti.com>,
linux-omap@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] ARM: omap2+: Revert omap-smp.c changes resetting cpu1 during boot
Date: Wed, 15 Feb 2017 14:27:11 -0800
Message-ID: <20170215222711.GQ21809@atomide.com>
In-Reply-To: <ca389104-df6f-b055-ec43-524d82cf1be7@ti.com>

* Andrew F. Davis <afd@ti.com> [170215 14:14]:
> On 02/15/2017 01:12 PM, Tony Lindgren wrote:
> > * Tony Lindgren <tony@atomide.com> [170215 10:40]:
> >> * Tony Lindgren <tony@atomide.com> [170214 11:39]:
> >>> * Tony Lindgren <tony@atomide.com> [170213 13:51]:
> >>>> Commit 3251885285e1 ("ARM: OMAP4+: Reset CPU1 properly for kexec") started
> >>>> resetting cpu1 because of a kexec boot issue I was seeing earlier in 2016
> >>>> on omap4 when doing kexec boot between two different kernel versions. The
> >>>> booted kernel ended up trying to use the old kernel start-up address unless
> >>>> cpu1 was reset before configuring the cpu1 start-up address.
> >>>>
> >>>> It seems the reset part was not correct but probably working around some
> >>>> other issue. I have not been able to reproduce this issue any longer despite
> >>>> testing with backported patches back to v4.6 kernel. So it is possible this
> >>>> issue was caused by other work in progress kexec patches I had applied. Or
> >>>> it is possible some other fixes have made the issue go away.
> >>>>
> >>>> The unconditional reset of cpu1 can cause issues booting some devices. For
> >>>> example, a bootloader-configured secure OS running on cpu1 will fail because
> >>>> the configuration is not preserved, as reported by Andrew F. Davis <afd@ti.com>.
> >>>>
> >>>> Let's fix the issue by reverting the cpu1 reset parts. If it turns out we
> >>>> still need to reset cpu1 in some cases, we can add it back and do it
> >>>> conditionally.
> >>>
> >>> Actually with this I'm now seeing cpu1 not come up after a suspend/resume
> >>> cycle on duovero:
> >>>
> >>> [ 118.257415] CPU1: shutdown
> >>> [ 118.294616] Error taking CPU1 up: -2
> >>> [ 118.299072] PM: noirq resume of devices complete after 3.723 msecs
> >>> [ 118.303802] PM: early resume of devices complete after 3.723 msecs
> >>>
> >>> So this issue needs to be investigated more.
> >>
> >> And then today the omap4 suspend/resume issue is no longer reproducible.
> >> Go figure.
> >>
> >> But then doing more testing I noticed that also omap5 needs the reset.
> >> Without it we get the following on omap5-uevm doing a kexec boot. So clearly
> >> the reset cannot be just removed at least for omap4 and omap5.
> >
> > And also the same issue happens doing kexec on beagle-x15 naturally if
> > the cpu1 reset is removed.
> >
>
> When a core actually powers up it idles in ROM code waiting for
> OMAP_AUX_CORE_BOOT_0 to be set. When we shut down a core it is not really
> powered off, we just let it spin in omap4_cpu_die() or
> omap4_secondary_startup() waiting on OMAP_AUX_CORE_BOOT_0, just as if
> it were still trapped in ROM after a reset.
>
> The issue with this fake startup idle loop is that, unlike the ROM based
> startup idle loop, these do *not* jump to the address we stored in
> OMAP_AUX_CORE_BOOT_1, they just make the assumption that they can safely
> jump to the kernel startup function.
>
> So when we tell this core to boot, and it is not in the real ROM startup
> loop, it breaks stuff as it jumps to the old kernel's
> secondary_startup() even though we gave it the correct address in
> OMAP_AUX_CORE_BOOT_1.

Yes, this is probably what's going on here. Note, though, that the error I
pasted came from booting the same kernel, where that address should be
correct. So there might be something else to it as well.

> Resetting the core to put it back in the real ROM idle loop is wrong. The
> two idle loop functions above should be fixed to respect the address in
> OMAP_AUX_CORE_BOOT_1 and not make assumptions. This should take care
> of the kexec failure in a sane way.

OK, care to try to patch it, as you now also have a reproducible test
case for kexec?

Regards,
Tony
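
For readers following the thread, the difference between the two wake-up
paths Andrew describes can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the kernel or ROM code: the register
class, the function names, and both entry addresses are hypothetical, and
OMAP_AUX_CORE_BOOT_0 is simplified to a plain release flag.

```python
from dataclasses import dataclass


@dataclass
class AuxCoreBoot:
    """Toy stand-in for the two wake-up registers named in the thread."""
    boot_0: int = 0  # release flag (models OMAP_AUX_CORE_BOOT_0, simplified)
    boot_1: int = 0  # start-up address (models OMAP_AUX_CORE_BOOT_1)


def release_cpu1(regs, entry):
    """What the booting kernel does: store its entry point, then release cpu1."""
    regs.boot_1 = entry
    regs.boot_0 = 1


def rom_style_wakeup(regs):
    """ROM-like parking loop: once released, jump to the address in boot_1."""
    if not regs.boot_0:
        return None  # still parked
    return regs.boot_1


def hardcoded_wakeup(regs, baked_in_entry):
    """Kernel-side parking loop as described: ignores boot_1 and jumps to
    the start-up address of the kernel that parked the core."""
    if not regs.boot_0:
        return None  # still parked
    return baked_in_entry


# Hypothetical entry points of the old and the kexec'ed kernel.
OLD_ENTRY = 0x80101000
NEW_ENTRY = 0x80202000

regs = AuxCoreBoot()
release_cpu1(regs, NEW_ENTRY)
print(hex(rom_style_wakeup(regs)))             # address the new kernel asked for
print(hex(hardcoded_wakeup(regs, OLD_ENTRY)))  # stale address from the old kernel
```

In this model the second path lands on the old kernel's address even though
the new address was written correctly, which matches the kexec failure
discussed above. It does not explain the same-kernel failure Tony mentions.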