* {PATCH] Firmware update using the update_flash -f <filename> results to soft lockup BUG
@ 2011-07-07 5:27 divya
2011-07-07 6:12 ` Michael Neuling
0 siblings, 1 reply; 4+ messages in thread
From: divya @ 2011-07-07 5:27 UTC
To: linuxppc-dev
Cc: antonb, naveedaus, kenistoj, chavez, arunbal, srikar, jlarrew,
lxie, mahesh.salgaonkar, Subrata Modak, suzukikp
Hi,
Problem Description:
Firmware update using update_flash -f <filename> results in a soft lockup BUG:
FLASH: preparing saved firmware image for flash
FLASH: flash image is 50141296 bytes
FLASH: performing flash and reboot
FLASH: this will take several minutes. Do not power off!
BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
Steps to reproduce:
1. Check the firmware information on the machine (using ASM or lsmcode)
2. Update the system firmware with the update_flash command
update_flash -f 01FL350_039_038.img
info: Temporary side will be updated with a newer or
identical image
Projected Flash Update Results:
Current T Image: FL350_039
Current P Image: FL350_039
New T Image: FL350_039
New P Image: FL350_039
Flash image ready...rebooting the system...
Broadcast message from root@abc
(/dev/hvc0) at 5:25 ...
The system is going down for reboot NOW!
[root@abc /]# Stopping rhsmcertd[ OK ]
Stopping atd: [ OK ]
Stopping cups: [ OK ]
Stopping abrt daemon: [ OK ]
Stopping sshd: [ OK ]
Shutting down postfix: [ OK ]
Stopping rtas_errd (platform error handling) daemon: [ OK ]
Stopping crond: [ OK ]
Stopping automount: [ OK ]
Stopping HAL daemon: [ OK ]
Stopping iprdump: [ OK ]
Killing mdmonitor: [ OK ]]
Stopping system message bus: [ OK ]
Stopping rpcbind: [ OK ]
Stopping auditd: [ OK ]
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
ip6tables: Flushing firewall rules: [ OK ]
ip6tables: Setting chains to policy ACCEPT: filter [ OK ]
ip6tables: Unloading modules: [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
Sending all processes the TERM signal... [ OK ]
Sending all processes the KILL signal... [ OK ]
Saving random seed: [ OK ]
Turning off swap: [ OK ]
Turning off quotas: [ OK ]
Unmounting pipe file systems: [ OK ]
Unmounting file systems: [ OK ]
init: Re-executing /sbin/init
Please stand by while rebooting the system...
Restarting system.
FLASH: preparing saved firmware image for flash
FLASH: flash image is 50141296 bytes
FLASH: performing flash and reboot
FLASH: this will take several minutes. Do not power off!
BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
This is solved by the following patch
--- arch/powerpc/kernel/setup-common.c.orig 2011-07-01 22:41:12.952507971 -0400
+++ arch/powerpc/kernel/setup-common.c 2011-07-01 22:48:31.182507915 -0400
@@ -109,11 +109,12 @@ void machine_shutdown(void)
void machine_restart(char *cmd)
{
machine_shutdown();
- if (ppc_md.restart)
- ppc_md.restart(cmd);
#ifdef CONFIG_SMP
- smp_send_stop();
+ smp_send_stop();
#endif
+ if (ppc_md.restart)
+ ppc_md.restart(cmd);
+
printk(KERN_EMERG "System Halted, OK to turn off power\n");
local_irq_disable();
while (1) ;
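The reordering in the diff above can be sketched as a small user-space model. This is only an illustration, not kernel code: every name ending in _model, the trace buffer and the two wrapper functions are hypothetical stand-ins, and the model assumes (as the console log suggests) that the flash runs inside the restart hook. On a real system the restart hook normally does not return; here it returns so that the call order can be recorded.

```c
#include <assert.h>
#include <string.h>

/* Trace of the calls made, as a comma-separated list of tags. */
static char trace[64];

static void record(const char *tag)
{
	/* Room for the existing text, a comma, the tag and the NUL. */
	assert(strlen(trace) + strlen(tag) + 2 <= sizeof(trace));
	if (trace[0] != '\0')
		strcat(trace, ",");
	strcat(trace, tag);
}

/* Hypothetical stand-ins for the functions named in the patch. */
void machine_shutdown_model(void) { record("shutdown"); }
void smp_send_stop_model(void)    { record("stop_cpus"); }
void restart_model(char *cmd)     { (void)cmd; record("restart"); }

/* Model of ppc_md: only the restart hook matters here. */
struct machdep_model {
	void (*restart)(char *cmd);
};

/* Order before the patch: restart hook first, then stop the other CPUs. */
const char *restart_before_patch(const struct machdep_model *md)
{
	trace[0] = '\0';
	machine_shutdown_model();
	if (md->restart)
		md->restart(NULL);
	smp_send_stop_model();
	return trace;
}

/* Order after the patch: stop the other CPUs, then call the restart hook. */
const char *restart_after_patch(const struct machdep_model *md)
{
	trace[0] = '\0';
	machine_shutdown_model();
	smp_send_stop_model();
	if (md->restart)
		md->restart(NULL);
	return trace;
}
```

With a machdep_model whose restart field points at restart_model, the first function records shutdown, restart, stop_cpus and the second records shutdown, stop_cpus, restart. The only difference is whether the other CPUs are still running while the restart hook executes.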
Thanks
Divya
* Re: {PATCH] Firmware update using the update_flash -f <filename> results to soft lockup BUG
2011-07-07 5:27 {PATCH] Firmware update using the update_flash -f <filename> results to soft lockup BUG divya
@ 2011-07-07 6:12 ` Michael Neuling
2011-07-07 7:24 ` divya
0 siblings, 1 reply; 4+ messages in thread
From: Michael Neuling @ 2011-07-07 6:12 UTC
To: divya
Cc: suzukikp, antonb, naveedaus, kenistoj, chavez, arunbal,
linuxppc-dev, jlarrew, lxie, mahesh.salgaonkar, Subrata Modak,
srikar
In message <4E1543B6.9060800@linux.vnet.ibm.com> you wrote:
> Hi ,
>
> Problem Description:
> Firmware update using update_flash -f <filename> results in a soft lockup BUG:
> FLASH: preparing saved firmware image for flash
> FLASH: flash image is 50141296 bytes
> FLASH: performing flash and reboot
> FLASH: this will take several minutes. Do not power off!
> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
>
> Steps to reproduce:
> 1. Check the firmware information on the machine (using ASM or lsmcode)
> 2. Update the system firmware with the update_flash command
> update_flash -f 01FL350_039_038.img
> info: Temporary side will be updated with a newer or
> identical image
>
> Projected Flash Update Results:
> Current T Image: FL350_039
> Current P Image: FL350_039
> New T Image: FL350_039
> New P Image: FL350_039
> Flash image ready...rebooting the system...
>
> Broadcast message from root@abc
> (/dev/hvc0) at 5:25 ...
>
> The system is going down for reboot NOW!
> [root@abc /]# Stopping rhsmcertd[ OK ]
> Stopping atd: [ OK ]
> Stopping cups: [ OK ]
> Stopping abrt daemon: [ OK ]
> Stopping sshd: [ OK ]
> Shutting down postfix: [ OK ]
> Stopping rtas_errd (platform error handling) daemon: [ OK ]
> Stopping crond: [ OK ]
> Stopping automount: [ OK ]
> Stopping HAL daemon: [ OK ]
> Stopping iprdump: [ OK ]
> Killing mdmonitor: [ OK ]]
> Stopping system message bus: [ OK ]
> Stopping rpcbind: [ OK ]
> Stopping auditd: [ OK ]
> Shutting down interface eth0: [ OK ]
> Shutting down loopback interface: [ OK ]
> ip6tables: Flushing firewall rules: [ OK ]
> ip6tables: Setting chains to policy ACCEPT: filter [ OK ]
> ip6tables: Unloading modules: [ OK ]
> iptables: Flushing firewall rules: [ OK ]
> iptables: Setting chains to policy ACCEPT: filter [ OK ]
> iptables: Unloading modules: [ OK ]
> Sending all processes the TERM signal... [ OK ]
> Sending all processes the KILL signal... [ OK ]
> Saving random seed: [ OK ]
> Turning off swap: [ OK ]
> Turning off quotas: [ OK ]
> Unmounting pipe file systems: [ OK ]
> Unmounting file systems: [ OK ]
> init: Re-executing /sbin/init
> Please stand by while rebooting the system...
> Restarting system.
> FLASH: preparing saved firmware image for flash
> FLASH: flash image is 50141296 bytes
> FLASH: performing flash and reboot
> FLASH: this will take several minutes. Do not power off!
> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
>
> This is solved by the following patch
Can you please explain how it fixes it?
Also, you need a Signed-off-by line.
>
> --- arch/powerpc/kernel/setup-common.c.orig 2011-07-01 22:41:12.952507971 -0400
> +++ arch/powerpc/kernel/setup-common.c 2011-07-01 22:48:31.182507915 -0400
> @@ -109,11 +109,12 @@ void machine_shutdown(void)
> void machine_restart(char *cmd)
> {
> machine_shutdown();
> - if (ppc_md.restart)
> - ppc_md.restart(cmd);
> #ifdef CONFIG_SMP
> - smp_send_stop();
> + smp_send_stop();
Random white space change here.
> #endif
> + if (ppc_md.restart)
> + ppc_md.restart(cmd);
> +
> printk(KERN_EMERG "System Halted, OK to turn off power\n");
> local_irq_disable();
> while (1) ;
>
> Thanks
> Divya
>
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
>
* Re: {PATCH] Firmware update using the update_flash -f <filename> results to soft lockup BUG
2011-07-07 6:12 ` Michael Neuling
@ 2011-07-07 7:24 ` divya
2011-07-08 1:51 ` Michael Neuling
0 siblings, 1 reply; 4+ messages in thread
From: divya @ 2011-07-07 7:24 UTC
To: Michael Neuling
Cc: suzukikp, antonb, naveedaus, kenistoj, chavez, arunbal,
linuxppc-dev, jlarrew, lxie, mahesh.salgaonkar, Subrata Modak,
srikar
On Thursday 07 July 2011 11:42 AM, Michael Neuling wrote:
> In message<4E1543B6.9060800@linux.vnet.ibm.com> you wrote:
>
>> Hi ,
>>
>> Problem Description:
>> Firmware update using update_flash -f <filename> results in a soft lockup BUG:
>> FLASH: preparing saved firmware image for flash
>> FLASH: flash image is 50141296 bytes
>> FLASH: performing flash and reboot
>> FLASH: this will take several minutes. Do not power off!
>> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
>>
>> Steps to reproduce:
>> 1. Check the firmware information on the machine (using ASM or lsmcode)
>> 2. Update the system firmware with the update_flash command
>> update_flash -f 01FL350_039_038.img
>> info: Temporary side will be updated with a newer or
>> identical image
>>
>> Projected Flash Update Results:
>> Current T Image: FL350_039
>> Current P Image: FL350_039
>> New T Image: FL350_039
>> New P Image: FL350_039
>> Flash image ready...rebooting the system...
>>
>> Broadcast message from root@abc
>> (/dev/hvc0) at 5:25 ...
>>
>> The system is going down for reboot NOW!
>> [root@abc /]# Stopping rhsmcertd[ OK ]
>> Stopping atd: [ OK ]
>> Stopping cups: [ OK ]
>> Stopping abrt daemon: [ OK ]
>> Stopping sshd: [ OK ]
>> Shutting down postfix: [ OK ]
>> Stopping rtas_errd (platform error handling) daemon: [ OK ]
>> Stopping crond: [ OK ]
>> Stopping automount: [ OK ]
>> Stopping HAL daemon: [ OK ]
>> Stopping iprdump: [ OK ]
>> Killing mdmonitor: [ OK ]]
>> Stopping system message bus: [ OK ]
>> Stopping rpcbind: [ OK ]
>> Stopping auditd: [ OK ]
>> Shutting down interface eth0: [ OK ]
>> Shutting down loopback interface: [ OK ]
>> ip6tables: Flushing firewall rules: [ OK ]
>> ip6tables: Setting chains to policy ACCEPT: filter [ OK ]
>> ip6tables: Unloading modules: [ OK ]
>> iptables: Flushing firewall rules: [ OK ]
>> iptables: Setting chains to policy ACCEPT: filter [ OK ]
>> iptables: Unloading modules: [ OK ]
>> Sending all processes the TERM signal... [ OK ]
>> Sending all processes the KILL signal... [ OK ]
>> Saving random seed: [ OK ]
>> Turning off swap: [ OK ]
>> Turning off quotas: [ OK ]
>> Unmounting pipe file systems: [ OK ]
>> Unmounting file systems: [ OK ]
>> init: Re-executing /sbin/init
>> Please stand by while rebooting the system...
>> Restarting system.
>> FLASH: preparing saved firmware image for flash
>> FLASH: flash image is 50141296 bytes
>> FLASH: performing flash and reboot
>> FLASH: this will take several minutes. Do not power off!
>> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
>>
>> This is solved by the following patch
>>
> Can you please explain how it fixes it?
>
Here is the explanation for the fix.
The flash update is performed with an RTAS call. RTAS calls are serialized
by lock_rtas(), which uses a spin_lock.
rtasd keeps scanning for the RTAS events generated on the machine. This is
done via the workqueue mechanism. rtas_event_scan() also uses an RTAS call to
scan for events, so it takes lock_rtas() before it issues the request.
The flash update takes a long time, and while it is in progress anybody else
who wants to make an RTAS call has to wait until the update completes. In this
case, rtas_event_scan() is kicked off to check for events and waits a long
time on the spin_lock, which gives us the soft lockup.
Before the RTAS firmware update starts, all other CPUs should be stopped, which
means no other CPU should be in lock_rtas(). We do not want other CPUs to
execute while the firmware update is in progress, and the system will be
rebooted after the update anyway.
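The reasoning above can be sketched as a small deterministic model. This is a user-space illustration only, not kernel code: the struct, both functions and the 60-second threshold are assumptions made for the example (the real watchdog threshold depends on the kernel version and configuration), and time is counted in whole seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed soft lockup threshold in seconds (hypothetical value). */
#define SOFTLOCKUP_THRESH_SECS 60

struct lockup_scenario {
	int flash_secs;          /* how long the flashing CPU holds the RTAS lock */
	int scan_start_sec;      /* when the event scan first tries to take the lock */
	bool other_cpus_stopped; /* true if the other CPUs were stopped first */
};

/* Number of seconds the event scan spends spinning on the lock. */
int scan_spin_secs(const struct lockup_scenario *s)
{
	int spun = 0;

	assert(s->flash_secs >= 0 && s->scan_start_sec >= 0);
	if (s->other_cpus_stopped)
		return 0;	/* no CPU is left to run the scan */

	/* The lock is held from second 0 until flash_secs; spin until then. */
	for (int t = s->scan_start_sec; t < s->flash_secs; t++)
		spun++;
	return spun;
}

/* True if the spin is long enough for the modelled watchdog to complain. */
bool soft_lockup_reported(const struct lockup_scenario *s)
{
	return scan_spin_secs(s) > SOFTLOCKUP_THRESH_SECS;
}
```

For example, with a flash that holds the lock for 180 seconds and a scan that starts at second 10, the scan spins for 170 seconds and the model reports a soft lockup. With other_cpus_stopped set, the scan never runs and nothing is reported.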
> Also you need a signed off by.
>
Signed-off-by: Divya <dipraksh@linux.vnet.ibm.com>
>
>
>> --- arch/powerpc/kernel/setup-common.c.orig 2011-07-01 22:41:12.952507971 -0400
>> +++ arch/powerpc/kernel/setup-common.c 2011-07-01 22:48:31.182507915 -0400
>> @@ -109,11 +109,12 @@ void machine_shutdown(void)
>> void machine_restart(char *cmd)
>> {
>> machine_shutdown();
>> - if (ppc_md.restart)
>> - ppc_md.restart(cmd);
>> #ifdef CONFIG_SMP
>> - smp_send_stop();
>> + smp_send_stop();
>>
> Random white space change here.
>
>
>> #endif
>> + if (ppc_md.restart)
>> + ppc_md.restart(cmd);
>> +
>> printk(KERN_EMERG "System Halted, OK to turn off power\n");
>> local_irq_disable();
>> while (1) ;
>>
>> Thanks
>> Divya
>>
* Re: {PATCH] Firmware update using the update_flash -f <filename> results to soft lockup BUG
2011-07-07 7:24 ` divya
@ 2011-07-08 1:51 ` Michael Neuling
0 siblings, 0 replies; 4+ messages in thread
From: Michael Neuling @ 2011-07-08 1:51 UTC
To: divya
Cc: antonb, naveedaus, suzukikp, chavez, arunbal, srikar,
linuxppc-dev, jlarrew, lxie, mahesh.salgaonkar, Subrata Modak,
kenistoj
In message <4E155F15.1030805@linux.vnet.ibm.com> you wrote:
> On Thursday 07 July 2011 11:42 AM, Michael Neuling wrote:
> > In message<4E1543B6.9060800@linux.vnet.ibm.com> you wrote:
> >
> >> Hi ,
> >>
> >> Problem Description:
> >> Firmware update using update_flash -f <filename> results in a soft lockup BUG:
> >> FLASH: preparing saved firmware image for flash
> >> FLASH: flash image is 50141296 bytes
> >> FLASH: performing flash and reboot
> >> FLASH: this will take several minutes. Do not power off!
> >> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
> >>
> >> Steps to reproduce:
> >> 1. Check the firmware information on the machine (using ASM or lsmcode)
> >> 2. Update the system firmware with the update_flash command
> >> update_flash -f 01FL350_039_038.img
> >> info: Temporary side will be updated with a newer or
> >> identical image
> >>
> >> Projected Flash Update Results:
> >> Current T Image: FL350_039
> >> Current P Image: FL350_039
> >> New T Image: FL350_039
> >> New P Image: FL350_039
> >> Flash image ready...rebooting the system...
> >>
> >> Broadcast message from root@abc
> >> (/dev/hvc0) at 5:25 ...
> >>
> >> The system is going down for reboot NOW!
> >> [root@abc /]# Stopping rhsmcertd[ OK ]
> >> Stopping atd: [ OK ]
> >> Stopping cups: [ OK ]
> >> Stopping abrt daemon: [ OK ]
> >> Stopping sshd: [ OK ]
> >> Shutting down postfix: [ OK ]
> >> Stopping rtas_errd (platform error handling) daemon: [ OK ]
> >> Stopping crond: [ OK ]
> >> Stopping automount: [ OK ]
> >> Stopping HAL daemon: [ OK ]
> >> Stopping iprdump: [ OK ]
> >> Killing mdmonitor: [ OK ]]
> >> Stopping system message bus: [ OK ]
> >> Stopping rpcbind: [ OK ]
> >> Stopping auditd: [ OK ]
> >> Shutting down interface eth0: [ OK ]
> >> Shutting down loopback interface: [ OK ]
> >> ip6tables: Flushing firewall rules: [ OK ]
> >> ip6tables: Setting chains to policy ACCEPT: filter [ OK ]
> >> ip6tables: Unloading modules: [ OK ]
> >> iptables: Flushing firewall rules: [ OK ]
> >> iptables: Setting chains to policy ACCEPT: filter [ OK ]
> >> iptables: Unloading modules: [ OK ]
> >> Sending all processes the TERM signal... [ OK ]
> >> Sending all processes the KILL signal... [ OK ]
> >> Saving random seed: [ OK ]
> >> Turning off swap: [ OK ]
> >> Turning off quotas: [ OK ]
> >> Unmounting pipe file systems: [ OK ]
> >> Unmounting file systems: [ OK ]
> >> init: Re-executing /sbin/init
> >> Please stand by while rebooting the system...
> >> Restarting system.
> >> FLASH: preparing saved firmware image for flash
> >> FLASH: flash image is 50141296 bytes
> >> FLASH: performing flash and reboot
> >> FLASH: this will take several minutes. Do not power off!
> >> BUG: soft lockup - CPU#1 stuck for 67s! [events/1:36]
> >>
> >> This is solved by the following patch
> >>
> > Can you please explain how it fixes it?
> >
> The flash update is conducted with an RTAS call. The RTAS calls are
> serialized by lock_rtas() which uses a spin_lock.
>
> Now there is rtasd which keeps scanning for the RTAS events generated
> on the machine. This is performed via workqueue mechanism. The
> rtas_event_scan() also uses an RTAS call to scan the events,
> eventually taking the lock_rtas() before it issues the request.
>
> The flash update is an operation which takes a long time, and hence
> while we are at it, anybody else who wants to make an RTAS call will
> have to wait until the update is completed. Now in this case, the
> rtas_event_scan() is being kicked in to check for events and it waits
> a long time on the spin_lock, getting us a SOFT Lockup.
What other RTAS calls are going on at this point? It worries me we are
stopping a CPU that's doing RTAS calls. Your solution would seem to be
papering over a more serious problem.
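One possible reading of this concern can be sketched as a toy model: if a CPU is stopped while it still owns the RTAS lock, nobody ever releases it, and the CPU doing the flash can never take it. Everything below is hypothetical (the structs, the function names, the two-CPU setup). Whether the real kernel can reach this state depends on details not covered in this thread, for example whether the lock is held with interrupts disabled.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Toy lock: records which CPU owns it, or NO_OWNER. */
struct rtas_lock_model {
	int owner;
};

struct cpu_model {
	int id;
	bool running;
};

/* Try to take the lock once; returns true on success. */
bool model_trylock(struct rtas_lock_model *lock, const struct cpu_model *cpu)
{
	assert(cpu->running);	/* a stopped CPU cannot take the lock */
	if (lock->owner != NO_OWNER)
		return false;
	lock->owner = cpu->id;
	return true;
}

/* Stopping a CPU does not release anything it holds. */
void model_stop_cpu(struct cpu_model *cpu)
{
	cpu->running = false;
}

/* Returns true if the flashing CPU can take the lock after the other
 * CPU has been stopped. */
bool flash_can_proceed(bool other_cpu_in_rtas_call)
{
	struct rtas_lock_model lock = { NO_OWNER };
	struct cpu_model flasher = { 0, true };
	struct cpu_model other = { 1, true };

	if (other_cpu_in_rtas_call)
		(void)model_trylock(&lock, &other);	/* e.g. an event scan in progress */

	model_stop_cpu(&other);			/* stand-in for smp_send_stop() */
	return model_trylock(&lock, &flasher);	/* fails if the owner was stopped */
}
```

In this model flash_can_proceed(false) succeeds, while flash_can_proceed(true) fails because the stopped CPU still owns the lock.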
> Before the rtas firmware update starts, all other CPUs should be
> stopped. Which means no other CPU should be in lock_rtas(). We do not
> want other CPUs to execute while FW update is in progress and the system
> will be rebooted anyway after the update.
Mikey