public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/0] Panic on softdog timeout
@ 2011-01-18 12:44 Anithra P Janakiraman
  2011-01-18 15:52 ` Américo Wang
  2011-01-18 16:35 ` Dave Hansen
  0 siblings, 2 replies; 5+ messages in thread
From: Anithra P Janakiraman @ 2011-01-18 12:44 UTC
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 305 bytes --]


Hi.

We currently have no way of determining the reason for failure when a
softdog timeout occurs. At a minimum, a snapshot of the system would
help to determine the cause.
The attached patch invokes panic on softdog timeout iff kdump is
configured; if kdump is not configured, it works as usual.





[-- Attachment #2: softdogpanic --]
[-- Type: text/plain, Size: 1444 bytes --]

Signed-off-by: Anithra P J <anithra@linux.vnet.ibm.com>

---
 drivers/watchdog/softdog.c |   14 ++++++++++----
 kernel/kexec.c             |    1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

Index: linux-2.6.37-rc4/drivers/watchdog/softdog.c
===================================================================
--- linux-2.6.37-rc4.orig/drivers/watchdog/softdog.c
+++ linux-2.6.37-rc4/drivers/watchdog/softdog.c
@@ -48,6 +48,7 @@
 #include <linux/init.h>
 #include <linux/jiffies.h>
 #include <linux/uaccess.h>
+#include <linux/kexec.h>

 #define PFX "SoftDog: "

@@ -99,10 +100,15 @@
 	if (soft_noboot)
 		printk(KERN_CRIT PFX "Triggered - Reboot ignored.\n");
 	else {
-		printk(KERN_CRIT PFX "Initiating system reboot.\n");
-		emergency_restart();
-		printk(KERN_CRIT PFX "Reboot didn't ?????\n");
-	}
+		if (kexec_crash_image) {
+			printk(KERN_CRIT PFX "Initiating kdump. \n");
+			panic("Watchdog timer expired.");
+		} else {
+			printk(KERN_CRIT PFX "Initiating system reboot. \n");
+			emergency_restart();
+			printk(KERN_CRIT PFX "Reboot didn't ?????\n");
+		}
+	      }
 }

 /*
Index: linux-2.6.37-rc4/kernel/kexec.c
===================================================================
--- linux-2.6.37-rc4.orig/kernel/kexec.c
+++ linux-2.6.37-rc4/kernel/kexec.c
@@ -935,6 +935,7 @@
  */
 struct kimage *kexec_image;
 struct kimage *kexec_crash_image;
+EXPORT_SYMBOL(kexec_crash_image);

 static DEFINE_MUTEX(kexec_mutex);




* Re: [PATCH 0/0] Panic on softdog timeout
  2011-01-18 12:44 [PATCH 0/0] Panic on softdog timeout Anithra P Janakiraman
@ 2011-01-18 15:52 ` Américo Wang
  2011-01-20  9:09   ` Anithra P Janakiraman
  2011-01-18 16:35 ` Dave Hansen
  1 sibling, 1 reply; 5+ messages in thread
From: Américo Wang @ 2011-01-18 15:52 UTC
  To: Anithra P Janakiraman; +Cc: linux-kernel

On Tue, Jan 18, 2011 at 06:14:36PM +0530, Anithra P Janakiraman wrote:
>
>Hi.
>
>We currently have no way of determining the reason for failure when a
>softdog timeout occurs. At the minimum a snapshot of the system would
>help to determine the cause.
>The attached patch invokes panic on softdog timeout iff kdump is
>configured, if kdump is not configured it works as usual.
>

We don't do it this way. Check softlockup_panic: we have
a boot parameter for it, i.e. "softlockup_panic=". :)


* Re: [PATCH 0/0] Panic on softdog timeout
  2011-01-18 12:44 [PATCH 0/0] Panic on softdog timeout Anithra P Janakiraman
  2011-01-18 15:52 ` Américo Wang
@ 2011-01-18 16:35 ` Dave Hansen
  2011-01-25 15:10   ` Anithra P Janakiraman
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Hansen @ 2011-01-18 16:35 UTC
  To: Anithra P Janakiraman; +Cc: linux-kernel, Balbir Singh

On Tue, 2011-01-18 at 18:14 +0530, Anithra P Janakiraman wrote:
> We currently have no way of determining the reason for failure when a 
> softdog timeout occurs. At the minimum a snapshot of the system would 
> help to determine the cause.
> The attached patch invokes panic on softdog timeout iff kdump is 
> configured, if kdump is not configured it works as usual.

This sounds like a decent idea.  But, is it something that should be a
bit more optional?  We currently have boot options for when to reboot or
panic for other things, and this is really the first use of
kexec_crash_image outside of kexec itself.  Is it really the best switch
to use?

Will this break anyone who expects a quick, clean, reboot and instead
gets a kdump?  Should we make _all_ emergency_restart()s use kdump?
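
One way to read the "more optional" question is as an explicit opt-in knob instead of keying off kexec_crash_image. The following is a minimal userspace model of that decision, not driver code: soft_panic is a hypothetical flag standing in for a module or boot parameter, and softdog_decide() and the enum names are illustrative and appear nowhere in the posted patch.

```c
#include <stdbool.h>

/* What the watchdog expiry handler would do; names are illustrative only. */
enum softdog_action {
	SOFTDOG_IGNORE,		/* soft_noboot set: log and carry on */
	SOFTDOG_PANIC,		/* opt-in: panic() so that kdump can capture a dump */
	SOFTDOG_REBOOT,		/* default: emergency_restart() as before */
};

/*
 * Userspace model of the expiry decision.
 * soft_noboot mirrors the existing softdog parameter; soft_panic is a
 * hypothetical opt-in knob (an assumption, not part of the posted patch).
 */
static enum softdog_action softdog_decide(bool soft_noboot, bool soft_panic)
{
	if (soft_noboot)
		return SOFTDOG_IGNORE;
	if (soft_panic)
		return SOFTDOG_PANIC;
	return SOFTDOG_REBOOT;
}
```

With a knob like this, anyone who expects a quick reboot keeps the current behaviour unless they ask for the panic path explicitly.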

You might have noticed, but your subject is a little wonky.  It should
probably just omit the 1/1 stuff when you only have a single patch
series.  The subject is pretty short and doesn't really explain what's
going on.  Could you beef it up a bit?

> @@ -48,6 +48,7 @@
>  #include <linux/init.h>
>  #include <linux/jiffies.h>
>  #include <linux/uaccess.h>
> +#include <linux/kexec.h>
> 
>  #define PFX "SoftDog: "
> 
> @@ -99,10 +100,15 @@
>         if (soft_noboot)
>                 printk(KERN_CRIT PFX "Triggered - Reboot ignored.\n");
>         else {
> -               printk(KERN_CRIT PFX "Initiating system reboot.\n");
> -               emergency_restart();
> -               printk(KERN_CRIT PFX "Reboot didn't ?????\n");
> -       }
> +               if (kexec_crash_image) {
> +                       printk(KERN_CRIT PFX "Initiating kdump. \n");
> +                       panic("Watchdog timer expired.");
> +               } else {
> +                       printk(KERN_CRIT PFX "Initiating system reboot. \n");
> +                       emergency_restart();
> +                       printk(KERN_CRIT PFX "Reboot didn't ?????\n");
> +               }
> +             }
>  }

The whitespace here is a bit damaged.  You might want to double-check
what your editor did to it.

Also, it's a bit more conventional to append patches to emails rather
than actually attach them.

Please also find some maintainers of this code or people you expect to
accept it, and cc them.  People are likely to miss this on LKML.

>  struct kimage *kexec_image;
>  struct kimage *kexec_crash_image;
> +EXPORT_SYMBOL(kexec_crash_image);

EXPORT_SYMBOL_GPL(), perhaps?

It also isn't _immediately_ obvious why you're doing this.  A quick
little blurb in the patch description would help.

-- Dave



* Re: [PATCH 0/0] Panic on softdog timeout
  2011-01-18 15:52 ` Américo Wang
@ 2011-01-20  9:09   ` Anithra P Janakiraman
  0 siblings, 0 replies; 5+ messages in thread
From: Anithra P Janakiraman @ 2011-01-20  9:09 UTC
  To: Américo Wang
  Cc: linux-kernel, Srikar Dronamraju, vatsa, Dave Hansen, Alan Cox,
	Ananth N Mavinakayanahalli

On 01/18/2011 09:22 PM, Américo Wang wrote:
> On Tue, Jan 18, 2011 at 06:14:36PM +0530, Anithra P Janakiraman wrote:
>>
>> Hi.
>>
>> We currently have no way of determining the reason for failure when a
>> softdog timeout occurs. At the minimum a snapshot of the system would
>> help to determine the cause.
>> The attached patch invokes panic on softdog timeout iff kdump is
>> configured, if kdump is not configured it works as usual.
>>
>
> We don't do it in this way, check softlockup_panic, we have
> a boot parameter, i.e. "softlockup_panic=". :)


Some softdog-specific scenarios cannot be handled by the softlockup
detector. We use softdog to watch for critical application failures,
where it is possible that the application has failed but there isn't a
softlockup as such.
For example, when doing high-availability tests on applications, softdog is
set up so that the timer is reset by an application thread. If the
application fails, the timer expires and causes a reboot. In such
scenarios some information on what caused the failure would be useful,
and I don't see how softlockup can be used. The patch I sent would
be useful in these cases. If I am missing something, please do let me know.
I will make the modifications suggested by Dave Hansen and post the
patch shortly.
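
The heartbeat setup described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual test harness: watchdog_pet(), watchdog_heartbeat() and healthy_countdown() are hypothetical names, and the health check is a stand-in for whatever the real application would verify. It relies only on the fact that a write to the watchdog device resets the timer.

```c
#include <stdbool.h>
#include <unistd.h>

/*
 * Reset the watchdog timer once. A write to the watchdog device
 * counts as a keepalive; returns 0 on success, -1 on error.
 */
static int watchdog_pet(int fd)
{
	return write(fd, "\0", 1) == 1 ? 0 : -1;
}

/*
 * Hypothetical heartbeat loop: keep petting the watchdog while the
 * application's own health check passes. Once the check fails we stop
 * writing, so the softdog timer expires and the driver takes over.
 * Returns the number of keepalives sent, or -1 if a write failed.
 */
static int watchdog_heartbeat(int fd, bool (*app_is_healthy)(void *),
			      void *ctx, unsigned int interval_sec)
{
	int beats = 0;

	while (app_is_healthy(ctx)) {
		if (watchdog_pet(fd) < 0)
			return -1;
		beats++;
		sleep(interval_sec);
	}
	return beats;
}

/* Stand-in health check: reports healthy for the next *ctx calls. */
static bool healthy_countdown(void *ctx)
{
	int *left = ctx;

	if (*left <= 0)
		return false;
	(*left)--;
	return true;
}
```

In real use fd would come from open("/dev/watchdog", O_WRONLY), and the interval would be kept well below the softdog timeout (soft_margin). Nothing in this loop records why the health check failed, which is the gap the patch is trying to close.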

Anithra.




* Re: [PATCH 0/0] Panic on softdog timeout
  2011-01-18 16:35 ` Dave Hansen
@ 2011-01-25 15:10   ` Anithra P Janakiraman
  0 siblings, 0 replies; 5+ messages in thread
From: Anithra P Janakiraman @ 2011-01-25 15:10 UTC
  To: Dave Hansen; +Cc: linux-kernel

On 01/18/2011 10:05 PM, Dave Hansen wrote:
> On Tue, 2011-01-18 at 18:14 +0530, Anithra P Janakiraman wrote:
>    
>> We currently have no way of determining the reason for failure when a
>> softdog timeout occurs. At the minimum a snapshot of the system would
>> help to determine the cause.
>> The attached patch invokes panic on softdog timeout iff kdump is
>> configured, if kdump is not configured it works as usual.
>>      
> This sounds like a decent idea.  But, is it something that should be a
> bit more optional?  We currently have boot options for when to reboot or
> panic for other things, and this is really the first use of
> kexec_crash_image outside of kexec itself.  Is it really the best switch
> to use?
>
> Will this break anyone who expects a quick, clean, reboot and instead
> gets a kdump?  Should we make _all_ emergency_restart()s use kdump?
>
> You might have noticed, but your subject is a little wonky.  It should
> probably just omit the 1/1 stuff when you only have a single patch
> series.  The subject is pretty short and doesn't really explain what's
> going on.  Could you beef it up a bit?
>
>    
>

Thanks for looking at it and for the comments. I've sent
version 2 of the patch, which hopefully addresses all your
comments.

link:
http://permalink.gmane.org/gmane.linux.kernel/1091282

For some strange reason I'm unable to find a link to my
mail on lkml.org. I see emails only up to the 23rd of Jan.

Regards,
Anithra.



