public inbox for linux-kernel@vger.kernel.org
* [RFC] removal of sync in panic
@ 2004-07-14 15:45 Christian Borntraeger
  2004-07-14 16:23 ` Lars Marowsky-Bree
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Borntraeger @ 2004-07-14 15:45 UTC (permalink / raw)
  To: linux-kernel

I have a question regarding the sys_sync() call within the panic function. 
--------snip---------------
        printk(KERN_EMERG "Kernel panic: %s\n",buf);
        if (in_interrupt())
                printk(KERN_EMERG "In interrupt handler - not syncing\n");
        else if (!current->pid)
                printk(KERN_EMERG "In idle task - not syncing\n");
        else
                sys_sync();     /* <-- this call */
        bust_spinlocks(0);

#ifdef CONFIG_SMP
        smp_send_stop();
#endif
--------------------------


I have seen panic() fail twice recently on an SMP system. The box
panicked but kept running happily on the other CPUs. The cause of this
failure is that these panics were triggered by a block device or a
filesystem (e.g. mounted with errors=panic). In these cases sys_sync() is
quite likely to fail or hang, and this is exactly what happened in both
cases I have seen. Meanwhile the other CPUs happily keep destroying data:
the kernel has a severe problem, but they do not know it yet, because
smp_send_stop() is only called after sys_sync().

I can imagine several changes, but I am not sure whether this is a problem
that must be fixed, or which fix is best.
Here are my alternatives:

1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there is
no need to help them. Furthermore, we have a problem severe enough to
warrant a panic, so we had better not do any I/O.
2. Move smp_send_stop() before sys_sync(). This at least prevents the other
CPUs from doing harm if sys_sync() hangs. I am not sure whether this really
works, though.
3. Add an
        if (doing_io())
                printk(KERN_EMERG "In I/O routine - not syncing\n");
check, similar to the in_interrupt() check. Unfortunately I have no idea
how this could be implemented, and it looks quite ugly.

Thanks for any ideas and clarifications
cheers

Christian






* Re: [RFC] removal of sync in panic
  2004-07-14 15:45 [RFC] removal of sync in panic Christian Borntraeger
@ 2004-07-14 16:23 ` Lars Marowsky-Bree
  2004-07-14 17:39   ` [PATCH] was: " Christian Borntraeger
  0 siblings, 1 reply; 8+ messages in thread
From: Lars Marowsky-Bree @ 2004-07-14 16:23 UTC (permalink / raw)
  To: Christian Borntraeger, linux-kernel

On 2004-07-14T17:45:46,
   Christian Borntraeger <linux-kernel@borntraeger.net> said:

> I can imagine several changes, but I am not sure whether this is a problem
> that must be fixed, or which fix is best.
> Here are my alternatives:
> 
> 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there is
> no need to help them. Furthermore, we have a problem severe enough to
> warrant a panic, so we had better not do any I/O.

I've seen exactly the behaviour you describe and would be inclined to go
for this option too.

> 3. Add an
>         if (doing_io())
>                 printk(KERN_EMERG "In I/O routine - not syncing\n");
> check, similar to the in_interrupt() check. Unfortunately I have no idea
> how this could be implemented, and it looks quite ugly.

This would also work of course, but as you point out, it's more complex.


Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
High Availability & Clustering	    \ ever tried. ever failed. no matter.
SUSE Labs, Research and Development | try again. fail again. fail better.
SUSE LINUX AG - A Novell company    \ 	-- Samuel Beckett



* [PATCH] was: [RFC] removal of sync in panic
  2004-07-14 16:23 ` Lars Marowsky-Bree
@ 2004-07-14 17:39   ` Christian Borntraeger
  2004-07-14 21:31     ` Andrew Morton
  0 siblings, 1 reply; 8+ messages in thread
From: Christian Borntraeger @ 2004-07-14 17:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lars Marowsky-Bree, Andrew Morton

Lars Marowsky-Bree wrote:
> > 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
> > is no need to help them. Furthermore, we have a problem severe enough to
> > warrant a panic, so we had better not do any I/O.

> I've seen exactly the behaviour you describe and would be inclined to go
> for this option too.

As this problem definitely exists, here is a patch. 

--- linux-2.6.8-rc1/kernel/panic.c      2004-06-16 07:20:04.000000000 +0200
+++ linux-patch/kernel/panic.c  2004-07-14 19:37:02.000000000 +0200
@@ -59,13 +59,7 @@ NORET_TYPE void panic(const char * fmt,
        va_start(args, fmt);
        vsnprintf(buf, sizeof(buf), fmt, args);
        va_end(args);
-       printk(KERN_EMERG "Kernel panic: %s\n",buf);
-       if (in_interrupt())
-               printk(KERN_EMERG "In interrupt handler - not syncing\n");
-       else if (!current->pid)
-               printk(KERN_EMERG "In idle task - not syncing\n");
-       else
-               sys_sync();
+       printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
        bust_spinlocks(0);

 #ifdef CONFIG_SMP




* Re: [PATCH] was: [RFC] removal of sync in panic
  2004-07-14 17:39   ` [PATCH] was: " Christian Borntraeger
@ 2004-07-14 21:31     ` Andrew Morton
  2004-07-15  4:58       ` Christian Borntraeger
  0 siblings, 1 reply; 8+ messages in thread
From: Andrew Morton @ 2004-07-14 21:31 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-kernel, lmb

Christian Borntraeger <linux-kernel@borntraeger.net> wrote:
>
> Lars Marowsky-Bree wrote:
> > > 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
> > > is no need to help them. Furthermore, we have a problem severe enough to
> > > warrant a panic, so we had better not do any I/O.
> 
> > I've seen exactly the behaviour you describe and would be inclined to go
> > for this option too.
> 
> As this problem definitely exists, here is a patch. 
> 
> --- linux-2.6.8-rc1/kernel/panic.c      2004-06-16 07:20:04.000000000 +0200
> +++ linux-patch/kernel/panic.c  2004-07-14 19:37:02.000000000 +0200
> @@ -59,13 +59,7 @@ NORET_TYPE void panic(const char * fmt,
>         va_start(args, fmt);
>         vsnprintf(buf, sizeof(buf), fmt, args);
>         va_end(args);
> -       printk(KERN_EMERG "Kernel panic: %s\n",buf);
> -       if (in_interrupt())
> -               printk(KERN_EMERG "In interrupt handler - not syncing\n");
> -       else if (!current->pid)
> -               printk(KERN_EMERG "In idle task - not syncing\n");
> -       else
> -               sys_sync();
> +       printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
>         bust_spinlocks(0);
> 
>  #ifdef CONFIG_SMP

I agree with the patch in principle, but I'd be interested in what observed
problem motivated it?


* Re: [PATCH] was: [RFC] removal of sync in panic
  2004-07-14 21:31     ` Andrew Morton
@ 2004-07-15  4:58       ` Christian Borntraeger
  2004-07-15  5:22         ` William Lee Irwin III
  2004-07-17 19:01         ` Tim Wright
  0 siblings, 2 replies; 8+ messages in thread
From: Christian Borntraeger @ 2004-07-15  4:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, lmb

Andrew Morton wrote:
> I agree with the patch in principle, but I'd be interested in what
> observed problem motivated it?

see the first posting.

-----------snip--------------
I have seen panic() fail twice recently on an SMP system. The box
panicked but kept running happily on the other CPUs. The cause of this
failure is that these panics were triggered by a block device or a
filesystem (e.g. mounted with errors=panic). In these cases sys_sync() is
quite likely to fail or hang, and this is exactly what happened in both
cases I have seen. Meanwhile the other CPUs happily keep destroying data:
the kernel has a severe problem, but they do not know it yet, because
smp_send_stop() is only called after sys_sync().

I can imagine several changes, but I am not sure whether this is a problem
that must be fixed, or which fix is best.
Here are my alternatives:

1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there is
no need to help them. Furthermore, we have a problem severe enough to
warrant a panic, so we had better not do any I/O.
2. Move smp_send_stop() before sys_sync(). This at least prevents the other
CPUs from doing harm if sys_sync() hangs. I am not sure whether this really
works, though.
3. Add an
        if (doing_io())
                printk(KERN_EMERG "In I/O routine - not syncing\n");
check, similar to the in_interrupt() check. Unfortunately I have no idea
how this could be implemented, and it looks quite ugly.
---------------------


* Re: [PATCH] was: [RFC] removal of sync in panic
  2004-07-15  4:58       ` Christian Borntraeger
@ 2004-07-15  5:22         ` William Lee Irwin III
  2004-07-17 19:01         ` Tim Wright
  1 sibling, 0 replies; 8+ messages in thread
From: William Lee Irwin III @ 2004-07-15  5:22 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-kernel, Andrew Morton, lmb

On Thu, Jul 15, 2004 at 06:58:54AM +0200, Christian Borntraeger wrote:
> I have seen panic() fail twice recently on an SMP system. The box
> panicked but kept running happily on the other CPUs. The cause of this
> failure is that these panics were triggered by a block device or a
> filesystem (e.g. mounted with errors=panic). In these cases sys_sync() is
> quite likely to fail or hang, and this is exactly what happened in both
> cases I have seen. Meanwhile the other CPUs happily keep destroying data:
> the kernel has a severe problem, but they do not know it yet, because
> smp_send_stop() is only called after sys_sync().

I've seen SMP boxen run interrupt handlers for ages after panicking,
but I never thought much of it.


-- wli


* Re: [PATCH] was: [RFC] removal of sync in panic
  2004-07-15  4:58       ` Christian Borntraeger
  2004-07-15  5:22         ` William Lee Irwin III
@ 2004-07-17 19:01         ` Tim Wright
  2004-07-18  7:34           ` Christian Borntraeger
  1 sibling, 1 reply; 8+ messages in thread
From: Tim Wright @ 2004-07-17 19:01 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: linux-kernel, Andrew Morton, lmb

Yes, I've seen this multiple times.
I also agree that it seems a sensible patch. I have one dumb question.
Given that we're panicking and we know things are "bad", is there any
reason not to call smp_send_stop() as early as possible, rather than
last, as we currently do? As you say, the other CPUs are happily
continuing, potentially destroying data, and it seems that stopping this
as quickly as possible would be desirable.

Tim

On Wed, 2004-07-14 at 21:58, Christian Borntraeger wrote:
> Andrew Morton wrote:
> > I agree with the patch in principle, but I'd be interested in what
> > observed problem motivated it?
> 
> see the first posting.
> 
> -----------snip--------------
> I have seen panic() fail twice recently on an SMP system. The box
> panicked but kept running happily on the other CPUs. The cause of this
> failure is that these panics were triggered by a block device or a
> filesystem (e.g. mounted with errors=panic). In these cases sys_sync() is
> quite likely to fail or hang, and this is exactly what happened in both
> cases I have seen. Meanwhile the other CPUs happily keep destroying data:
> the kernel has a severe problem, but they do not know it yet, because
> smp_send_stop() is only called after sys_sync().
> 
> I can imagine several changes, but I am not sure whether this is a problem
> that must be fixed, or which fix is best.
> Here are my alternatives:
> 
> 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there is
> no need to help them. Furthermore, we have a problem severe enough to
> warrant a panic, so we had better not do any I/O.
> 2. Move smp_send_stop() before sys_sync(). This at least prevents the other
> CPUs from doing harm if sys_sync() hangs. I am not sure whether this really
> works, though.
> 3. Add an
>         if (doing_io())
>                 printk(KERN_EMERG "In I/O routine - not syncing\n");
> check, similar to the in_interrupt() check. Unfortunately I have no idea
> how this could be implemented, and it looks quite ugly.
> ---------------------
-- 
Tim Wright <timw@splhi.com>
Splhi


* Re: [PATCH] was: [RFC] removal of sync in panic
  2004-07-17 19:01         ` Tim Wright
@ 2004-07-18  7:34           ` Christian Borntraeger
  0 siblings, 0 replies; 8+ messages in thread
From: Christian Borntraeger @ 2004-07-18  7:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tim Wright, Andrew Morton, lmb

Tim Wright wrote:
> Yes, I've seen this multiple times.
> I also agree that it seems a sensible patch. I have one dumb question.
> Given that we're panicking and we know things are "bad", is there any
> reason not to call smp_send_stop() as early as possible, rather than
> last, as we currently do? As you say, the other CPUs are happily
> continuing, potentially destroying data, and it seems that stopping this
> as quickly as possible would be desirable.

That suggestion was my alternative 2 in my first mail :-)

On the other hand, if we remove the sync stuff, smp_send_stop() is called
quite early: only a printk() and bust_spinlocks(0) come before it.



Thread overview: 8+ messages
2004-07-14 15:45 [RFC] removal of sync in panic Christian Borntraeger
2004-07-14 16:23 ` Lars Marowsky-Bree
2004-07-14 17:39   ` [PATCH] was: " Christian Borntraeger
2004-07-14 21:31     ` Andrew Morton
2004-07-15  4:58       ` Christian Borntraeger
2004-07-15  5:22         ` William Lee Irwin III
2004-07-17 19:01         ` Tim Wright
2004-07-18  7:34           ` Christian Borntraeger
