* [RFC] removal of sync in panic
From: Christian Borntraeger @ 2004-07-14 15:45 UTC
To: linux-kernel
I have a question regarding the sys_sync() call within the panic function.
--------snip---------------
	printk(KERN_EMERG "Kernel panic: %s\n",buf);
	if (in_interrupt())
		printk(KERN_EMERG "In interrupt handler - not syncing\n");
	else if (!current->pid)
		printk(KERN_EMERG "In idle task - not syncing\n");
	else
		sys_sync(); <--------------------
	bust_spinlocks(0);
#ifdef CONFIG_SMP
	smp_send_stop();
#endif
--------------------------
Twice recently I have seen panic() fail on an SMP system: the box
panicked, but the other CPUs kept running happily. In both cases the
panic was triggered by a block device or a filesystem (e.g. one mounted
with errors=panic), and in such cases sys_sync() is quite likely to fail
or hang. That is exactly what happened both times. Meanwhile the other
CPUs happily carry on, possibly destroying data: the kernel is in serious
trouble, but the other CPUs do not know it, because smp_send_stop() only
happens after sys_sync().
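The failure described above is purely a question of ordering. As a rough illustration only, here is a toy userspace model (not kernel code): the step names, other_cpus_get_stopped() and panic_order_2_6_8_rc1 are made up for this sketch, and a hanging sys_sync() is modelled as "execution never gets past that step".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps of a simplified panic() path (names borrowed from the thread). */
enum panic_step { STEP_PRINTK, STEP_SYS_SYNC, STEP_SMP_SEND_STOP };

/*
 * Toy model, NOT kernel code: walk the steps in the given order and report
 * whether the other CPUs get stopped. If sync_hangs is true, the walk never
 * gets past STEP_SYS_SYNC, like a sys_sync() blocked on a broken device.
 */
bool other_cpus_get_stopped(const enum panic_step *order, size_t n,
                            bool sync_hangs)
{
    assert(order != NULL);
    for (size_t i = 0; i < n; i++) {
        if (order[i] == STEP_SYS_SYNC && sync_hangs)
            return false;   /* stuck here forever */
        if (order[i] == STEP_SMP_SEND_STOP)
            return true;    /* other CPUs halted */
    }
    return false;           /* smp_send_stop() never reached */
}

/* The ordering in the 2.6.8-rc1 panic() quoted above. */
const enum panic_step panic_order_2_6_8_rc1[] = {
    STEP_PRINTK, STEP_SYS_SYNC, STEP_SMP_SEND_STOP
};
```

With the 2.6.8-rc1 ordering the model only stops the other CPUs when the sync completes; putting STEP_SMP_SEND_STOP ahead of STEP_SYS_SYNC, or dropping STEP_SYS_SYNC entirely, makes the result independent of whether the sync hangs.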
I can imagine several changes, but I am not sure whether this problem
must be fixed at all, and if so, which fix is best.
Here are my alternatives:
1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
is no need to help them. Furthermore, we have a problem severe enough to
warrant a panic, so we had better not do any I/O.
2. Move smp_send_stop() before sys_sync(). This at least prevents the
other CPUs from doing harm if sys_sync() hangs. I am not sure whether
this really works, though.
3. Add an
	if (doing_io())
		printk(KERN_EMERG "In I/O routine - not syncing\n");
check, similar to the in_interrupt() check. Unfortunately I have no clue
how this could be achieved, and it looks quite ugly.
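To make alternative 3 concrete, here is a hypothetical userspace sketch. The kernel has no doing_io() helper and no per-task io_depth counter; struct task, io_enter(), io_exit(), doing_io() and panic_sync_veto() are all invented here for illustration. It shows the existing panic() checks in order, followed by the proposed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task state; the io_depth field does not exist in the kernel. */
struct task {
    int pid;        /* 0 models the idle task, as in the !current->pid check */
    int io_depth;   /* hypothetical: >0 while inside block/filesystem code */
};

enum sync_veto { SYNC_ALLOWED, VETO_INTERRUPT, VETO_IDLE, VETO_IO };

/* Every I/O entry and exit point would have to maintain the counter. */
void io_enter(struct task *t) { t->io_depth++; }
void io_exit(struct task *t)  { assert(t->io_depth > 0); t->io_depth--; }

/* Hypothetical helper corresponding to doing_io() in alternative 3. */
bool doing_io(const struct task *t) { return t->io_depth > 0; }

/* The existing checks from panic(), in order, plus the proposed one. */
enum sync_veto panic_sync_veto(const struct task *cur, bool in_irq)
{
    assert(cur != NULL);
    if (in_irq)
        return VETO_INTERRUPT;   /* "In interrupt handler - not syncing" */
    if (cur->pid == 0)
        return VETO_IDLE;        /* "In idle task - not syncing" */
    if (doing_io(cur))
        return VETO_IO;          /* "In I/O routine - not syncing" */
    return SYNC_ALLOWED;         /* caller would go on to sys_sync() */
}
```

The sketch also shows why this option looks ugly: every I/O path would need io_enter()/io_exit() instrumentation, and the check only inspects the panicking task, while sys_sync() writes back every filesystem.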
Thanks for any ideas and clarifications
cheers
Christian
* Re: [RFC] removal of sync in panic
From: Lars Marowsky-Bree @ 2004-07-14 16:23 UTC
To: Christian Borntraeger, linux-kernel
On 2004-07-14T17:45:46,
Christian Borntraeger <linux-kernel@borntraeger.net> said:
> I can imagine several changes, but I am not sure whether this problem
> must be fixed at all, and if so, which fix is best.
> Here are my alternatives:
>
> 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
> is no need to help them. Furthermore, we have a problem severe enough to
> warrant a panic, so we had better not do any I/O.
I've seen exactly the behaviour you describe and would be inclined to go
for this option too.
> 3. Add an
> 	if (doing_io())
> 		printk(KERN_EMERG "In I/O routine - not syncing\n");
> check, similar to the in_interrupt() check. Unfortunately I have no clue
> how this could be achieved, and it looks quite ugly.
This would also work of course, but as you point out, it's more complex.
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering \ ever tried. ever failed. no matter.
SUSE Labs, Research and Development | try again. fail again. fail better.
SUSE LINUX AG - A Novell company \ -- Samuel Beckett
* [PATCH] was: [RFC] removal of sync in panic
From: Christian Borntraeger @ 2004-07-14 17:39 UTC
To: linux-kernel; +Cc: Lars Marowsky-Bree, Andrew Morton
Lars Marowsky-Bree wrote:
> > 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so
> > there is no need to help them. Furthermore, we have a problem severe
> > enough to warrant a panic, so we had better not do any I/O.
> I've seen exactly the behaviour you describe and would be inclined to go
> for this option too.
As this problem definitely exists, here is a patch.
--- linux-2.6.8-rc1/kernel/panic.c 2004-06-16 07:20:04.000000000 +0200
+++ linux-patch/kernel/panic.c 2004-07-14 19:37:02.000000000 +0200
@@ -59,13 +59,7 @@ NORET_TYPE void panic(const char * fmt,
 	va_start(args, fmt);
 	vsnprintf(buf, sizeof(buf), fmt, args);
 	va_end(args);
-	printk(KERN_EMERG "Kernel panic: %s\n",buf);
-	if (in_interrupt())
-		printk(KERN_EMERG "In interrupt handler - not syncing\n");
-	else if (!current->pid)
-		printk(KERN_EMERG "In idle task - not syncing\n");
-	else
-		sys_sync();
+	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
 	bust_spinlocks(0);
 
 #ifdef CONFIG_SMP
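For readers who want to try the new message format outside the kernel, here is a minimal userspace sketch of the patched prologue. format_panic_line() is a made-up name, snprintf() into a caller buffer stands in for printk(KERN_EMERG ...), and the 1024-byte scratch buffer is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Userspace sketch of the patched panic() prologue: format the caller's
 * message, then emit one unconditional "not syncing" line.
 */
int format_panic_line(char *out, size_t outlen, const char *fmt, ...)
{
    char buf[1024];   /* scratch buffer, size chosen for the sketch */
    va_list args;

    assert(out != NULL && outlen > 0);
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    /* No in_interrupt()/idle-task branching and no sys_sync() any more. */
    return snprintf(out, outlen, "Kernel panic - not syncing: %s\n", buf);
}
```

For example, format_panic_line(line, sizeof line, "disk %d failed", 3) fills line with "Kernel panic - not syncing: disk 3 failed\n" and returns 42.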
* Re: [PATCH] was: [RFC] removal of sync in panic
From: Andrew Morton @ 2004-07-14 21:31 UTC
To: Christian Borntraeger; +Cc: linux-kernel, lmb
Christian Borntraeger <linux-kernel@borntraeger.net> wrote:
>
> Lars Marowsky-Bree wrote:
> > > 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so
> > > there is no need to help them. Furthermore, we have a problem severe
> > > enough to warrant a panic, so we had better not do any I/O.
>
> > I've seen exactly the behaviour you describe and would be inclined to go
> > for this option too.
>
> As this problem definitely exists, here is a patch.
>
> --- linux-2.6.8-rc1/kernel/panic.c 2004-06-16 07:20:04.000000000 +0200
> +++ linux-patch/kernel/panic.c 2004-07-14 19:37:02.000000000 +0200
> @@ -59,13 +59,7 @@ NORET_TYPE void panic(const char * fmt,
>  	va_start(args, fmt);
>  	vsnprintf(buf, sizeof(buf), fmt, args);
>  	va_end(args);
> -	printk(KERN_EMERG "Kernel panic: %s\n",buf);
> -	if (in_interrupt())
> -		printk(KERN_EMERG "In interrupt handler - not syncing\n");
> -	else if (!current->pid)
> -		printk(KERN_EMERG "In idle task - not syncing\n");
> -	else
> -		sys_sync();
> +	printk(KERN_EMERG "Kernel panic - not syncing: %s\n",buf);
>  	bust_spinlocks(0);
>
>  #ifdef CONFIG_SMP
I agree with the patch in principle, but I'd be interested in what observed
problem motivated it?
* Re: [PATCH] was: [RFC] removal of sync in panic
From: Christian Borntraeger @ 2004-07-15 4:58 UTC
To: linux-kernel; +Cc: Andrew Morton, lmb
Andrew Morton wrote:
> I agree with the patch in principle, but I'd be interested in what
> observed problem motivated it?
See the first posting:
-----------snip--------------
Twice recently I have seen panic() fail on an SMP system: the box
panicked, but the other CPUs kept running happily. In both cases the
panic was triggered by a block device or a filesystem (e.g. one mounted
with errors=panic), and in such cases sys_sync() is quite likely to fail
or hang. That is exactly what happened both times. Meanwhile the other
CPUs happily carry on, possibly destroying data: the kernel is in serious
trouble, but the other CPUs do not know it, because smp_send_stop() only
happens after sys_sync().
I can imagine several changes, but I am not sure whether this problem
must be fixed at all, and if so, which fix is best.
Here are my alternatives:
1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
is no need to help them. Furthermore, we have a problem severe enough to
warrant a panic, so we had better not do any I/O.
2. Move smp_send_stop() before sys_sync(). This at least prevents the
other CPUs from doing harm if sys_sync() hangs. I am not sure whether
this really works, though.
3. Add an
	if (doing_io())
		printk(KERN_EMERG "In I/O routine - not syncing\n");
check, similar to the in_interrupt() check. Unfortunately I have no clue
how this could be achieved, and it looks quite ugly.
---------------------
* Re: [PATCH] was: [RFC] removal of sync in panic
From: William Lee Irwin III @ 2004-07-15 5:22 UTC
To: Christian Borntraeger; +Cc: linux-kernel, Andrew Morton, lmb
On Thu, Jul 15, 2004 at 06:58:54AM +0200, Christian Borntraeger wrote:
> Twice recently I have seen panic() fail on an SMP system: the box
> panicked, but the other CPUs kept running happily. In both cases the
> panic was triggered by a block device or a filesystem (e.g. one mounted
> with errors=panic), and in such cases sys_sync() is quite likely to fail
> or hang. That is exactly what happened both times. Meanwhile the other
> CPUs happily carry on, possibly destroying data: the kernel is in serious
> trouble, but the other CPUs do not know it, because smp_send_stop() only
> happens after sys_sync().
I've seen SMP boxen run interrupt handlers for ages after panicking,
but I never thought much of it.
-- wli
* Re: [PATCH] was: [RFC] removal of sync in panic
From: Tim Wright @ 2004-07-17 19:01 UTC
To: Christian Borntraeger; +Cc: linux-kernel, Andrew Morton, lmb
Yes, I've seen this multiple times.
I also agree that it seems a sensible patch. I have one dumb question.
Given that we're panicking and we know things are "bad", is there any
reason not to call smp_send_stop() as early as possible, rather than
last, as we currently do? As you say, the other CPUs are happily
continuing, potentially destroying data, and it seems desirable to stop
them as quickly as possible.
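Tim's suggestion amounts to putting the stop at the very front of the panic path. The toy pthread model below is purely illustrative: threads play the other CPUs and poll a shared flag, whereas the real smp_send_stop() signals other CPUs with an inter-processor interrupt. NR_OTHER_CPUS, other_cpu() and panic_stop_first() are names made up for this sketch, and it needs to be built with -pthread on older toolchains.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_OTHER_CPUS 3   /* arbitrary size for the sketch */

/* Toy stand-ins: a shared flag for the stop request and an ack counter. */
static atomic_bool stop_requested;
static atomic_int cpus_stopped;

/* Each "other CPU" keeps running (and could keep writing) until told to stop. */
static void *other_cpu(void *unused)
{
    (void)unused;
    while (!atomic_load(&stop_requested))
        sched_yield();            /* stands in for ongoing work */
    atomic_fetch_add(&cpus_stopped, 1);
    return NULL;
}

/*
 * Panic path with the stop moved to the very front. Returns how many
 * simulated CPUs acknowledged the stop before anything else happened.
 */
int panic_stop_first(void)
{
    pthread_t cpu[NR_OTHER_CPUS];

    atomic_store(&stop_requested, false);
    atomic_store(&cpus_stopped, 0);
    for (int i = 0; i < NR_OTHER_CPUS; i++) {
        int rc = pthread_create(&cpu[i], NULL, other_cpu, NULL);
        assert(rc == 0);
        (void)rc;
    }

    /* 1. Halt the other CPUs first (the model waits for all of them). */
    atomic_store(&stop_requested, true);
    for (int i = 0; i < NR_OTHER_CPUS; i++)
        pthread_join(cpu[i], NULL);

    /* 2. Only now would the panic message be printed. */
    return atomic_load(&cpus_stopped);
}
```

In the model every simulated CPU has stopped before the panic message step is reached, which is the property Tim is after.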
Tim
On Wed, 2004-07-14 at 21:58, Christian Borntraeger wrote:
> Andrew Morton wrote:
> > I agree with the patch in principle, but I'd be interested in what
> > observed problem motivated it?
>
> See the first posting:
>
> -----------snip--------------
> Twice recently I have seen panic() fail on an SMP system: the box
> panicked, but the other CPUs kept running happily. In both cases the
> panic was triggered by a block device or a filesystem (e.g. one mounted
> with errors=panic), and in such cases sys_sync() is quite likely to fail
> or hang. That is exactly what happened both times. Meanwhile the other
> CPUs happily carry on, possibly destroying data: the kernel is in serious
> trouble, but the other CPUs do not know it, because smp_send_stop() only
> happens after sys_sync().
>
> I can imagine several changes, but I am not sure whether this problem
> must be fixed at all, and if so, which fix is best.
> Here are my alternatives:
>
> 1. Remove sys_sync() completely: syslogd and klogd use fsync(), so there
> is no need to help them. Furthermore, we have a problem severe enough to
> warrant a panic, so we had better not do any I/O.
> 2. Move smp_send_stop() before sys_sync(). This at least prevents the
> other CPUs from doing harm if sys_sync() hangs. I am not sure whether
> this really works, though.
> 3. Add an
> 	if (doing_io())
> 		printk(KERN_EMERG "In I/O routine - not syncing\n");
> check, similar to the in_interrupt() check. Unfortunately I have no clue
> how this could be achieved, and it looks quite ugly.
> ---------------------
--
Tim Wright <timw@splhi.com>
Splhi
* Re: [PATCH] was: [RFC] removal of sync in panic
From: Christian Borntraeger @ 2004-07-18 7:34 UTC
To: linux-kernel; +Cc: Tim Wright, Andrew Morton, lmb
Tim Wright wrote:
> Yes, I've seen this multiple times.
> I also agree that it seems a sensible patch. I have one dumb question.
> Given that we're panicking and we know things are "bad", is there any
> reason not to call smp_send_stop() as early as possible, rather than
> last, as we currently do? As you say, the other CPUs are happily
> continuing, potentially destroying data, and it seems desirable to stop
> them as quickly as possible.
That suggestion was my number 2 in my first mail :-)
On the other hand, once the sync is removed, smp_send_stop() is called
quite early anyway: only a printk() (plus bust_spinlocks(0)) is called
before smp_send_stop().