* kdump: crash_kexec()-smp_send_stop() race in panic
From: Michael Holzheu @ 2011-10-24 14:55 UTC
To: Vivek Goyal
Cc: heiko.carstens, kexec, linux-kernel, ebiederm, schwidefsky, akpm

Hello Vivek,

In our tests we ran into the following scenario:

Two CPUs have called panic at the same time. The first CPU called
crash_kexec() and the second CPU called smp_send_stop() in panic()
before crash_kexec() finished on the first CPU. So the second CPU
stopped the first CPU and therefore kdump failed.

1st CPU:
panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

2nd CPU:
panic()->crash_kexec()->kexec_mutex already held by 1st CPU
       ->smp_send_stop()-> stop CPU 1 (stop kdump)

How should we fix this problem? One possibility could be to do
smp_send_stop() before we call crash_kexec().

What do you think?

Michael
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
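A minimal userspace model of the interleaving in the report above may make the failure easier to see. This is a sketch under stated assumptions, not kernel code: two pthreads stand in for the two CPUs, a pthread mutex stands in for kexec_mutex, and an atomic flag (the hypothetical stop_requested) stands in for the stop IPI that smp_send_stop() delivers. The names run_race_model(), first_cpu() and second_cpu() are made up for illustration, and a phase counter forces the unlucky ordering so the outcome is repeatable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Model only: these stand in for kernel objects, they are not kernel APIs. */
static pthread_mutex_t kexec_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;   /* set by the modeled smp_send_stop() */
static atomic_int phase;             /* forces the interleaving from the report */
static bool kdump_completed;

/* 1st CPU: panic() -> crash_kexec() wins the trylock and starts kdump. */
static void *first_cpu(void *arg)
{
    (void)arg;
    if (pthread_mutex_trylock(&kexec_mutex) == 0) {
        atomic_store(&phase, 1);             /* kdump has started */
        while (atomic_load(&phase) < 2)
            ;                                /* kdump still running */
        if (!atomic_load(&stop_requested))   /* a stopped CPU cannot finish */
            kdump_completed = true;
        pthread_mutex_unlock(&kexec_mutex);  /* only so the model can rerun */
    }
    return NULL;
}

/* 2nd CPU: crash_kexec() finds the mutex held and returns, then panic()
 * calls smp_send_stop(), which stops the 1st CPU as well. */
static void *second_cpu(void *arg)
{
    (void)arg;
    while (atomic_load(&phase) < 1)
        ;                                    /* panic arrives during kdump */
    if (pthread_mutex_trylock(&kexec_mutex) != 0)
        atomic_store(&stop_requested, true); /* modeled smp_send_stop() */
    else
        pthread_mutex_unlock(&kexec_mutex);  /* not reached in this ordering */
    atomic_store(&phase, 2);
    return NULL;
}

/* Runs one forced interleaving; returns true if the modeled kdump finished. */
bool run_race_model(void)
{
    pthread_t cpu1, cpu2;
    int rc;

    atomic_store(&stop_requested, false);
    atomic_store(&phase, 0);
    kdump_completed = false;

    rc = pthread_create(&cpu1, NULL, first_cpu, NULL);
    assert(rc == 0);
    rc = pthread_create(&cpu2, NULL, second_cpu, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(cpu1, NULL);
    pthread_join(cpu2, NULL);
    return kdump_completed;
}
```

In this ordering run_race_model() returns false on every call: the second CPU only sees a failed trylock, which tells it nothing about the first CPU being in the middle of kdump, so it goes on to stop the CPU that was doing the work.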
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Eric W. Biederman @ 2011-10-24 15:14 UTC
To: holzheu
Cc: heiko.carstens, kexec, linux-kernel, schwidefsky, akpm, Vivek Goyal

Michael Holzheu <holzheu@linux.vnet.ibm.com> writes:

[snip]

> How should we fix this problem? One possibility could be to do
> smp_send_stop() before we call crash_kexec().
>
> What do you think?

smp_send_stop is insufficiently reliable to be used before crash_kexec.

My first reaction would be to test oops_in_progress and wait until
oops_in_progress == 1 before calling smp_send_stop.

Eric
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Américo Wang @ 2011-10-24 15:23 UTC
To: Eric W. Biederman
Cc: heiko.carstens, kexec, linux-kernel, schwidefsky, akpm, holzheu, Vivek Goyal

On Mon, Oct 24, 2011 at 11:14 PM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
> Michael Holzheu <holzheu@linux.vnet.ibm.com> writes:
>
[snip]
>
> smp_send_stop is insufficiently reliable to be used before crash_kexec.
>
> My first reaction would be to test oops_in_progress and wait until
> oops_in_progress == 1 before calling smp_send_stop.
>

+1

One of my colleagues mentioned the same problem to me inside RH.
Given that the race window is small, it would not be easy to
reproduce this scenario.

Thanks.
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Eric W. Biederman @ 2011-10-24 17:07 UTC
To: Américo Wang
Cc: heiko.carstens, kexec, linux-kernel, schwidefsky, akpm, holzheu, Vivek Goyal

Américo Wang <xiyou.wangcong@gmail.com> writes:

> On Mon, Oct 24, 2011 at 11:14 PM, Eric W. Biederman
> <ebiederm@xmission.com> wrote:
[snip]
>> My first reaction would be to test oops_in_progress and wait until
>> oops_in_progress == 1 before calling smp_send_stop.
>>
>
> +1
>
> One of my colleagues mentioned the same problem to me inside RH.
> Given that the race window is small, it would not be easy to
> reproduce this scenario.

As for reproducing it, I have a hunch you could hack up something
horrible with smp_call_function and kprobes.

On a little more reflection, we can't wait until oops_in_progress goes
to 1 before calling smp_send_stop, because if crash_kexec is not
involved, we will never call smp_send_stop.

So my second thought is to introduce another atomic variable,
panic_in_progress, visible only in panic. The CPU that first
increments panic_in_progress can call smp_send_stop. The rest of
the CPUs can just go into a busy wait. That should stop nasty
fights about who is going to come out of smp_send_stop first.

Eric
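The panic_in_progress idea can be sketched in the same userspace style. This is only an illustration under stated assumptions, not the kernel change: pthreads play the CPUs, C11 atomic_fetch_add() plays the atomic increment, and a hypothetical cpus_stopped flag plays smp_send_stop() so that the busy-waiting threads can exit (a really stopped CPU never returns). Only the thread whose increment moves the counter from 0 to 1 sends the stop; run_panic_in_progress_model() and model_panic() are made-up names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 8

static atomic_int panic_in_progress;
static atomic_bool cpus_stopped;   /* hypothetical stand-in for the stop IPI */
static atomic_int stop_senders;    /* CPUs that reached "smp_send_stop()" */

/* One CPU entering panic() in this model. */
static void *model_panic(void *arg)
{
    (void)arg;
    if (atomic_fetch_add(&panic_in_progress, 1) == 0) {
        /* First CPU in: only it may stop the others. */
        atomic_fetch_add(&stop_senders, 1);
        atomic_store(&cpus_stopped, true);   /* modeled smp_send_stop() */
    } else {
        /* Everyone else busy-waits until it is stopped. */
        while (!atomic_load(&cpus_stopped))
            ;
    }
    return NULL;
}

/* Starts MODEL_NR_CPUS concurrent "panics"; returns how many sent the stop. */
int run_panic_in_progress_model(void)
{
    pthread_t cpus[MODEL_NR_CPUS];

    atomic_store(&panic_in_progress, 0);
    atomic_store(&cpus_stopped, false);
    atomic_store(&stop_senders, 0);

    for (int i = 0; i < MODEL_NR_CPUS; i++) {
        int rc = pthread_create(&cpus[i], NULL, model_panic, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < MODEL_NR_CPUS; i++)
        pthread_join(cpus[i], NULL);
    return atomic_load(&stop_senders);
}
```

However the threads are scheduled, the model should report exactly one stop sender, which is the property being asked for: no two CPUs race through smp_send_stop().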
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Vivek Goyal @ 2011-10-24 17:33 UTC
To: Eric W. Biederman
Cc: kexec, heiko.carstens, linux-kernel, schwidefsky, Américo Wang, akpm, holzheu

On Mon, Oct 24, 2011 at 10:07:19AM -0700, Eric W. Biederman wrote:
[snip]
> So my second thought is to introduce another atomic variable,
> panic_in_progress, visible only in panic. The CPU that first
> increments panic_in_progress can call smp_send_stop. The rest of
> the CPUs can just go into a busy wait. That should stop nasty
> fights about who is going to come out of smp_send_stop first.

Introducing the panic_in_progress atomic sounds good.

Thanks
Vivek
* RE: kdump: crash_kexec()-smp_send_stop() race in panic
From: Seiji Aguchi @ 2011-10-24 22:24 UTC
To: Vivek Goyal, Eric W. Biederman
Cc: kexec@lists.infradead.org, heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org, schwidefsky@de.ibm.com, Américo Wang, akpm@linux-foundation.org, holzheu@linux.vnet.ibm.com

Hi,

>> >>> 1st CPU:
>> >>> panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump
>> >>>
>> >>> 2nd CPU:
>> >>> panic()->crash_kexec()->kexec_mutex already held by 1st CPU
>> >>>        ->smp_send_stop()-> stop CPU 1 (stop kdump)
>> >>>
>> >>> How should we fix this problem? One possibility could be to do
>> >>> smp_send_stop() before we call crash_kexec().

http://lkml.org/lkml/2010/9/16/353

I developed a patch solving this issue one year ago.
(Just adding local_irq_disable in kexec path.)

I hope this helps.

Seiji
* RE: kdump: crash_kexec()-smp_send_stop() race in panic
From: Michael Holzheu @ 2011-10-25 8:33 UTC
To: Seiji Aguchi
Cc: kexec@lists.infradead.org, heiko.carstens@de.ibm.com, linux-kernel@vger.kernel.org, Eric W. Biederman, schwidefsky@de.ibm.com, Américo Wang, akpm@linux-foundation.org, Vivek Goyal

Hello Seiji,

On Mon, 2011-10-24 at 18:24 -0400, Seiji Aguchi wrote:
> Hi,
>
[snip]
>
> http://lkml.org/lkml/2010/9/16/353
>
> I developed a patch solving this issue one year ago.
> (Just adding local_irq_disable in kexec path.)

This won't work (at least on s390) because smp_send_stop() will also
stop CPUs that have interrupts disabled.

Michael
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Michael Holzheu @ 2011-10-25 8:44 UTC
To: Eric W. Biederman
Cc: kexec, heiko.carstens, linux-kernel, schwidefsky, Américo Wang, akpm, Vivek Goyal

Hello Eric,

On Mon, 2011-10-24 at 10:07 -0700, Eric W. Biederman wrote:

[snip]

> So my second thought is to introduce another atomic variable,
> panic_in_progress, visible only in panic. The CPU that first
> increments panic_in_progress can call smp_send_stop. The rest of
> the CPUs can just go into a busy wait. That should stop nasty
> fights about who is going to come out of smp_send_stop first.

So this is a spinlock, no? What about the following patch:
---
 kernel/panic.c |    7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -59,6 +59,7 @@ EXPORT_SYMBOL(panic_blink);
  */
 NORET_TYPE void panic(const char * fmt, ...)
 {
+	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
@@ -68,8 +69,12 @@ NORET_TYPE void panic(const char * fmt,
 	 * It's possible to come here directly from a panic-assertion and
 	 * not have preempt disabled. Some functions called from here want
 	 * preempt to be disabled. No point enabling it later though...
+	 *
+	 * Only one CPU is allowed to execute the panic code. For multiple
+	 * parallel invocations of panic all other CPUs will wait on the
+	 * panic_lock. They are stopped afterwards by smp_send_stop().
 	 */
-	preempt_disable();
+	spin_lock(&panic_lock);
 
 	console_verbose();
 	bust_spinlocks(1);
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Eric W. Biederman @ 2011-10-25 12:04 UTC
To: holzheu
Cc: kexec, heiko.carstens, linux-kernel, schwidefsky, Américo Wang, akpm, Vivek Goyal

Michael Holzheu <holzheu@linux.vnet.ibm.com> writes:

> Hello Eric,
>
> On Mon, 2011-10-24 at 10:07 -0700, Eric W. Biederman wrote:
>
[snip]
>
> So this is a spinlock, no? What about the following patch:

Do we want both panic printks?

We really only need the mutual exclusion starting just before
smp_send_stop, so that is where I would be inclined to put it.

But yeah, something like the below should work.

Eric

> ---
>  kernel/panic.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
[snip]
> -	preempt_disable();
> +	spin_lock(&panic_lock);
>  
>  	console_verbose();
>  	bust_spinlocks(1);
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Vivek Goyal @ 2011-10-25 14:54 UTC
To: Eric W. Biederman
Cc: kexec, heiko.carstens, linux-kernel, schwidefsky, Américo Wang, akpm, holzheu

On Tue, Oct 25, 2011 at 05:04:57AM -0700, Eric W. Biederman wrote:
> Michael Holzheu <holzheu@linux.vnet.ibm.com> writes:
>
[snip]
> > So this is a spinlock, no? What about the following patch:
> Do we want both panic printks?
>

I guess having the printk() from both panics would be nice.

> We really only need the mutual exclusion starting just before
> smp_send_stop, so that is where I would be inclined to put it.
>

How about something just before crash_kexec()? I think there is not
much point in two CPUs trying to execute crash_kexec() together.

Thanks
Vivek
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Michael Holzheu @ 2011-10-25 14:58 UTC
To: Eric W. Biederman
Cc: kexec, heiko.carstens, linux-kernel, schwidefsky, Américo Wang, akpm, Vivek Goyal

On Tue, 2011-10-25 at 05:04 -0700, Eric W. Biederman wrote:
> Michael Holzheu <holzheu@linux.vnet.ibm.com> writes:
>
[snip]
> > So this is a spinlock, no? What about the following patch:
> Do we want both panic printks?

Ok, good point. We probably should not change that.

> We really only need the mutual exclusion starting just before
> smp_send_stop, so that is where I would be inclined to put it.

I think that to fix the race we have to take the lock at least
before we call crash_kexec().

Is the following patch ok for you?
---
 kernel/panic.c |    8 ++++++++
 1 file changed, 8 insertions(+)

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -59,6 +59,7 @@ EXPORT_SYMBOL(panic_blink);
  */
 NORET_TYPE void panic(const char * fmt, ...)
 {
+	static DEFINE_SPINLOCK(panic_lock);
 	static char buf[1024];
 	va_list args;
 	long i, i_next = 0;
@@ -82,6 +83,13 @@ NORET_TYPE void panic(const char * fmt,
 #endif
 
 	/*
+	 * Only one CPU is allowed to execute the panic code from here. For
+	 * multiple parallel invocations of panic all other CPUs will wait on
+	 * the panic_lock. They are stopped afterwards by smp_send_stop().
+	 */
+	spin_lock(&panic_lock);
+
+	/*
 	 * If we have crashed and we have a crash kernel loaded let it handle
 	 * everything else.
 	 * Do we want to call this before we try to display a message?
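The two patches differ mainly in where the lock is taken, which is what the "Do we want both panic printks?" question is about. The sketch below models both placements in userspace; it assumes a test-and-set loop on a C11 atomic_flag as a stand-in for spin_lock(&panic_lock), counters in place of printk(), crash_kexec() and smp_send_stop(), and a hypothetical others_stopped flag that lets waiting threads leave the spin, where a real CPU would instead be taken down by the stop IPI. All function and variable names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PANIC_CPUS 4

static atomic_flag model_panic_lock = ATOMIC_FLAG_INIT;
static atomic_bool others_stopped;     /* hypothetical stand-in for the stop IPI */
static atomic_int panic_messages;      /* modeled printk() of the panic message */
static atomic_int crash_kexec_calls;   /* CPUs that reached crash_kexec() */
static bool lock_before_printk;        /* true: first patch, false: second */

/* Modeled spin_lock(&panic_lock): spin until we own the lock, or give up
 * once the owner has "stopped" the other CPUs. */
static bool lock_or_get_stopped(void)
{
    while (atomic_flag_test_and_set(&model_panic_lock)) {
        if (atomic_load(&others_stopped))
            return false;
    }
    return true;
}

static void *panic_cpu(void *arg)
{
    (void)arg;
    if (!lock_before_printk)
        atomic_fetch_add(&panic_messages, 1);  /* second patch: every CPU prints */
    if (!lock_or_get_stopped())
        return NULL;                           /* stopped while waiting */
    if (lock_before_printk)
        atomic_fetch_add(&panic_messages, 1);  /* first patch: only the owner prints */
    atomic_fetch_add(&crash_kexec_calls, 1);   /* only the lock owner gets here */
    atomic_store(&others_stopped, true);       /* modeled smp_send_stop() */
    return NULL;                               /* the lock is never released */
}

/* Returns the number of crash_kexec() calls; *messages gets the printk count. */
int run_panic_lock_model(bool lock_first, int *messages)
{
    pthread_t cpus[MODEL_PANIC_CPUS];

    atomic_flag_clear(&model_panic_lock);
    atomic_store(&others_stopped, false);
    atomic_store(&panic_messages, 0);
    atomic_store(&crash_kexec_calls, 0);
    lock_before_printk = lock_first;

    for (int i = 0; i < MODEL_PANIC_CPUS; i++) {
        int rc = pthread_create(&cpus[i], NULL, panic_cpu, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < MODEL_PANIC_CPUS; i++)
        pthread_join(cpus[i], NULL);
    *messages = atomic_load(&panic_messages);
    return atomic_load(&crash_kexec_calls);
}
```

With lock_first set (the first patch's placement) only the lock owner's message is counted; with it clear (the second patch's placement) every modeled CPU prints. In both cases exactly one CPU reaches crash_kexec(), which is the race fix itself.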
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Vivek Goyal @ 2011-10-25 15:08 UTC
To: Michael Holzheu
Cc: Don Zickus, kexec, heiko.carstens, linux-kernel, Eric W. Biederman, schwidefsky, Américo Wang, akpm

On Tue, Oct 25, 2011 at 04:58:19PM +0200, Michael Holzheu wrote:
[snip]
> Is the following patch ok for you?
[snip]
>  	/*
> +	 * Only one CPU is allowed to execute the panic code from here. For
> +	 * multiple parallel invocations of panic all other CPUs will wait on
> +	 * the panic_lock. They are stopped afterwards by smp_send_stop().
> +	 */
> +	spin_lock(&panic_lock);

Why leave irqs enabled?

At least for x86, Don Zickus had a patch to use NMI in smp_send_stop(),
so that should work even if interrupts are disabled. (I think that
patch is not merged yet.)

So are other architectures a concern? If yes, then maybe in the future
we can make it an arch call which can also choose to disable interrupts.

CCing Don also. This lock also brings in the serialization required
for the panic notifier list and the kmsg_dump() infrastructure.

Thanks
Vivek
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Michael Holzheu @ 2011-10-25 15:28 UTC
To: Vivek Goyal
Cc: Don Zickus, kexec, heiko.carstens, linux-kernel, Eric W. Biederman, schwidefsky, Américo Wang, akpm

On Tue, 2011-10-25 at 11:08 -0400, Vivek Goyal wrote:
> On Tue, Oct 25, 2011 at 04:58:19PM +0200, Michael Holzheu wrote:
[snip]
> > +	spin_lock(&panic_lock);
>
> Why leave irqs enabled?
>
> At least for x86, Don Zickus had a patch to use NMI in smp_send_stop(),
> so that should work even if interrupts are disabled. (I think that
> patch is not merged yet.)
>
> So are other architectures a concern? If yes, then maybe in the future
> we can make it an arch call which can also choose to disable interrupts.

For s390 we could disable the interrupts here. smp_send_stop() also
works when IRQs are disabled. But as you said, who knows if that is
true on all architectures...

Michael
* Re: kdump: crash_kexec()-smp_send_stop() race in panic
From: Don Zickus @ 2011-10-25 15:28 UTC
To: Vivek Goyal
Cc: kexec, heiko.carstens, linux-kernel, Eric W. Biederman, schwidefsky, Américo Wang, akpm, Michael Holzheu

On Tue, Oct 25, 2011 at 11:08:30AM -0400, Vivek Goyal wrote:
[snip]
> CCing Don also. This lock also brings in the serialization required
> for the panic notifier list and the kmsg_dump() infrastructure.

This serializes panics. For kmsg_dump we wanted to serialize the
shutdown path, IOW stop all the CPUs reliably. This patch solves a
different problem.

Cheers,
Don