* [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock @ 2013-08-19 8:12 Yoshihiro YUNOMAE 2013-08-19 9:46 ` Ingo Molnar 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-08-19 8:12 UTC (permalink / raw) To: linux-kernel Cc: Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, Eric W. Biederman, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Prevent crash_kexec() from deadlocking of ioapic_lock. When crash_kexec() is executed on a cpu, the cpu will get ioapic_lock in disable_IO_APIC(). So if the cpu gets NMI while locking ioapic_lock, a deadlock wiil happen. In this patch, ioapic_lock is initialized before disable_IO_APIC(). To confirm this deadlocking, you'll set up as follows: 1. Add mdelay(1000) after raw_spin_lock_irqsave() in native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c Although the deadlocking can occur without this modification, it will increase the potential of the deadlocking problem. 2. Build and install the kernel 3. Set up the OS which will run panic() and kexec when NMI is injected # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf # vim /etc/default/grub add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line # grub2-mkconfig 4. Reboot the OS 5. Run following command for each vcpu on the guest # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; By running this command, cpus will get ioapic_lock for setting affinity. 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you use VM) After injecting NMI, panic() is called in an nmi-handler context. Then, kexec will normally run in panic(), but the operation will be stopped by deadlock of ioapic_lock in crash_kexec()->machine_crash_shutdown()-> native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> clear_IO_APIC_pin()->ioapic_read_entry(). Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Joerg Roedel <joro@8bytes.org> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org --- arch/x86/include/asm/apic.h | 2 ++ arch/x86/kernel/apic/io_apic.c | 5 +++++ arch/x86/kernel/crash.c | 4 ++++ 3 files changed, 11 insertions(+) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index f8119b5..ddb06af 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -715,4 +715,6 @@ static inline void exiting_ack_irq(void) ack_APIC_irq(); } +extern void ioapic_lock_init(void); + #endif /* _ASM_X86_APIC_H */ diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c index 9ed796c..2816c07 100644 --- a/arch/x86/kernel/apic/io_apic.c +++ b/arch/x86/kernel/apic/io_apic.c @@ -1534,6 +1534,11 @@ void intel_ir_io_apic_print_entries(unsigned int apic, } } +void ioapic_lock_init(void) +{ + raw_spin_lock_init(&ioapic_lock); +} + __apicdebuginit(void) print_IO_APIC(int ioapic_idx) { union IO_APIC_reg_00 reg_00; diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c index 74467fe..ea039d5 100644 --- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -129,6 +129,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs) lapic_shutdown(); #if defined(CONFIG_X86_IO_APIC) + /* + * Prevent crash_kexec() from deadlocking of ioapic_lock. + */ + ioapic_lock_init(); disable_IO_APIC(); #endif #ifdef CONFIG_HPET_TIMER ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-19 8:12 [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock Yoshihiro YUNOMAE @ 2013-08-19 9:46 ` Ingo Molnar 2013-08-20 0:06 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Ingo Molnar @ 2013-08-19 9:46 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, Eric W. Biederman, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton * Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> wrote: > Prevent crash_kexec() from deadlocking of ioapic_lock. s/of/on > When crash_kexec() is executed on a cpu, the cpu will get > ioapic_lock in disable_IO_APIC(). So if the cpu gets NMI > while locking ioapic_lock, a deadlock wiil happen. In s/will > this patch, ioapic_lock is initialized before > disable_IO_APIC(). > > To confirm this deadlocking, you'll set up as follows: s/deadlocking/deadlock > 1. Add mdelay(1000) after raw_spin_lock_irqsave() in > native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c > > Although the deadlocking can occur without this modification, it will > increase the potential of the deadlocking problem. s/deadlocking/deadlock > > 2. Build and install the kernel > > 3. Set up the OS which will run panic() and kexec when NMI is injected > # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf > # vim /etc/default/grub > add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line > # grub2-mkconfig > > 4. Reboot the OS > > 5. Run following command for each vcpu on the guest > # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; > By running this command, cpus will get ioapic_lock for setting affinity. > > 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you > use VM) > After injecting NMI, panic() is called in an nmi-handler context. > Then, kexec will normally run in panic(), but the operation will be stopped > by deadlock of ioapic_lock in crash_kexec()->machine_crash_shutdown()-> s/of/on > native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> > clear_IO_APIC_pin()->ioapic_read_entry(). I suppose we could do this patch if it's a common occurance. A few minor details need fixing: > > Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Ingo Molnar <mingo@redhat.com> > Cc: "H. Peter Anvin" <hpa@zytor.com> > Cc: x86@kernel.org > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Andi Kleen <ak@linux.intel.com> > Cc: Seiji Aguchi <seiji.aguchi@hds.com> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> > Cc: "Eric W. Biederman" <ebiederm@xmission.com> > Cc: Gleb Natapov <gleb@redhat.com> > Cc: Marcelo Tosatti <mtosatti@redhat.com> > Cc: linux-kernel@vger.kernel.org > Cc: stable@vger.kernel.org > --- > arch/x86/include/asm/apic.h | 2 ++ > arch/x86/kernel/apic/io_apic.c | 5 +++++ > arch/x86/kernel/crash.c | 4 ++++ > 3 files changed, 11 insertions(+) > > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h > index f8119b5..ddb06af 100644 > --- a/arch/x86/include/asm/apic.h > +++ b/arch/x86/include/asm/apic.h > @@ -715,4 +715,6 @@ static inline void exiting_ack_irq(void) > ack_APIC_irq(); > } > > +extern void ioapic_lock_init(void); > + > #endif /* _ASM_X86_APIC_H */ > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > index 9ed796c..2816c07 100644 > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -1534,6 +1534,11 @@ void intel_ir_io_apic_print_entries(unsigned int apic, > } > } > > +void ioapic_lock_init(void) > +{ > + raw_spin_lock_init(&ioapic_lock); > +} Please name this ioapic_zap_locks() to make clear that this is crash handling related. > + > __apicdebuginit(void) print_IO_APIC(int ioapic_idx) > { > union IO_APIC_reg_00 reg_00; > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c > index 74467fe..ea039d5 100644 > --- a/arch/x86/kernel/crash.c > +++ b/arch/x86/kernel/crash.c > @@ -129,6 +129,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs) > > lapic_shutdown(); > #if defined(CONFIG_X86_IO_APIC) Please enhance this #ifdef while at it. > + /* > + * Prevent crash_kexec() from deadlocking of ioapic_lock. > + */ s/of/on. Also, single-line comment can go /* here */. > + ioapic_lock_init(); > disable_IO_APIC(); > #endif > #ifdef CONFIG_HPET_TIMER > Thanks, Ingo ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-19 9:46 ` Ingo Molnar @ 2013-08-20 0:06 ` Yoshihiro YUNOMAE 2013-08-20 10:12 ` Eric W. Biederman 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-08-20 0:06 UTC (permalink / raw) To: Ingo Molnar Cc: linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, Eric W. Biederman, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Hi Ingo, Thank you for fixing typos! OK, I'll fix them and rename to ioapic_zap_locks(). Thank you again! Yoshihiro YUNOMAE (2013/08/19 18:46), Ingo Molnar wrote: > > * Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> wrote: > >> Prevent crash_kexec() from deadlocking of ioapic_lock. > > s/of/on > >> When crash_kexec() is executed on a cpu, the cpu will get >> ioapic_lock in disable_IO_APIC(). So if the cpu gets NMI >> while locking ioapic_lock, a deadlock wiil happen. In > > s/will > >> this patch, ioapic_lock is initialized before >> disable_IO_APIC(). >> >> To confirm this deadlocking, you'll set up as follows: > > s/deadlocking/deadlock > >> 1. Add mdelay(1000) after raw_spin_lock_irqsave() in >> native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c >> >> Although the deadlocking can occur without this modification, it will >> increase the potential of the deadlocking problem. > > s/deadlocking/deadlock > >> >> 2. Build and install the kernel >> >> 3. Set up the OS which will run panic() and kexec when NMI is injected >> # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf >> # vim /etc/default/grub >> add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line >> # grub2-mkconfig >> >> 4. Reboot the OS >> >> 5. Run following command for each vcpu on the guest >> # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinitity; done; >> By running this command, cpus will get ioapic_lock for setting affinity. >> >> 6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you >> use VM) >> After injecting NMI, panic() is called in an nmi-handler context. >> Then, kexec will normally run in panic(), but the operation will be stopped >> by deadlock of ioapic_lock in crash_kexec()->machine_crash_shutdown()-> > > s/of/on > >> native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()-> >> clear_IO_APIC_pin()->ioapic_read_entry(). > > I suppose we could do this patch if it's a common > occurance. > > A few minor details need fixing: > >> >> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> >> Cc: Thomas Gleixner <tglx@linutronix.de> >> Cc: Ingo Molnar <mingo@redhat.com> >> Cc: "H. Peter Anvin" <hpa@zytor.com> >> Cc: x86@kernel.org >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Andi Kleen <ak@linux.intel.com> >> Cc: Seiji Aguchi <seiji.aguchi@hds.com> >> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> >> Cc: Joerg Roedel <joro@8bytes.org> >> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> >> Cc: "Eric W. Biederman" <ebiederm@xmission.com> >> Cc: Gleb Natapov <gleb@redhat.com> >> Cc: Marcelo Tosatti <mtosatti@redhat.com> >> Cc: linux-kernel@vger.kernel.org >> Cc: stable@vger.kernel.org >> --- >> arch/x86/include/asm/apic.h | 2 ++ >> arch/x86/kernel/apic/io_apic.c | 5 +++++ >> arch/x86/kernel/crash.c | 4 ++++ >> 3 files changed, 11 insertions(+) >> >> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h >> index f8119b5..ddb06af 100644 >> --- a/arch/x86/include/asm/apic.h >> +++ b/arch/x86/include/asm/apic.h >> @@ -715,4 +715,6 @@ static inline void exiting_ack_irq(void) >> ack_APIC_irq(); >> } >> >> +extern void ioapic_lock_init(void); >> + >> #endif /* _ASM_X86_APIC_H */ >> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >> index 9ed796c..2816c07 100644 >> --- a/arch/x86/kernel/apic/io_apic.c >> +++ b/arch/x86/kernel/apic/io_apic.c >> @@ -1534,6 +1534,11 @@ void intel_ir_io_apic_print_entries(unsigned int apic, >> } >> } >> >> +void ioapic_lock_init(void) >> +{ >> + raw_spin_lock_init(&ioapic_lock); >> +} > > Please name this ioapic_zap_locks() to make clear that this > is crash handling related. > >> + >> __apicdebuginit(void) print_IO_APIC(int ioapic_idx) >> { >> union IO_APIC_reg_00 reg_00; >> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c >> index 74467fe..ea039d5 100644 >> --- a/arch/x86/kernel/crash.c >> +++ b/arch/x86/kernel/crash.c >> @@ -129,6 +129,10 @@ void native_machine_crash_shutdown(struct pt_regs *regs) >> >> lapic_shutdown(); >> #if defined(CONFIG_X86_IO_APIC) > > Please enhance this #ifdef while at it. > >> + /* >> + * Prevent crash_kexec() from deadlocking of ioapic_lock. >> + */ > > s/of/on. > > Also, single-line comment can go /* here */. > >> + ioapic_lock_init(); >> disable_IO_APIC(); >> #endif >> #ifdef CONFIG_HPET_TIMER >> > > Thanks, > > Ingo > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-20 0:06 ` Yoshihiro YUNOMAE @ 2013-08-20 10:12 ` Eric W. Biederman 2013-08-20 14:27 ` Don Zickus 0 siblings, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2013-08-20 10:12 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: > Hi Ingo, > > Thank you for fixing typos! > OK, I'll fix them and rename to ioapic_zap_locks(). > > Thank you again! The better fix for this would be to remove the disable_IO_APIC call from crash_kexec. I know last time it was investigated the kernel was very close to working without needing that, and the code will be much more robust in the long term if we can avoid disabling them in the crashing kernel. Yoshihiro is there any chance you can look into removing the disable_IO_APIC entirely? The apic disablement and the disable_IO_APIC exists entirely due to limitations in the kernel boot path. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-20 10:12 ` Eric W. Biederman @ 2013-08-20 14:27 ` Don Zickus 2013-08-22 8:38 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Don Zickus @ 2013-08-20 14:27 UTC (permalink / raw) To: Eric W. Biederman Cc: Yoshihiro YUNOMAE, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton On Tue, Aug 20, 2013 at 03:12:32AM -0700, Eric W. Biederman wrote: > Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: > > > Hi Ingo, > > > > Thank you for fixing typos! > > OK, I'll fix them and rename to ioapic_zap_locks(). > > > > Thank you again! > > > The better fix for this would be to remove the disable_IO_APIC call from > crash_kexec. > > I know last time it was investigated the kernel was very close to > working without needing that, and the code will be much more robust in > the long term if we can avoid disabling them in the crashing kernel. > > Yoshihiro is there any chance you can look into removing the > disable_IO_APIC entirely? > > The apic disablement and the disable_IO_APIC exists entirely due to > limitations in the kernel boot path. Yup. We went down this path a year ago: https://lkml.org/lkml/2012/2/2/331 Then we got sidetracked and talked about removing the lapic stuff at shutdown too: http://lists.infradead.org/pipermail/kexec/2012-February/006017.html (sorry couldn't find lkml link for some reason) And the second patch was committed. However, it was quickly reverted when Yinghai Lu noticed a problem: https://lkml.org/lkml/2012/2/11/143 The problem stemmed from the fact that the nmi_watchdog caused an NMI in the middle of transitioning between the two kernels (we didn't shutdown the lapic) and caused a reset (there is no NMI handler in purgatory). I think I dropped the ball in investigating how to write an idt for the purgatory code to handle spurious NMIs. Regardless of all that, I think if we stick to just removing the ioapic shutdown code (ie the first patch linked above), we should be ok. I believe my testing went smoothly. It was the lapic stuff that needed more tweaking. So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep the code simpler. Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-20 14:27 ` Don Zickus @ 2013-08-22 8:38 ` Yoshihiro YUNOMAE 2013-08-22 13:11 ` Don Zickus 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-08-22 8:38 UTC (permalink / raw) To: Don Zickus, Eric W. Biederman Cc: Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton (2013/08/20 23:27), Don Zickus wrote: > On Tue, Aug 20, 2013 at 03:12:32AM -0700, Eric W. Biederman wrote: >> Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: >> >>> Hi Ingo, >>> >>> Thank you for fixing typos! >>> OK, I'll fix them and rename to ioapic_zap_locks(). >>> >>> Thank you again! >> >> >> The better fix for this would be to remove the disable_IO_APIC call from >> crash_kexec. >> >> I know last time it was investigated the kernel was very close to >> working without needing that, and the code will be much more robust in >> the long term if we can avoid disabling them in the crashing kernel. >> >> Yoshihiro is there any chance you can look into removing the >> disable_IO_APIC entirely? >> >> The apic disablement and the disable_IO_APIC exists entirely due to >> limitations in the kernel boot path. > > Yup. We went down this path a year ago: > > https://lkml.org/lkml/2012/2/2/331 > > Then we got sidetracked and talked about removing the lapic stuff at > shutdown too: > > http://lists.infradead.org/pipermail/kexec/2012-February/006017.html > (sorry couldn't find lkml link for some reason) > > And the second patch was committed. > > However, it was quickly reverted when Yinghai Lu noticed a problem: > > https://lkml.org/lkml/2012/2/11/143 > > The problem stemmed from the fact that the nmi_watchdog caused an NMI in > the middle of transitioning between the two kernels (we didn't shutdown > the lapic) and caused a reset (there is no NMI handler in purgatory). > > I think I dropped the ball in investigating how to write an idt for the > purgatory code to handle spurious NMIs. > > Regardless of all that, I think if we stick to just removing the ioapic > shutdown code (ie the first patch linked above), we should be ok. I > believe my testing went smoothly. It was the lapic stuff that needed more > tweaking. > > So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep > the code simpler. Thank you for commenting about my patch. I didn't know you already have submitted the patches for this deadlock problem. I can't answer definitively right now that no problems are induced by removing disable_IO_APIC(). However, my patch should be work well (and has already been merged to -tip tree). So how about taking my patch at first, and then discussing the removal of disabled_IO_APIC()? Thanks, Yoshihiro YUNOMAE -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-22 8:38 ` Yoshihiro YUNOMAE @ 2013-08-22 13:11 ` Don Zickus 2013-08-27 3:41 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Don Zickus @ 2013-08-22 13:11 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: Eric W. Biederman, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: > >So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep > >the code simpler. > > Thank you for commenting about my patch. > I didn't know you already have submitted the patches for this deadlock > problem. > > I can't answer definitively right now that no problems are induced by > removing disable_IO_APIC(). However, my patch should be work well (and > has already been merged to -tip tree). So how about taking my patch at > first, and then discussing the removal of disabled_IO_APIC()? It doesn't matter to me. My orignal patch last year was similar to yours until it was suggested that we were working around a problem which was we shouldn't touch the IO_APIC code on panic. Then I wrote the removal of disable_IO_APIC patch and did lots of testing on it. I don't think I have seen any issues with it (just the removal of disabling the lapic stuff). Regardless, your patch fixes a similar problem we saw on RHEL, so I am happy either way. The removal of the disable_IO_APIC() just makes the code look cleaner. Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-22 13:11 ` Don Zickus @ 2013-08-27 3:41 ` Yoshihiro YUNOMAE 2013-08-27 13:33 ` Don Zickus 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-08-27 3:41 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Hi Don, Sorry for the late reply. (2013/08/22 22:11), Don Zickus wrote: > On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: >>> So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep >>> the code simpler. >> >> Thank you for commenting about my patch. >> I didn't know you already have submitted the patches for this deadlock >> problem. >> >> I can't answer definitively right now that no problems are induced by >> removing disable_IO_APIC(). However, my patch should be work well (and >> has already been merged to -tip tree). So how about taking my patch at >> first, and then discussing the removal of disabled_IO_APIC()? > > It doesn't matter to me. My orignal patch last year was similar to yours > until it was suggested that we were working around a problem which was we > shouldn't touch the IO_APIC code on panic. Then I wrote the removal of > disable_IO_APIC patch and did lots of testing on it. I don't think I have > seen any issues with it (just the removal of disabling the lapic stuff). Yes, you really did a lot of testing about this problem according to your patch(https://lkml.org/lkml/2012/1/31/391). Although you said jiffies calibration code does not need the PIT in http://lists.infradead.org/pipermail/kexec/2012-February/006017.html, I don't understand yet why we can remove disable_IO_APIC. Would you please explain about the calibration codes? By the way, can we remove disable_IO_APIC even if an old dump capture kernel is used? Thanks, Yoshihiro YUNOMAE > Regardless, your patch fixes a similar problem we saw on RHEL, so I am > happy either way. The removal of the disable_IO_APIC() just makes the > code look cleaner. > > Cheers, > Don > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-27 3:41 ` Yoshihiro YUNOMAE @ 2013-08-27 13:33 ` Don Zickus 2013-08-31 0:58 ` Eric W. Biederman 0 siblings, 1 reply; 15+ messages in thread From: Don Zickus @ 2013-08-27 13:33 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: Eric W. Biederman, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton On Tue, Aug 27, 2013 at 12:41:51PM +0900, Yoshihiro YUNOMAE wrote: > Hi Don, > > Sorry for the late reply. > > (2013/08/22 22:11), Don Zickus wrote: > >On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: > >>>So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep > >>>the code simpler. > >> > >>Thank you for commenting about my patch. > >>I didn't know you already have submitted the patches for this deadlock > >>problem. > >> > >>I can't answer definitively right now that no problems are induced by > >>removing disable_IO_APIC(). However, my patch should be work well (and > >>has already been merged to -tip tree). So how about taking my patch at > >>first, and then discussing the removal of disabled_IO_APIC()? > > > >It doesn't matter to me. My orignal patch last year was similar to yours > >until it was suggested that we were working around a problem which was we > >shouldn't touch the IO_APIC code on panic. Then I wrote the removal of > >disable_IO_APIC patch and did lots of testing on it. I don't think I have > >seen any issues with it (just the removal of disabling the lapic stuff). > > Yes, you really did a lot of testing about this problem according to > your patch(https://lkml.org/lkml/2012/1/31/391). Although you > said jiffies calibration code does not need the PIT in > http://lists.infradead.org/pipermail/kexec/2012-February/006017.html, > I don't understand yet why we can remove disable_IO_APIC. > Would you please explain about the calibration codes? I forgot a lot of this, Eric B. might remember more (as he was the one that pointed this out initially). I believe initially the io_apic had to be in a pre-configured state in order to do some early calibration of the timing code. Later on, it was my understanding, that the calibration of various time keeping stuff did not need the io_apic in a correct state. The code might have switched to tsc instead of PIT, I forget. Then again looking at the output of the latest dmesg, it seems the IO APIC is initialized way before the tsc is calibrated. So I am not sure what needed to get done or what interrupts are needed before the IO APIC gets initialized. > > By the way, can we remove disable_IO_APIC even if an old dump capture > kernel is used? Good question. I did a bunch of testing with RHEL-6 too, which is 2.6.32 based. But I think we added some IRR fixes (commit 1e75b31d638), which may or may not have helped in this case. So I don't know when a kernel started worked correctly during init (with the right changes). I believe 2.6.32 had everything. However, at the same time, the memory layout of current kernels has changed and I am not sure if older kernels can read them correctly (or if you just need the latest makedumpfile tool). In other words, an old kernel like 2.6.32 might not work as a kdump kernel for a 3.10 kernel. I don't know. How old is the kernel you are working with? Cheers, Don ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-27 13:33 ` Don Zickus @ 2013-08-31 0:58 ` Eric W. Biederman 2013-09-02 3:09 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2013-08-31 0:58 UTC (permalink / raw) To: Don Zickus Cc: Yoshihiro YUNOMAE, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Don Zickus <dzickus@redhat.com> writes: > On Tue, Aug 27, 2013 at 12:41:51PM +0900, Yoshihiro YUNOMAE wrote: >> Hi Don, >> >> Sorry for the late reply. >> >> (2013/08/22 22:11), Don Zickus wrote: >> >On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: >> >>>So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep >> >>>the code simpler. >> >> >> >>Thank you for commenting about my patch. >> >>I didn't know you already have submitted the patches for this deadlock >> >>problem. >> >> >> >>I can't answer definitively right now that no problems are induced by >> >>removing disable_IO_APIC(). However, my patch should be work well (and >> >>has already been merged to -tip tree). So how about taking my patch at >> >>first, and then discussing the removal of disabled_IO_APIC()? >> > >> >It doesn't matter to me. My orignal patch last year was similar to yours >> >until it was suggested that we were working around a problem which was we >> >shouldn't touch the IO_APIC code on panic. Then I wrote the removal of >> >disable_IO_APIC patch and did lots of testing on it. I don't think I have >> >seen any issues with it (just the removal of disabling the lapic stuff). >> >> Yes, you really did a lot of testing about this problem according to >> your patch(https://lkml.org/lkml/2012/1/31/391). Although you >> said jiffies calibration code does not need the PIT in >> http://lists.infradead.org/pipermail/kexec/2012-February/006017.html, >> I don't understand yet why we can remove disable_IO_APIC. >> Would you please explain about the calibration codes? > > I forgot a lot of this, Eric B. might remember more (as he was the one that > pointed this out initially). I believe initially the io_apic had to be in > a pre-configured state in order to do some early calibration of the timing > code. Later on, it was my understanding, that the calibration of various > time keeping stuff did not need the io_apic in a correct state. The code > might have switched to tsc instead of PIT, I forget. Yes. Alan Coxe's initial SMP port had a few cases where it still exepected the system to be in PIT mode during boot and it took us a decade or so before those assumptions were finally expunged. > Then again looking at the output of the latest dmesg, it seems the IO APIC > is initialized way before the tsc is calibrated. So I am not sure what > needed to get done or what interrupts are needed before the IO APIC gets > initialized. The practical issue is that jiffies was calibrated off of the PIT timer if I recall. But that is all old news. >> By the way, can we remove disable_IO_APIC even if an old dump capture >> kernel is used? > > Good question. I did a bunch of testing with RHEL-6 too, which is 2.6.32 > based. But I think we added some IRR fixes (commit 1e75b31d638), which > may or may not have helped in this case. So I don't know when a kernel > started worked correctly during init (with the right changes). I believe > 2.6.32 had everything. A sufficient old and buggy dump capture kernel will fail because of bugs in it's startup path, but I don't think anyone cares. The kernel startup path has been fixed for years, and disable_IO_APIC in crash_kexec has always been a bug work-around for deficiencies in the kernel's start up path (not part of the guaranteed interface). Furthermore every real system configuration I have encountered used the same kernel version for the crashdump kernel and the production kernel. So we should be good. > However, at the same time, the memory layout of current kernels has > changed and I am not sure if older kernels can read them correctly (or if > you just need the latest makedumpfile tool). In other words, an old > kernel like 2.6.32 might not work as a kdump kernel for a 3.10 kernel. I > don't know. Memory layout should not be an issue at all. The details are passed from one kernel to another in a set of ELF headers. So if the crash dump kernel can run in the memory reserved for it, all should work well. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-08-31 0:58 ` Eric W. Biederman @ 2013-09-02 3:09 ` Yoshihiro YUNOMAE 2013-09-03 0:12 ` Eric W. Biederman 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-09-02 3:09 UTC (permalink / raw) To: Eric W. Biederman, Don Zickus Cc: Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Hi Eric and Don, Sorry for the late reply. (2013/08/31 9:58), Eric W. Biederman wrote: > Don Zickus <dzickus@redhat.com> writes: > >> On Tue, Aug 27, 2013 at 12:41:51PM +0900, Yoshihiro YUNOMAE wrote: >>> Hi Don, >>> >>> Sorry for the late reply. >>> >>> (2013/08/22 22:11), Don Zickus wrote: >>>> On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: >>>>>> So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep >>>>>> the code simpler. >>>>> >>>>> Thank you for commenting about my patch. >>>>> I didn't know you already have submitted the patches for this deadlock >>>>> problem. >>>>> >>>>> I can't answer definitively right now that no problems are induced by >>>>> removing disable_IO_APIC(). However, my patch should be work well (and >>>>> has already been merged to -tip tree). So how about taking my patch at >>>>> first, and then discussing the removal of disabled_IO_APIC()? >>>> >>>> It doesn't matter to me. My orignal patch last year was similar to yours >>>> until it was suggested that we were working around a problem which was we >>>> shouldn't touch the IO_APIC code on panic. Then I wrote the removal of >>>> disable_IO_APIC patch and did lots of testing on it. I don't think I have >>>> seen any issues with it (just the removal of disabling the lapic stuff). >>> >>> Yes, you really did a lot of testing about this problem according to >>> your patch(https://lkml.org/lkml/2012/1/31/391). Although you >>> said jiffies calibration code does not need the PIT in >>> http://lists.infradead.org/pipermail/kexec/2012-February/006017.html, >>> I don't understand yet why we can remove disable_IO_APIC. >>> Would you please explain about the calibration codes? >> >> I forgot a lot of this, Eric B. might remember more (as he was the one that >> pointed this out initially). I believe initially the io_apic had to be in >> a pre-configured state in order to do some early calibration of the timing >> code. Later on, it was my understanding, that the calibration of various >> time keeping stuff did not need the io_apic in a correct state. The code >> might have switched to tsc instead of PIT, I forget. > > Yes. Alan Coxe's initial SMP port had a few cases where it still > exepected the system to be in PIT mode during boot and it took us a > decade or so before those assumptions were finally expunged. Would you please tell me the commit ID or the hint like files, functions, or when? >> Then again looking at the output of the latest dmesg, it seems the IO APIC >> is initialized way before the tsc is calibrated. So I am not sure what >> needed to get done or what interrupts are needed before the IO APIC gets >> initialized. > > The practical issue is that jiffies was calibrated off of the PIT timer > if I recall. But that is all old news. Are the jiffies calibration codes calibrate_delay()? It seems that the jiffies calibration have not used PIT in 2005 according to 8a9e1b0. >>> By the way, can we remove disable_IO_APIC even if an old dump capture >>> kernel is used? >> >> Good question. I did a bunch of testing with RHEL-6 too, which is 2.6.32 >> based. But I think we added some IRR fixes (commit 1e75b31d638), which >> may or may not have helped in this case. So I don't know when a kernel >> started worked correctly during init (with the right changes). I believe >> 2.6.32 had everything. > > A sufficient old and buggy dump capture kernel will fail because of bugs > in it's startup path, but I don't think anyone cares. OK, if the jiffies calibration problem has been fixed in the old days, we don't need to care for the old kernel. > The kernel startup path has been fixed for years, and disable_IO_APIC in > crash_kexec has always been a bug work-around for deficiencies in the > kernel's start up path (not part of the guaranteed interface). > Furthermore every real system configuration I have encountered used the > same kernel version for the crashdump kernel and the production kernel. > So we should be good. We also will be use the kdump(crashdump) kernel as the production kernel. Should I only care for the current kernel? Thanks, Yoshihiro YUNOMAE -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-09-02 3:09 ` Yoshihiro YUNOMAE @ 2013-09-03 0:12 ` Eric W. Biederman 2013-09-03 11:02 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2013-09-03 0:12 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: Don Zickus, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: > Hi Eric and Don, > > Sorry for the late reply. > > (2013/08/31 9:58), Eric W. Biederman wrote: >> Don Zickus <dzickus@redhat.com> writes: >> >>> On Tue, Aug 27, 2013 at 12:41:51PM +0900, Yoshihiro YUNOMAE wrote: >>>> Hi Don, >>>> >>>> Sorry for the late reply. >>>> >>>> (2013/08/22 22:11), Don Zickus wrote: >>>>> On Thu, Aug 22, 2013 at 05:38:07PM +0900, Yoshihiro YUNOMAE wrote: >>>>>>> So, I agree with Eric, let's remove the disable_IO_APIC() stuff and keep >>>>>>> the code simpler. >>>>>> >>>>>> Thank you for commenting about my patch. >>>>>> I didn't know you already have submitted the patches for this deadlock >>>>>> problem. >>>>>> >>>>>> I can't answer definitively right now that no problems are induced by >>>>>> removing disable_IO_APIC(). However, my patch should be work well (and >>>>>> has already been merged to -tip tree). So how about taking my patch at >>>>>> first, and then discussing the removal of disabled_IO_APIC()? >>>>> >>>>> It doesn't matter to me. My orignal patch last year was similar to yours >>>>> until it was suggested that we were working around a problem which was we >>>>> shouldn't touch the IO_APIC code on panic. Then I wrote the removal of >>>>> disable_IO_APIC patch and did lots of testing on it. I don't think I have >>>>> seen any issues with it (just the removal of disabling the lapic stuff). >>>> >>>> Yes, you really did a lot of testing about this problem according to >>>> your patch(https://lkml.org/lkml/2012/1/31/391). Although you >>>> said jiffies calibration code does not need the PIT in >>>> http://lists.infradead.org/pipermail/kexec/2012-February/006017.html, >>>> I don't understand yet why we can remove disable_IO_APIC. >>>> Would you please explain about the calibration codes? >>> >>> I forgot a lot of this, Eric B. might remember more (as he was the one that >>> pointed this out initially). I believe initially the io_apic had to be in >>> a pre-configured state in order to do some early calibration of the timing >>> code. Later on, it was my understanding, that the calibration of various >>> time keeping stuff did not need the io_apic in a correct state. The code >>> might have switched to tsc instead of PIT, I forget. >> >> Yes. Alan Coxe's initial SMP port had a few cases where it still >> exepected the system to be in PIT mode during boot and it took us a >> decade or so before those assumptions were finally expunged. > > Would you please tell me the commit ID or the hint like files, > functions, or when? The short version is last time we tilted at this windmill the only problem we could find was nmi's caused by the nmi watchdog. So as a bug work-around all we need to retain is disabling the nmi watchdog in crash-kexec. >>> Then again looking at the output of the latest dmesg, it seems the IO APIC >>> is initialized way before the tsc is calibrated. So I am not sure what >>> needed to get done or what interrupts are needed before the IO APIC gets >>> initialized. >> >> The practical issue is that jiffies was calibrated off of the PIT timer >> if I recall. But that is all old news. > > Are the jiffies calibration codes calibrate_delay()? > It seems that the jiffies calibration have not used PIT in 2005 > according to 8a9e1b0. Exactly. That was the original reason why we put in the code to disable the IOAPIC and the local apic. There might have been other reasons but that was the primary. >>>> By the way, can we remove disable_IO_APIC even if an old dump capture >>>> kernel is used? >>> >>> Good question. I did a bunch of testing with RHEL-6 too, which is 2.6.32 >>> based. But I think we added some IRR fixes (commit 1e75b31d638), which >>> may or may not have helped in this case. So I don't know when a kernel >>> started worked correctly during init (with the right changes). I believe >>> 2.6.32 had everything. >> >> A sufficient old and buggy dump capture kernel will fail because of bugs >> in it's startup path, but I don't think anyone cares. > > OK, if the jiffies calibration problem has been fixed in the old days, > we don't need to care for the old kernel. Exactly. There may have been one or two other silly assumptions and to the best of our knowledge all of those have been purged except the assumption that an NMI watchdog won't happen between kernels and while booting the kernel. >> The kernel startup path has been fixed for years, and disable_IO_APIC in >> crash_kexec has always been a bug work-around for deficiencies in the >> kernel's start up path (not part of the guaranteed interface). >> Furthermore every real system configuration I have encountered used the >> same kernel version for the crashdump kernel and the production kernel. >> So we should be good. > > We also will be use the kdump(crashdump) kernel as the production > kernel. Should I only care for the current kernel? For this particular issue yes. In general it is important for there to be a stable interface between the two kernels just so you are not required to use the same kernel version, and so there is the possibility of using something besides a linux kernel. At the same time it has always been the targets kernel's responsibility to sort out the hardware devices unless it can't possibily do it. And apics for the longest time were very very hard to reset in the target kernel, but now that they are not. It makes sense for time permitting to remove the now unnecessary code in the crashing kernel. Because ultimately the less code we have the fewer possible ways we can fail in a known broken kernel. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-09-03 0:12 ` Eric W. Biederman @ 2013-09-03 11:02 ` Yoshihiro YUNOMAE 2013-09-03 12:44 ` Eric W. Biederman 0 siblings, 1 reply; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-09-03 11:02 UTC (permalink / raw) To: Eric W. Biederman Cc: Don Zickus, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton (2013/09/03 9:12), Eric W. Biederman wrote: >>>> Then again looking at the output of the latest dmesg, it seems the IO APIC >>>> is initialized way before the tsc is calibrated. So I am not sure what >>>> needed to get done or what interrupts are needed before the IO APIC gets >>>> initialized. >>> >>> The practical issue is that jiffies was calibrated off of the PIT timer >>> if I recall. But that is all old news. >> >> Are the jiffies calibration codes calibrate_delay()? >> It seems that the jiffies calibration have not used PIT in 2005 >> according to 8a9e1b0. > > Exactly. That was the original reason why we put in the code to > disable the IOAPIC and the local apic. There might have been other > reasons but that was the primary. Thanks, but I have still a question for jiffies calibration. When kernel boots, calibrate_delay_direct() will be called in calibrate_delay() for calculating loops_per_jiffy. Then, calibrate_delay_direct() waits until jiffies is incremented. I think this means PIT or HPET is still used for the calibration. Is there something wrong with my understanding? If wrong, how is jiffies incremented? >>> The kernel startup path has been fixed for years, and disable_IO_APIC in >>> crash_kexec has always been a bug work-around for deficiencies in the >>> kernel's start up path (not part of the guaranteed interface). >>> Furthermore every real system configuration I have encountered used the >>> same kernel version for the crashdump kernel and the production kernel. >>> So we should be good. >> >> We also will be use the kdump(crashdump) kernel as the production >> kernel. Should I only care for the current kernel? > > For this particular issue yes. > > In general it is important for there to be a stable interface between > the two kernels just so you are not required to use the same kernel > version, and so there is the possibility of using something besides a > linux kernel. OK. In order to judge whether a kernel version as crashdump kernel is usable or not, I want to understand why we can remove disable_IO_APIC in detail. > At the same time it has always been the targets kernel's responsibility > to sort out the hardware devices unless it can't possibily do it. And > apics for the longest time were very very hard to reset in the target > kernel, but now that they are not. It makes sense for time permitting > to remove the now unnecessary code in the crashing kernel. Because > ultimately the less code we have the fewer possible ways we can fail > in a known broken kernel. Yes, I agree with you. Thanks, Yoshihiro YUNOMAE -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-09-03 11:02 ` Yoshihiro YUNOMAE @ 2013-09-03 12:44 ` Eric W. Biederman 2013-09-04 9:40 ` Yoshihiro YUNOMAE 0 siblings, 1 reply; 15+ messages in thread From: Eric W. Biederman @ 2013-09-03 12:44 UTC (permalink / raw) To: Yoshihiro YUNOMAE Cc: Don Zickus, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: > (2013/09/03 9:12), Eric W. Biederman wrote: >>>>> Then again looking at the output of the latest dmesg, it seems the IO APIC >>>>> is initialized way before the tsc is calibrated. So I am not sure what >>>>> needed to get done or what interrupts are needed before the IO APIC gets >>>>> initialized. >>>> >>>> The practical issue is that jiffies was calibrated off of the PIT timer >>>> if I recall. But that is all old news. >>> >>> Are the jiffies calibration codes calibrate_delay()? >>> It seems that the jiffies calibration have not used PIT in 2005 >>> according to 8a9e1b0. >> >> Exactly. That was the original reason why we put in the code to >> disable the IOAPIC and the local apic. There might have been other >> reasons but that was the primary. > > Thanks, but I have still a question for jiffies calibration. > > When kernel boots, calibrate_delay_direct() will be called in > calibrate_delay() for calculating loops_per_jiffy. Then, > calibrate_delay_direct() waits until jiffies is incremented. > I think this means PIT or HPET is still used for the calibration. > Is there something wrong with my understanding? > If wrong, how is jiffies incremented? Things have definitely changed, and I believe part of what you are seeing is the path when things are not calibrated by an arch specific means. Ulimately the issue was not that we waited (or possibly still wait) for a timer interrupt to calibrate the delay loop. The problem was that we had initialized the interrupt controller in PIC mode (when the kernel did not later use the interrupt controller in PIC mode) to receive the interrupt. The actual impetus for getting the last of the bugs shaken out is that we have subarchitectures on x86 that do no support interrupt controllers in PIC mode at all. Recently the code seems to get reorganized every other year and I loose track of the details of which piece of code is doing what, but the general gist remains. The key thing to look for is that we initialize the interrupt controllers in apic mode before we receive interrupts. That is really the only thing that matters, and for a long time the calibration of the delay loop was the one notable exception. > OK. In order to judge whether a kernel version as crashdump kernel is > usable or not, I want to understand why we can remove disable_IO_APIC > in detail. That sounds like responsible engineering. >> At the same time it has always been the targets kernel's responsibility >> to sort out the hardware devices unless it can't possibily do it. And >> apics for the longest time were very very hard to reset in the target >> kernel, but now that they are not. It makes sense for time permitting >> to remove the now unnecessary code in the crashing kernel. Because >> ultimately the less code we have the fewer possible ways we can fail >> in a known broken kernel. > > Yes, I agree with you. Eric ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Re: [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock 2013-09-03 12:44 ` Eric W. Biederman @ 2013-09-04 9:40 ` Yoshihiro YUNOMAE 0 siblings, 0 replies; 15+ messages in thread From: Yoshihiro YUNOMAE @ 2013-09-04 9:40 UTC (permalink / raw) To: Eric W. Biederman Cc: Don Zickus, Ingo Molnar, linux-kernel, Andi Kleen, H. Peter Anvin, Gleb Natapov, Konrad Rzeszutek Wilk, Joerg Roedel, x86, stable, Marcelo Tosatti, Hidehiro Kawai, Sebastian Andrzej Siewior, Ingo Molnar, Zhang Yanfei, yrl.pp-manager.tt, Masami Hiramatsu, Thomas Gleixner, Seiji Aguchi, Andrew Morton (2013/09/03 21:44), Eric W. Biederman wrote: > Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> writes: > >> (2013/09/03 9:12), Eric W. Biederman wrote: >>>>>> Then again looking at the output of the latest dmesg, it seems the IO APIC >>>>>> is initialized way before the tsc is calibrated. So I am not sure what >>>>>> needed to get done or what interrupts are needed before the IO APIC gets >>>>>> initialized. >>>>> >>>>> The practical issue is that jiffies was calibrated off of the PIT timer >>>>> if I recall. But that is all old news. >>>> >>>> Are the jiffies calibration codes calibrate_delay()? >>>> It seems that the jiffies calibration have not used PIT in 2005 >>>> according to 8a9e1b0. >>> >>> Exactly. That was the original reason why we put in the code to >>> disable the IOAPIC and the local apic. There might have been other >>> reasons but that was the primary. >> >> Thanks, but I have still a question for jiffies calibration. >> >> When kernel boots, calibrate_delay_direct() will be called in >> calibrate_delay() for calculating loops_per_jiffy. Then, >> calibrate_delay_direct() waits until jiffies is incremented. >> I think this means PIT or HPET is still used for the calibration. >> Is there something wrong with my understanding? >> If wrong, how is jiffies incremented? > > Things have definitely changed, and I believe part of what you are > seeing is the path when things are not calibrated by an arch specific > means. > > Ulimately the issue was not that we waited (or possibly still wait) for > a timer interrupt to calibrate the delay loop. The problem was that we > had initialized the interrupt controller in PIC mode (when the kernel > did not later use the interrupt controller in PIC mode) to receive the > interrupt. > > The actual impetus for getting the last of the bugs shaken out is that > we have subarchitectures on x86 that do no support interrupt controllers > in PIC mode at all. Is one of the subarchitectures Moorestown? I found the Moorestown patch(05ddafb) which defined pre_init_apic_IRQ0(). If this function will enable IOAPIC before using timer(i.e. the function is called before calibrate_delay()), we will be able to remove disable_IO_APIC. However, this function is a Moorestown specific code, and normal PCs don't execute it. I haven't find the generic cord which operates like pre_init_apic_IRQ0() yet. Please tell me if you remember something. Thanks, Yoshihiro YUNOMAE -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae.ez@hitachi.com ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-09-04 9:41 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-08-19 8:12 [PATCH] [BUGFIX] crash/ioapic: Prevent crash_kexec() from deadlocking of ioapic_lock Yoshihiro YUNOMAE 2013-08-19 9:46 ` Ingo Molnar 2013-08-20 0:06 ` Yoshihiro YUNOMAE 2013-08-20 10:12 ` Eric W. Biederman 2013-08-20 14:27 ` Don Zickus 2013-08-22 8:38 ` Yoshihiro YUNOMAE 2013-08-22 13:11 ` Don Zickus 2013-08-27 3:41 ` Yoshihiro YUNOMAE 2013-08-27 13:33 ` Don Zickus 2013-08-31 0:58 ` Eric W. Biederman 2013-09-02 3:09 ` Yoshihiro YUNOMAE 2013-09-03 0:12 ` Eric W. Biederman 2013-09-03 11:02 ` Yoshihiro YUNOMAE 2013-09-03 12:44 ` Eric W. Biederman 2013-09-04 9:40 ` Yoshihiro YUNOMAE
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).