kexec.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] kexec: Fix kdump failure with notsc
@ 2016-07-07 10:17 Wei Jiangang
  2016-07-07 17:55 ` Eric W. Biederman
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Wei Jiangang @ 2016-07-07 10:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: fenghua.yu, x86, kexec, mingo, ebiederm, hpa, tglx, Wei Jiangang

If we specify the 'notsc' boot parameter for the dump-capture kernel,
and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
/proc/sysrq-trigger",
the dump-capture kernel will hang in calibrate_delay_converge():

    /* wait for "start of" clock tick */
    ticks = jiffies;
    while (ticks == jiffies)
        ; /* nothing */

serial log of the hang is as follows:

    tsc: Fast TSC calibration using PIT
    tsc: Detected 2099.947 MHz processor
    Calibrating delay loop...

The reason is that the dump-capture kernel hangs in while loops and
waits for jiffies to be updated, but no timer interrupts is passed
to BSP by APIC.

In fact, the local APIC was disabled in reboot and crash path by
lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
(put the system into PIT during the crash kernel),
so that the dump-capture kernel can get timer interrupts.

BTW,
I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
before shutdown of the local APIC") via bisection.

Originally, I want to revert it.
But Ingo Molnar comments that "By reverting the change can paper over
the bug, but re-introduce the bug that can result in certain CPUs hanging
if IO-APIC sends an APIC message if the lapic is disabled prematurely"
And I think it's pertinent.

Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
---
 arch/x86/include/asm/apic.h        | 5 +++++
 arch/x86/kernel/apic/apic.c        | 9 +++++++++
 arch/x86/kernel/machine_kexec_32.c | 5 ++---
 arch/x86/kernel/machine_kexec_64.c | 6 +++---
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index bc27611fa58f..5d7e635e580a 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
 extern void disconnect_bsp_APIC(int virt_wire_setup);
 extern void disable_local_APIC(void);
 extern void lapic_shutdown(void);
+extern int lapic_disabled(void);
 extern void sync_Arb_IDs(void);
 extern void init_bsp_APIC(void);
 extern void setup_local_APIC(void);
@@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
 
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
+static inline int lapic_disabled(void)
+{
+	return 0;
+}
 #define local_apic_timer_c2_ok		1
 static inline void init_apic_mappings(void) { }
 static inline void disable_local_APIC(void) { }
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 60078a67d7e3..d1df250994bb 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
 }
 #endif
 
+/* Local APIC is disabled by the kernel for crash or reboot path */
+static int disabled_local_apic;
+
 /*
  * Knob to control our willingness to enable the local APIC.
  *
@@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
 #endif
 		disable_local_APIC();
 
+	disabled_local_apic = 1;
 
 	local_irq_restore(flags);
 }
 
+int lapic_disabled(void)
+{
+	return disabled_local_apic;
+}
+
 /**
  * sync_Arb_IDs - synchronize APIC bus arbitration IDs
  */
diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
index 469b23d6acc2..c934a7868e6b 100644
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
 	local_irq_disable();
 	hw_breakpoint_disable();
 
-	if (image->preserve_context) {
+	if (image->preserve_context || lapic_disabled()) {
 #ifdef CONFIG_X86_IO_APIC
 		/*
 		 * We need to put APICs in legacy mode so that we can
 		 * get timer interrupts in second kernel. kexec/kdump
 		 * paths already have calls to disable_IO_APIC() in
-		 * one form or other. kexec jump path also need
-		 * one.
+		 * one form or other. kexec jump path also need one.
 		 */
 		disable_IO_APIC();
 #endif
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 5a294e48b185..d3598cdd6437 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -23,6 +23,7 @@
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
+#include <asm/apic.h>
 #include <asm/io_apic.h>
 #include <asm/debugreg.h>
 #include <asm/kexec-bzimage64.h>
@@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
 	local_irq_disable();
 	hw_breakpoint_disable();
 
-	if (image->preserve_context) {
+	if (image->preserve_context || lapic_disabled()) {
 #ifdef CONFIG_X86_IO_APIC
 		/*
 		 * We need to put APICs in legacy mode so that we can
 		 * get timer interrupts in second kernel. kexec/kdump
 		 * paths already have calls to disable_IO_APIC() in
-		 * one form or other. kexec jump path also need
-		 * one.
+		 * one form or other. kexec jump path also need one.
 		 */
 		disable_IO_APIC();
 #endif
-- 
1.9.3




_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-07 10:17 [PATCH v2] kexec: Fix kdump failure with notsc Wei Jiangang
@ 2016-07-07 17:55 ` Eric W. Biederman
  2016-07-08  4:48   ` Wei, Jiangang
  2016-07-08  7:38   ` Ingo Molnar
  2016-07-08  8:21 ` Nikolay Borisov
  2016-07-12  6:52 ` Xunlei Pang
  2 siblings, 2 replies; 13+ messages in thread
From: Eric W. Biederman @ 2016-07-07 17:55 UTC (permalink / raw)
  To: Wei Jiangang; +Cc: fenghua.yu, x86, kexec, linux-kernel, mingo, hpa, tglx

Wei Jiangang <weijg.fnst@cn.fujitsu.com> writes:

> If we specify the 'notsc' boot parameter for the dump-capture kernel,
> and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
> /proc/sysrq-trigger",
> the dump-capture kernel will hang in calibrate_delay_converge():
>
>     /* wait for "start of" clock tick */
>     ticks = jiffies;
>     while (ticks == jiffies)
>         ; /* nothing */
>
> serial log of the hang is as follows:
>
>     tsc: Fast TSC calibration using PIT
>     tsc: Detected 2099.947 MHz processor
>     Calibrating delay loop...
>
> The reason is that the dump-capture kernel hangs in while loops and
> waits for jiffies to be updated, but no timer interrupts is passed
> to BSP by APIC.
>
> In fact, the local APIC was disabled in reboot and crash path by
> lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
> (put the system into PIT during the crash kernel),
> so that the dump-capture kernel can get timer interrupts.
>
> BTW,
> I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
> before shutdown of the local APIC") via bisection.
>
> Originally, I want to revert it.
> But Ingo Molnar comments that "By reverting the change can paper over
> the bug, but re-introduce the bug that can result in certain CPUs hanging
> if IO-APIC sends an APIC message if the lapic is disabled prematurely"
> And I think it's pertinent.

Sigh.  Can we please just do the work to rip out the apic shutdown code
from the kexec on panic code path?

I forgetting details but the only reason we have do any apic shutdown
is bugs in older kernels that could not initialize a system properly
if we did not shut down the apics.

I certainly don't see an issue with goofy cases like notsc not working
on a crash capture kernel if we are not initializing the hardware
properly.

The strategy really needs to be to only do the absolutely essential
hardware shutdown in the crashing kernel, every adintional line of code
we execute in the crashing kernel increases our chances of hitting a
bug.

Under that policy things like requring we don't pass boot options that
inhibit the dump catpure kernel from initializing the hardware from a
random state are reasonable requirements.  AKA I don't see any
justification in this as to why we would even want to support notsc
on the dump capture kernel.  Especially when things clearly work when
that option is not specified.

Eric


> Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> ---
>  arch/x86/include/asm/apic.h        | 5 +++++
>  arch/x86/kernel/apic/apic.c        | 9 +++++++++
>  arch/x86/kernel/machine_kexec_32.c | 5 ++---
>  arch/x86/kernel/machine_kexec_64.c | 6 +++---
>  4 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index bc27611fa58f..5d7e635e580a 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
>  extern void disconnect_bsp_APIC(int virt_wire_setup);
>  extern void disable_local_APIC(void);
>  extern void lapic_shutdown(void);
> +extern int lapic_disabled(void);
>  extern void sync_Arb_IDs(void);
>  extern void init_bsp_APIC(void);
>  extern void setup_local_APIC(void);
> @@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
>  
>  #else /* !CONFIG_X86_LOCAL_APIC */
>  static inline void lapic_shutdown(void) { }
> +static inline int lapic_disabled(void)
> +{
> +	return 0;
> +}
>  #define local_apic_timer_c2_ok		1
>  static inline void init_apic_mappings(void) { }
>  static inline void disable_local_APIC(void) { }
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 60078a67d7e3..d1df250994bb 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
>  }
>  #endif
>  
> +/* Local APIC is disabled by the kernel for crash or reboot path */
> +static int disabled_local_apic;
> +
>  /*
>   * Knob to control our willingness to enable the local APIC.
>   *
> @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
>  #endif
>  		disable_local_APIC();
>  
> +	disabled_local_apic = 1;
>  
>  	local_irq_restore(flags);
>  }
>  
> +int lapic_disabled(void)
> +{
> +	return disabled_local_apic;
> +}
> +
>  /**
>   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
>   */
> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> index 469b23d6acc2..c934a7868e6b 100644
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();
>  #endif
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 5a294e48b185..d3598cdd6437 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -23,6 +23,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  #include <asm/mmu_context.h>
> +#include <asm/apic.h>
>  #include <asm/io_apic.h>
>  #include <asm/debugreg.h>
>  #include <asm/kexec-bzimage64.h>
> @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();
>  #endif

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-07 17:55 ` Eric W. Biederman
@ 2016-07-08  4:48   ` Wei, Jiangang
  2016-07-08  7:38   ` Ingo Molnar
  1 sibling, 0 replies; 13+ messages in thread
From: Wei, Jiangang @ 2016-07-08  4:48 UTC (permalink / raw)
  To: ebiederm@xmission.com
  Cc: fenghua.yu@intel.com, x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com, hpa@zytor.com,
	Izumi, Taku, tglx@linutronix.de

Hi , Eric

Thanks for your comments firstly.

On Thu, 2016-07-07 at 12:55 -0500, Eric W. Biederman wrote:
> Wei Jiangang <weijg.fnst@cn.fujitsu.com> writes:
> 
> > If we specify the 'notsc' boot parameter for the dump-capture kernel,
> > and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
> > /proc/sysrq-trigger",
> > the dump-capture kernel will hang in calibrate_delay_converge():
> >
> >     /* wait for "start of" clock tick */
> >     ticks = jiffies;
> >     while (ticks == jiffies)
> >         ; /* nothing */
> >
> > serial log of the hang is as follows:
> >
> >     tsc: Fast TSC calibration using PIT
> >     tsc: Detected 2099.947 MHz processor
> >     Calibrating delay loop...
> >
> > The reason is that the dump-capture kernel hangs in while loops and
> > waits for jiffies to be updated, but no timer interrupts is passed
> > to BSP by APIC.
> >
> > In fact, the local APIC was disabled in reboot and crash path by
> > lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
> > (put the system into PIT during the crash kernel),
> > so that the dump-capture kernel can get timer interrupts.
> >
> > BTW,
> > I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
> > before shutdown of the local APIC") via bisection.
> >
> > Originally, I want to revert it.
> > But Ingo Molnar comments that "By reverting the change can paper over
> > the bug, but re-introduce the bug that can result in certain CPUs hanging
> > if IO-APIC sends an APIC message if the lapic is disabled prematurely"
> > And I think it's pertinent.
> 
> Sigh.  Can we please just do the work to rip out the apic shutdown code
> from the kexec on panic code path?

Do you mean remove the calls for disable_IO_APIC() and lapic_shutdown()
in native_machine_crash_shutdown()?

If so, I have tried it, but it doesn't work for this problem.
> 
> I forgetting details but the only reason we have do any apic shutdown
> is bugs in older kernels that could not initialize a system properly
> if we did not shut down the apics.
> 
> I certainly don't see an issue with goofy cases like notsc not working
> on a crash capture kernel if we are not initializing the hardware
> properly.
> 
> The strategy really needs to be to only do the absolutely essential
> hardware shutdown in the crashing kernel, every adintional line of code
> we execute in the crashing kernel increases our chances of hitting a
> bug.
> 
> Under that policy things like requring we don't pass boot options that
> inhibit the dump catpure kernel from initializing the hardware from a
> random state are reasonable requirements.  AKA I don't see any
> justification in this as to why we would even want to support notsc
> on the dump capture kernel.  Especially when things clearly work when
> that option is not specified.

firstly do some clarification,

My commit message metioned that "specify the 'notsc' boot parameter for
the dump-capture kernel ....". 
That's just the reproducing method used by myself for this problem.

In fact, If we specify notsc only for the first kernel,  which also can
trigger the bug.


And secondly,

In multiple CPU configurations the TSC values on different processors
may be different, 
which may cause random (bad) results.

for example,
FUJITSU's server (PRIMEQUEST 2000 series) supports Dynamic
Reconfiguration.
http://www.fujitsu.com/global/products/computing/servers/mission-critical/primequest/technology/availability/dynamic-reconfiguration.html

This feature enables to hot-add system board which contains cpus and
memories, this means some cpus can be hot-added to system. 
tsc of hot-added cpus is not consistent with tsc of
existing-from-boot-time cpus. (though hardware and firmware make an
effort to speficy the same tsc value as existing one)

PRIMEQUEST can happen this tsc-inconsistency, we recommend to specify
"notsc" boot option for Dynamic Reconfiguration users.

so we really need to specify 'notsc'.

Regards,
wei

> Eric
> 
> 
> > Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> > ---
> >  arch/x86/include/asm/apic.h        | 5 +++++
> >  arch/x86/kernel/apic/apic.c        | 9 +++++++++
> >  arch/x86/kernel/machine_kexec_32.c | 5 ++---
> >  arch/x86/kernel/machine_kexec_64.c | 6 +++---
> >  4 files changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index bc27611fa58f..5d7e635e580a 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
> >  extern void disconnect_bsp_APIC(int virt_wire_setup);
> >  extern void disable_local_APIC(void);
> >  extern void lapic_shutdown(void);
> > +extern int lapic_disabled(void);
> >  extern void sync_Arb_IDs(void);
> >  extern void init_bsp_APIC(void);
> >  extern void setup_local_APIC(void);
> > @@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
> >  
> >  #else /* !CONFIG_X86_LOCAL_APIC */
> >  static inline void lapic_shutdown(void) { }
> > +static inline int lapic_disabled(void)
> > +{
> > +	return 0;
> > +}
> >  #define local_apic_timer_c2_ok		1
> >  static inline void init_apic_mappings(void) { }
> >  static inline void disable_local_APIC(void) { }
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 60078a67d7e3..d1df250994bb 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
> >  }
> >  #endif
> >  
> > +/* Local APIC is disabled by the kernel for crash or reboot path */
> > +static int disabled_local_apic;
> > +
> >  /*
> >   * Knob to control our willingness to enable the local APIC.
> >   *
> > @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
> >  #endif
> >  		disable_local_APIC();
> >  
> > +	disabled_local_apic = 1;
> >  
> >  	local_irq_restore(flags);
> >  }
> >  
> > +int lapic_disabled(void)
> > +{
> > +	return disabled_local_apic;
> > +}
> > +
> >  /**
> >   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
> >   */
> > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > index 469b23d6acc2..c934a7868e6b 100644
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> >  #endif
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index 5a294e48b185..d3598cdd6437 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -23,6 +23,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/tlbflush.h>
> >  #include <asm/mmu_context.h>
> > +#include <asm/apic.h>
> >  #include <asm/io_apic.h>
> >  #include <asm/debugreg.h>
> >  #include <asm/kexec-bzimage64.h>
> > @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> >  #endif
> 
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-07 17:55 ` Eric W. Biederman
  2016-07-08  4:48   ` Wei, Jiangang
@ 2016-07-08  7:38   ` Ingo Molnar
  2016-07-11 10:30     ` Wei, Jiangang
  1 sibling, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2016-07-08  7:38 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: fenghua.yu, x86, kexec, linux-kernel, mingo, hpa, tglx,
	Wei Jiangang


* Eric W. Biederman <ebiederm@xmission.com> wrote:

> Sigh.  Can we please just do the work to rip out the apic shutdown code from the 
> kexec on panic code path?
> 
> I forgetting details but the only reason we have do any apic shutdown is bugs in 
> older kernels that could not initialize a system properly if we did not shut 
> down the apics.
> 
> I certainly don't see an issue with goofy cases like notsc not working on a 
> crash capture kernel if we are not initializing the hardware properly.
> 
> The strategy really needs to be to only do the absolutely essential hardware 
> shutdown in the crashing kernel, every adintional line of code we execute in the 
> crashing kernel increases our chances of hitting a bug.

Fully agreed.

> Under that policy things like requring we don't pass boot options that inhibit 
> the dump catpure kernel from initializing the hardware from a random state are 
> reasonable requirements.  AKA I don't see any justification in this as to why we 
> would even want to support notsc on the dump capture kernel.  Especially when 
> things clearly work when that option is not specified.

So at least on the surface it appears 'surprising' that the 'notsc' option (which, 
supposedly, disables TSC handling) interferes with being able to fully boot. Even 
if 'notsc' is specified we are still using the local APIC, right?

So it might be a good idea to find the root cause of this bootup fragility even if 
'notsc' is specified. And I fully agree that it should be fixed in the bootup path 
of the dump kernel, not the crash kernel reboot path.

Thanks,

	Ingo

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-07 10:17 [PATCH v2] kexec: Fix kdump failure with notsc Wei Jiangang
  2016-07-07 17:55 ` Eric W. Biederman
@ 2016-07-08  8:21 ` Nikolay Borisov
  2016-07-12  6:52 ` Xunlei Pang
  2 siblings, 0 replies; 13+ messages in thread
From: Nikolay Borisov @ 2016-07-08  8:21 UTC (permalink / raw)
  To: Wei Jiangang, linux-kernel
  Cc: fenghua.yu, x86, kexec, mingo, ebiederm, hpa, tglx



On 07/07/2016 01:17 PM, Wei Jiangang wrote:
> If we specify the 'notsc' boot parameter for the dump-capture kernel,
> and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
> /proc/sysrq-trigger",
> the dump-capture kernel will hang in calibrate_delay_converge():
> 
>     /* wait for "start of" clock tick */
>     ticks = jiffies;
>     while (ticks == jiffies)
>         ; /* nothing */
> 
> serial log of the hang is as follows:
> 
>     tsc: Fast TSC calibration using PIT
>     tsc: Detected 2099.947 MHz processor
>     Calibrating delay loop...
> 
> The reason is that the dump-capture kernel hangs in while loops and
> waits for jiffies to be updated, but no timer interrupts is passed
> to BSP by APIC.
> 
> In fact, the local APIC was disabled in reboot and crash path by
> lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
> (put the system into PIT during the crash kernel),
> so that the dump-capture kernel can get timer interrupts.
> 
> BTW,
> I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
> before shutdown of the local APIC") via bisection.
> 
> Originally, I want to revert it.
> But Ingo Molnar comments that "By reverting the change can paper over
> the bug, but re-introduce the bug that can result in certain CPUs hanging
> if IO-APIC sends an APIC message if the lapic is disabled prematurely"
> And I think it's pertinent.
> 
> Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> ---
>  arch/x86/include/asm/apic.h        | 5 +++++
>  arch/x86/kernel/apic/apic.c        | 9 +++++++++
>  arch/x86/kernel/machine_kexec_32.c | 5 ++---
>  arch/x86/kernel/machine_kexec_64.c | 6 +++---
>  4 files changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index bc27611fa58f..5d7e635e580a 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
>  extern void disconnect_bsp_APIC(int virt_wire_setup);
>  extern void disable_local_APIC(void);
>  extern void lapic_shutdown(void);
> +extern int lapic_disabled(void);



>  extern void sync_Arb_IDs(void);
>  extern void init_bsp_APIC(void);
>  extern void setup_local_APIC(void);
> @@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
>  
>  #else /* !CONFIG_X86_LOCAL_APIC */
>  static inline void lapic_shutdown(void) { }
> +static inline int lapic_disabled(void)
> +{
> +	return 0;
> +}
>  #define local_apic_timer_c2_ok		1
>  static inline void init_apic_mappings(void) { }
>  static inline void disable_local_APIC(void) { }
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 60078a67d7e3..d1df250994bb 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
>  }
>  #endif
>  
> +/* Local APIC is disabled by the kernel for crash or reboot path */
> +static int disabled_local_apic;

Your are using an int for a boolean value, please be more explicit by
declaring the variable as boolean and respectively changing all the
functions returning this value.

> +
>  /*
>   * Knob to control our willingness to enable the local APIC.
>   *
> @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
>  #endif
>  		disable_local_APIC();
>  
> +	disabled_local_apic = 1;
>  
>  	local_irq_restore(flags);
>  }
>  
> +int lapic_disabled(void)
> +{
> +	return disabled_local_apic;
> +}
> +
>  /**
>   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
>   */
> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> index 469b23d6acc2..c934a7868e6b 100644
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();
>  #endif
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 5a294e48b185..d3598cdd6437 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -23,6 +23,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  #include <asm/mmu_context.h>
> +#include <asm/apic.h>
>  #include <asm/io_apic.h>
>  #include <asm/debugreg.h>
>  #include <asm/kexec-bzimage64.h>
> @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();
>  #endif
> 

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-08  7:38   ` Ingo Molnar
@ 2016-07-11 10:30     ` Wei, Jiangang
  2016-07-13  7:46       ` Wei, Jiangang
  0 siblings, 1 reply; 13+ messages in thread
From: Wei, Jiangang @ 2016-07-11 10:30 UTC (permalink / raw)
  To: mingo@kernel.org
  Cc: fenghua.yu@intel.com, x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de

Hi , Ingo

On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> * Eric W. Biederman <ebiederm@xmission.com> wrote:
> 
> > Sigh.  Can we please just do the work to rip out the apic shutdown code from the 
> > kexec on panic code path?
> > 
> > I forgetting details but the only reason we have do any apic shutdown is bugs in 
> > older kernels that could not initialize a system properly if we did not shut 
> > down the apics.
> > 
> > I certainly don't see an issue with goofy cases like notsc not working on a 
> > crash capture kernel if we are not initializing the hardware properly.
> > 
> > The strategy really needs to be to only do the absolutely essential hardware 
> > shutdown in the crashing kernel, every adintional line of code we execute in the 
> > crashing kernel increases our chances of hitting a bug.
> 
> Fully agreed.
> 
> > Under that policy things like requring we don't pass boot options that inhibit 
> > the dump catpure kernel from initializing the hardware from a random state are 
> > reasonable requirements.  AKA I don't see any justification in this as to why we 
> > would even want to support notsc on the dump capture kernel.  Especially when 
> > things clearly work when that option is not specified.
> 
> So at least on the surface it appears 'surprising' that the 'notsc' option (which, 
> supposedly, disables TSC handling) interferes with being able to fully boot. Even 
> if 'notsc' is specified we are still using the local APIC, right?

In most case,  It's no problem that using local APIC while notsc is
specified.
But not for kdump.

We can get evidence, Especially from "Spurious LAPIC timer interrupt on
cpu 0".

###serial log,

[    0.000000] NR_IRQS:524544 nr_irqs:256 16
[    0.000000] Spurious LAPIC timer interrupt on cpu 0
[    0.000000] Console: colour dummy device 80x25
[    0.000000] console [tty0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2099.947 MHz processor
[    0.000000] Calibrating delay loop...


Due to the local apic and local apic timer hasn't been setup and enabled
fully, The event_handler of clock event is NULL.

###codes,

static void local_apic_timer_interrupt(void)
{
    int cpu = smp_processor_id();
    struct clock_event_device *evt = &per_cpu(lapic_events, cpu);

    /*
     * Normally we should not be here till LAPIC has been initialized
but
     * in some cases like kdump, its possible that there is a pending
LAPIC
     * timer interrupt from previous kernel's context and is delivered
in
     * new kernel the moment interrupts are enabled.
     *
     * Interrupts are enabled early and LAPIC is setup much later, hence
     * its possible that when we get here evt->event_handler is NULL.
     * Check for event_handler being NULL and discard the interrupt as
     * spurious.
     */
    if (!evt->event_handler) {
        pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
        /* Switch it off */
        lapic_timer_shutdown(evt);
        return;
    }

     .............
}


IMHO, 
If we specify notsc, the dump-capture kernel waits for jiffies being
updated early and LAPIC and timer are setup much later, which causes no
timer interrupts is passed to BSP. as following,

start_kernel  -->
1)-> calibrate_delay()  -> calibrate_delay_converge()  # hang and wait
for jiffies changed
                   
2)-> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() ->
setup_local_APIC()

-> setup_percpu_clockev().

the setup_percpu_clockev points setup_boot_APIC_clock() which used to
setup the boot APIC and timer.


> So it might be a good idea to find the root cause of this bootup fragility even if 
> 'notsc' is specified. And I fully agree that it should be fixed in the bootup path 
> of the dump kernel, not the crash kernel reboot path.

Because the lapic and timer are not ready when dump-capture waits them
to update the jiffies value. so I suggest to put APIC in legacy mode in
local_apic_timer_interrupt() temporarily, which in the bootup path of
dump kernel. 

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb52850a28f..af3be93997ed 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
                pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
cpu);
                /* Switch it off */
                lapic_timer_shutdown(evt);
+             disable_IO_APIC();
                return;
        }

And the new solution can fix the problem.
What‘s your opinion about it?

Thanks,
wei

> 
> Thanks,
> 
> 	Ingo
> 
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-07 10:17 [PATCH v2] kexec: Fix kdump failure with notsc Wei Jiangang
  2016-07-07 17:55 ` Eric W. Biederman
  2016-07-08  8:21 ` Nikolay Borisov
@ 2016-07-12  6:52 ` Xunlei Pang
  2016-07-12  8:21   ` Baoquan He
  2016-07-12  9:09   ` Wei, Jiangang
  2 siblings, 2 replies; 13+ messages in thread
From: Xunlei Pang @ 2016-07-12  6:52 UTC (permalink / raw)
  To: Wei Jiangang, linux-kernel
  Cc: fenghua.yu, x86, kexec, mingo, ebiederm, hpa, tglx

On 2016/07/07 at 18:17, Wei Jiangang wrote:
> If we specify the 'notsc' boot parameter for the dump-capture kernel,
> and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
> /proc/sysrq-trigger",
> the dump-capture kernel will hang in calibrate_delay_converge():
>
>     /* wait for "start of" clock tick */
>     ticks = jiffies;
>     while (ticks == jiffies)
>         ; /* nothing */
>
> serial log of the hang is as follows:
>
>     tsc: Fast TSC calibration using PIT
>     tsc: Detected 2099.947 MHz processor
>     Calibrating delay loop...
>
> The reason is that the dump-capture kernel hangs in while loops and
> waits for jiffies to be updated, but no timer interrupts is passed
> to BSP by APIC.
>
> In fact, the local APIC was disabled in reboot and crash path by
> lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
> (put the system into PIT during the crash kernel),
> so that the dump-capture kernel can get timer interrupts.
>
> BTW,
> I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
> before shutdown of the local APIC") via bisection.
>
> Originally, I want to revert it.
> But Ingo Molnar comments that "By reverting the change can paper over
> the bug, but re-introduce the bug that can result in certain CPUs hanging
> if IO-APIC sends an APIC message if the lapic is disabled prematurely"
> And I think it's pertinent.
>
> Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> ---
>  arch/x86/include/asm/apic.h        | 5 +++++
>  arch/x86/kernel/apic/apic.c        | 9 +++++++++
>  arch/x86/kernel/machine_kexec_32.c | 5 ++---
>  arch/x86/kernel/machine_kexec_64.c | 6 +++---
>  4 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index bc27611fa58f..5d7e635e580a 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
>  extern void disconnect_bsp_APIC(int virt_wire_setup);
>  extern void disable_local_APIC(void);
>  extern void lapic_shutdown(void);
> +extern int lapic_disabled(void);
>  extern void sync_Arb_IDs(void);
>  extern void init_bsp_APIC(void);
>  extern void setup_local_APIC(void);
> @@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
>  
>  #else /* !CONFIG_X86_LOCAL_APIC */
>  static inline void lapic_shutdown(void) { }
> +static inline int lapic_disabled(void)
> +{
> +	return 0;
> +}
>  #define local_apic_timer_c2_ok		1
>  static inline void init_apic_mappings(void) { }
>  static inline void disable_local_APIC(void) { }
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 60078a67d7e3..d1df250994bb 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
>  }
>  #endif
>  
> +/* Local APIC is disabled by the kernel for crash or reboot path */
> +static int disabled_local_apic;
> +
>  /*
>   * Knob to control our willingness to enable the local APIC.
>   *
> @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
>  #endif
>  		disable_local_APIC();
>  
> +	disabled_local_apic = 1;
>  
>  	local_irq_restore(flags);
>  }
>  
> +int lapic_disabled(void)
> +{
> +	return disabled_local_apic;
> +}
> +
>  /**
>   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
>   */
> diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> index 469b23d6acc2..c934a7868e6b 100644
> --- a/arch/x86/kernel/machine_kexec_32.c
> +++ b/arch/x86/kernel/machine_kexec_32.c
> @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();

Hi Wei,

As the comment says, kexec/kdump paths already have disable_IO_APIC(), why again here?

Regards,
Xunlei

>  #endif
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 5a294e48b185..d3598cdd6437 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -23,6 +23,7 @@
>  #include <asm/pgtable.h>
>  #include <asm/tlbflush.h>
>  #include <asm/mmu_context.h>
> +#include <asm/apic.h>
>  #include <asm/io_apic.h>
>  #include <asm/debugreg.h>
>  #include <asm/kexec-bzimage64.h>
> @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
>  	local_irq_disable();
>  	hw_breakpoint_disable();
>  
> -	if (image->preserve_context) {
> +	if (image->preserve_context || lapic_disabled()) {
>  #ifdef CONFIG_X86_IO_APIC
>  		/*
>  		 * We need to put APICs in legacy mode so that we can
>  		 * get timer interrupts in second kernel. kexec/kdump
>  		 * paths already have calls to disable_IO_APIC() in
> -		 * one form or other. kexec jump path also need
> -		 * one.
> +		 * one form or other. kexec jump path also need one.
>  		 */
>  		disable_IO_APIC();
>  #endif


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-12  6:52 ` Xunlei Pang
@ 2016-07-12  8:21   ` Baoquan He
  2016-07-12  9:10     ` Wei, Jiangang
  2016-07-12  9:09   ` Wei, Jiangang
  1 sibling, 1 reply; 13+ messages in thread
From: Baoquan He @ 2016-07-12  8:21 UTC (permalink / raw)
  To: xlpang
  Cc: fenghua.yu, x86, kexec, linux-kernel, mingo, ebiederm, hpa, tglx,
	Wei Jiangang

On 07/12/16 at 02:52pm, Xunlei Pang wrote:
> On 2016/07/07 at 18:17, Wei Jiangang wrote:
> > Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> > ---
> > +/* Local APIC is disabled by the kernel for crash or reboot path */
> > +static int disabled_local_apic;
> > +
> >  /*
> >   * Knob to control our willingness to enable the local APIC.
> >   *
> > @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
> >  #endif
> >  		disable_local_APIC();
> >  
> > +	disabled_local_apic = 1;
> >  
> >  	local_irq_restore(flags);
> >  }
> >  
> > +int lapic_disabled(void)
> > +{
> > +	return disabled_local_apic;
> > +}
> > +
> >  /**
> >   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
> >   */
> > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > index 469b23d6acc2..c934a7868e6b 100644
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> 
> Hi Wei,
> 
> As the comment says, kexec/kdump paths already have disable_IO_APIC(), why again here?

I also have this question. I guess Jiangang didn't post his modification
correctly. He should remove calling of disable_IO_APIC in
native_machine_crash_shutdown(). Assume his test was done on correct
code change. 

> 
> Regards,
> Xunlei
> 
> >  #endif
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index 5a294e48b185..d3598cdd6437 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -23,6 +23,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/tlbflush.h>
> >  #include <asm/mmu_context.h>
> > +#include <asm/apic.h>
> >  #include <asm/io_apic.h>
> >  #include <asm/debugreg.h>
> >  #include <asm/kexec-bzimage64.h>
> > @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> >  #endif
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-12  6:52 ` Xunlei Pang
  2016-07-12  8:21   ` Baoquan He
@ 2016-07-12  9:09   ` Wei, Jiangang
  2016-07-12  9:29     ` Baoquan He
  1 sibling, 1 reply; 13+ messages in thread
From: Wei, Jiangang @ 2016-07-12  9:09 UTC (permalink / raw)
  To: xlpang@redhat.com
  Cc: fenghua.yu@intel.com, x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de

On Tue, 2016-07-12 at 14:52 +0800, Xunlei Pang wrote:
> On 2016/07/07 at 18:17, Wei Jiangang wrote:
> > If we specify the 'notsc' boot parameter for the dump-capture kernel,
> > and then trigger a crash(panic) by using "ALT-SysRq-c" or "echo c >
> > /proc/sysrq-trigger",
> > the dump-capture kernel will hang in calibrate_delay_converge():
> >
> >     /* wait for "start of" clock tick */
> >     ticks = jiffies;
> >     while (ticks == jiffies)
> >         ; /* nothing */
> >
> > serial log of the hang is as follows:
> >
> >     tsc: Fast TSC calibration using PIT
> >     tsc: Detected 2099.947 MHz processor
> >     Calibrating delay loop...
> >
> > The reason is that the dump-capture kernel hangs in while loops and
> > waits for jiffies to be updated, but no timer interrupts is passed
> > to BSP by APIC.
> >
> > In fact, the local APIC was disabled in reboot and crash path by
> > lapic_shutdown(). We need to put APIC in legacy mode in kexec jump path
> > (put the system into PIT during the crash kernel),
> > so that the dump-capture kernel can get timer interrupts.
> >
> > BTW,
> > I found the buggy commit 522e66464467 ("x86/apic: Disable I/O APIC
> > before shutdown of the local APIC") via bisection.
> >
> > Originally, I want to revert it.
> > But Ingo Molnar comments that "By reverting the change can paper over
> > the bug, but re-introduce the bug that can result in certain CPUs hanging
> > if IO-APIC sends an APIC message if the lapic is disabled prematurely"
> > And I think it's pertinent.
> >
> > Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> > ---
> >  arch/x86/include/asm/apic.h        | 5 +++++
> >  arch/x86/kernel/apic/apic.c        | 9 +++++++++
> >  arch/x86/kernel/machine_kexec_32.c | 5 ++---
> >  arch/x86/kernel/machine_kexec_64.c | 6 +++---
> >  4 files changed, 19 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> > index bc27611fa58f..5d7e635e580a 100644
> > --- a/arch/x86/include/asm/apic.h
> > +++ b/arch/x86/include/asm/apic.h
> > @@ -128,6 +128,7 @@ extern void clear_local_APIC(void);
> >  extern void disconnect_bsp_APIC(int virt_wire_setup);
> >  extern void disable_local_APIC(void);
> >  extern void lapic_shutdown(void);
> > +extern int lapic_disabled(void);
> >  extern void sync_Arb_IDs(void);
> >  extern void init_bsp_APIC(void);
> >  extern void setup_local_APIC(void);
> > @@ -165,6 +166,10 @@ extern int setup_APIC_eilvt(u8 lvt_off, u8 vector, u8 msg_type, u8 mask);
> >  
> >  #else /* !CONFIG_X86_LOCAL_APIC */
> >  static inline void lapic_shutdown(void) { }
> > +static inline int lapic_disabled(void)
> > +{
> > +	return 0;
> > +}
> >  #define local_apic_timer_c2_ok		1
> >  static inline void init_apic_mappings(void) { }
> >  static inline void disable_local_APIC(void) { }
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index 60078a67d7e3..d1df250994bb 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -133,6 +133,9 @@ static inline void imcr_apic_to_pic(void)
> >  }
> >  #endif
> >  
> > +/* Local APIC is disabled by the kernel for crash or reboot path */
> > +static int disabled_local_apic;
> > +
> >  /*
> >   * Knob to control our willingness to enable the local APIC.
> >   *
> > @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
> >  #endif
> >  		disable_local_APIC();
> >  
> > +	disabled_local_apic = 1;
> >  
> >  	local_irq_restore(flags);
> >  }
> >  
> > +int lapic_disabled(void)
> > +{
> > +	return disabled_local_apic;
> > +}
> > +
> >  /**
> >   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
> >   */
> > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > index 469b23d6acc2..c934a7868e6b 100644
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> 
> Hi Wei,
> 
> As the comment says, kexec/kdump paths already have disable_IO_APIC(), why again here?
Frankly, I am not very clear about it as well...

Originally the codes and comment added by Huang Ying
<ying.huang@intel.com>
There may be some special reason...

The comment also said that it's used to put APICs in legacy mode so that
we can get timer interrupts in second kernel. and we also need it in
kexec jump path.

In fact, the jiffies calibration needs timer interrupts in second
kernel. and the lapic and timer are not ready when second kernel waits
them to update the jiffies value.

so I just want to reuse them and enable legacy mode (PIT timer) for
notsc case.
kernel.

Thanks,
wei
> 
> Regards,
> Xunlei
> 
> >  #endif
> > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > index 5a294e48b185..d3598cdd6437 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -23,6 +23,7 @@
> >  #include <asm/pgtable.h>
> >  #include <asm/tlbflush.h>
> >  #include <asm/mmu_context.h>
> > +#include <asm/apic.h>
> >  #include <asm/io_apic.h>
> >  #include <asm/debugreg.h>
> >  #include <asm/kexec-bzimage64.h>
> > @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
> >  	local_irq_disable();
> >  	hw_breakpoint_disable();
> >  
> > -	if (image->preserve_context) {
> > +	if (image->preserve_context || lapic_disabled()) {
> >  #ifdef CONFIG_X86_IO_APIC
> >  		/*
> >  		 * We need to put APICs in legacy mode so that we can
> >  		 * get timer interrupts in second kernel. kexec/kdump
> >  		 * paths already have calls to disable_IO_APIC() in
> > -		 * one form or other. kexec jump path also need
> > -		 * one.
> > +		 * one form or other. kexec jump path also need one.
> >  		 */
> >  		disable_IO_APIC();
> >  #endif
> 
> 
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-12  8:21   ` Baoquan He
@ 2016-07-12  9:10     ` Wei, Jiangang
  0 siblings, 0 replies; 13+ messages in thread
From: Wei, Jiangang @ 2016-07-12  9:10 UTC (permalink / raw)
  To: bhe@redhat.com
  Cc: fenghua.yu@intel.com, kexec@lists.infradead.org, x86@kernel.org,
	xlpang@redhat.com, linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de

On Tue, 2016-07-12 at 16:21 +0800, Baoquan He wrote:
> On 07/12/16 at 02:52pm, Xunlei Pang wrote:
> > On 2016/07/07 at 18:17, Wei Jiangang wrote:
> > > Signed-off-by: Wei Jiangang <weijg.fnst@cn.fujitsu.com>
> > > ---
> > > +/* Local APIC is disabled by the kernel for crash or reboot path */
> > > +static int disabled_local_apic;
> > > +
> > >  /*
> > >   * Knob to control our willingness to enable the local APIC.
> > >   *
> > > @@ -1097,10 +1100,16 @@ void lapic_shutdown(void)
> > >  #endif
> > >  		disable_local_APIC();
> > >  
> > > +	disabled_local_apic = 1;
> > >  
> > >  	local_irq_restore(flags);
> > >  }
> > >  
> > > +int lapic_disabled(void)
> > > +{
> > > +	return disabled_local_apic;
> > > +}
> > > +
> > >  /**
> > >   * sync_Arb_IDs - synchronize APIC bus arbitration IDs
> > >   */
> > > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > > index 469b23d6acc2..c934a7868e6b 100644
> > > --- a/arch/x86/kernel/machine_kexec_32.c
> > > +++ b/arch/x86/kernel/machine_kexec_32.c
> > > @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
> > >  	local_irq_disable();
> > >  	hw_breakpoint_disable();
> > >  
> > > -	if (image->preserve_context) {
> > > +	if (image->preserve_context || lapic_disabled()) {
> > >  #ifdef CONFIG_X86_IO_APIC
> > >  		/*
> > >  		 * We need to put APICs in legacy mode so that we can
> > >  		 * get timer interrupts in second kernel. kexec/kdump
> > >  		 * paths already have calls to disable_IO_APIC() in
> > > -		 * one form or other. kexec jump path also need
> > > -		 * one.
> > > +		 * one form or other. kexec jump path also need one.
> > >  		 */
> > >  		disable_IO_APIC();
> > 
> > Hi Wei,
> > 
> > As the comment says, kexec/kdump paths already have disable_IO_APIC(), why again here?
> 
> I also have this question. I guess Jiangang didn't post his modification
> correctly. He should remove calling of disable_IO_APIC in
> native_machine_crash_shutdown(). Assume his test was done on correct
> code change. 
Hi he,
Thanks for your suggestion firstly.

In fact, Eric had suggested me to do that last week
(https://lkml.org/lkml/2016/7/8/14).
And i had a try to remove the calling of disable_IO_APIC in
native_machine_crash_shutdown().
But it doesn't work for kdump with notsc.

Thanks,
wei
> 
> > 
> > Regards,
> > Xunlei
> > 
> > >  #endif
> > > diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> > > index 5a294e48b185..d3598cdd6437 100644
> > > --- a/arch/x86/kernel/machine_kexec_64.c
> > > +++ b/arch/x86/kernel/machine_kexec_64.c
> > > @@ -23,6 +23,7 @@
> > >  #include <asm/pgtable.h>
> > >  #include <asm/tlbflush.h>
> > >  #include <asm/mmu_context.h>
> > > +#include <asm/apic.h>
> > >  #include <asm/io_apic.h>
> > >  #include <asm/debugreg.h>
> > >  #include <asm/kexec-bzimage64.h>
> > > @@ -269,14 +270,13 @@ void machine_kexec(struct kimage *image)
> > >  	local_irq_disable();
> > >  	hw_breakpoint_disable();
> > >  
> > > -	if (image->preserve_context) {
> > > +	if (image->preserve_context || lapic_disabled()) {
> > >  #ifdef CONFIG_X86_IO_APIC
> > >  		/*
> > >  		 * We need to put APICs in legacy mode so that we can
> > >  		 * get timer interrupts in second kernel. kexec/kdump
> > >  		 * paths already have calls to disable_IO_APIC() in
> > > -		 * one form or other. kexec jump path also need
> > > -		 * one.
> > > +		 * one form or other. kexec jump path also need one.
> > >  		 */
> > >  		disable_IO_APIC();
> > >  #endif
> > 
> > 
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
> 
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-12  9:09   ` Wei, Jiangang
@ 2016-07-12  9:29     ` Baoquan He
  0 siblings, 0 replies; 13+ messages in thread
From: Baoquan He @ 2016-07-12  9:29 UTC (permalink / raw)
  To: Wei, Jiangang, ying.huang
  Cc: fenghua.yu@intel.com, kexec@lists.infradead.org, x86@kernel.org,
	xlpang@redhat.com, linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de

On 07/12/16 at 09:09am, Wei, Jiangang wrote:
> On Tue, 2016-07-12 at 14:52 +0800, Xunlei Pang wrote:
> > On 2016/07/07 at 18:17, Wei Jiangang wrote:

> > > diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c
> > > index 469b23d6acc2..c934a7868e6b 100644
> > > --- a/arch/x86/kernel/machine_kexec_32.c
> > > +++ b/arch/x86/kernel/machine_kexec_32.c
> > > @@ -202,14 +202,13 @@ void machine_kexec(struct kimage *image)
> > >  	local_irq_disable();
> > >  	hw_breakpoint_disable();
> > >  
> > > -	if (image->preserve_context) {
> > > +	if (image->preserve_context || lapic_disabled()) {
> > >  #ifdef CONFIG_X86_IO_APIC
> > >  		/*
> > >  		 * We need to put APICs in legacy mode so that we can
> > >  		 * get timer interrupts in second kernel. kexec/kdump
> > >  		 * paths already have calls to disable_IO_APIC() in
> > > -		 * one form or other. kexec jump path also need
> > > -		 * one.
> > > +		 * one form or other. kexec jump path also need one.
> > >  		 */
> > >  		disable_IO_APIC();
> > 
> > Hi Wei,
> > 
> > As the comment says, kexec/kdump paths already have disable_IO_APIC(), why again here?
> Frankly, I am not very clear about it as well...
> 
> Originally the codes and comment added by Huang Ying
> <ying.huang@intel.com>
> There may be some special reason...

Check code again, it might be native_disable_io_apic() is called in
disable_IO_APIC(). See the code comments at the beginning of
native_disable_io_apic(), ioapic_i8259 is used to record the pin of
ioapic which connect to 8259. Here ioapic is configured to virtual wire
mode to pass through interrupt to cpu. That might be why Huang Ying
said it can put it into legacy irq mode with calling with
disable_IO_APIC.

void native_disable_io_apic(void)
{
        /*
         * If the i8259 is routed through an IOAPIC
         * Put that IOAPIC in virtual wire mode
         * so legacy interrupts can be delivered.
         */
...
}

Now question is why disable_IO_APIC need be called twice to make ioapic
to be virtual wire mode.

Add Huang Ying to this thread, don't know if he can help explain.

> 
> The comment also said that it's used to put APICs in legacy mode so that
> we can get timer interrupts in second kernel. and we also need it in
> kexec jump path.
> 
> In fact, the jiffies calibration needs timer interrupts in second
> kernel. and the lapic and timer are not ready when second kernel waits
> them to update the jiffies value.
> 
> so I just want to reuse them and enable legacy mode (PIT timer) for
> notsc case.
> kernel.

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-11 10:30     ` Wei, Jiangang
@ 2016-07-13  7:46       ` Wei, Jiangang
  2016-07-13  9:05         ` Baoquan He
  0 siblings, 1 reply; 13+ messages in thread
From: Wei, Jiangang @ 2016-07-13  7:46 UTC (permalink / raw)
  To: mingo@kernel.org
  Cc: fenghua.yu@intel.com, x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de

On Mon, 2016-07-11 at 18:28 +0800, Wei Jiangang wrote:
> Hi , Ingo
> 
> On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > 
> > > Sigh.  Can we please just do the work to rip out the apic shutdown code from the 
> > > kexec on panic code path?
> > > 
> > > I forgetting details but the only reason we have do any apic shutdown is bugs in 
> > > older kernels that could not initialize a system properly if we did not shut 
> > > down the apics.
> > > 
> > > I certainly don't see an issue with goofy cases like notsc not working on a 
> > > crash capture kernel if we are not initializing the hardware properly.
> > > 
> > > The strategy really needs to be to only do the absolutely essential hardware 
> > > shutdown in the crashing kernel, every adintional line of code we execute in the 
> > > crashing kernel increases our chances of hitting a bug.
> > 
> > Fully agreed.
> > 
> > > Under that policy things like requring we don't pass boot options that inhibit 
> > > the dump catpure kernel from initializing the hardware from a random state are 
> > > reasonable requirements.  AKA I don't see any justification in this as to why we 
> > > would even want to support notsc on the dump capture kernel.  Especially when 
> > > things clearly work when that option is not specified.
> > 
> > So at least on the surface it appears 'surprising' that the 'notsc' option (which, 
> > supposedly, disables TSC handling) interferes with being able to fully boot. Even 
> > if 'notsc' is specified we are still using the local APIC, right?
> 
> In most case,  It's no problem that using local APIC while notsc is
> specified.
> But not for kdump.
> 
> We can get evidence, Especially from "Spurious LAPIC timer interrupt on
> cpu 0".
> 
> ###serial log,
> 
> [    0.000000] NR_IRQS:524544 nr_irqs:256 16
> [    0.000000] Spurious LAPIC timer interrupt on cpu 0
> [    0.000000] Console: colour dummy device 80x25
> [    0.000000] console [tty0] enabled
> [    0.000000] console [ttyS0] enabled
> [    0.000000] tsc: Fast TSC calibration using PIT
> [    0.000000] tsc: Detected 2099.947 MHz processor
> [    0.000000] Calibrating delay loop...
> 
> 
> Due to the local apic and local apic timer hasn't been setup and enabled
> fully, The event_handler of clock event is NULL.
> 
> ###codes,
> 
> static void local_apic_timer_interrupt(void)
> {
>     int cpu = smp_processor_id();
>     struct clock_event_device *evt = &per_cpu(lapic_events, cpu);
> 
>     /*
>      * Normally we should not be here till LAPIC has been initialized
> but
>      * in some cases like kdump, its possible that there is a pending
> LAPIC
>      * timer interrupt from previous kernel's context and is delivered
> in
>      * new kernel the moment interrupts are enabled.
>      *
>      * Interrupts are enabled early and LAPIC is setup much later, hence
>      * its possible that when we get here evt->event_handler is NULL.
>      * Check for event_handler being NULL and discard the interrupt as
>      * spurious.
>      */
>     if (!evt->event_handler) {
>         pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
>         /* Switch it off */
>         lapic_timer_shutdown(evt);
>         return;
>     }
> 
>      .............
> }
> 
> 
> IMHO, 
> If we specify notsc, the dump-capture kernel waits for jiffies being
> updated early and LAPIC and timer are setup much later, which causes no
> timer interrupts is passed to BSP. as following,
> 
> start_kernel  -->
> 1)-> calibrate_delay()  -> calibrate_delay_converge()  # hang and wait
> for jiffies changed
>                    
> 2)-> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() ->
> setup_local_APIC()
> 
> -> setup_percpu_clockev().
> 
> the setup_percpu_clockev points setup_boot_APIC_clock() which used to
> setup the boot APIC and timer.
> 
> 
> > So it might be a good idea to find the root cause of this bootup fragility even if 
> > 'notsc' is specified. And I fully agree that it should be fixed in the bootup path 
> > of the dump kernel, not the crash kernel reboot path.
> 

Can anyone give some advice or commet on the following idea?
Thanks in advance.

> Because the lapic and timer are not ready when dump-capture waits them
> to update the jiffies value. so I suggest to put APIC in legacy mode in
> local_apic_timer_interrupt() temporarily, which in the bootup path of
> dump kernel. 
> 
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index dcb52850a28f..af3be93997ed 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
>                 pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
> cpu);
>                 /* Switch it off */
>                 lapic_timer_shutdown(evt);
> +             disable_IO_APIC();
>                 return;
>         }
> 
> And the new solution can fix the problem.
> What‘s your opinion about it?
> 
> Thanks,
> wei
> 
> > 
> > Thanks,
> > 
> > 	Ingo
> > 
> > 
> 



_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2] kexec: Fix kdump failure with notsc
  2016-07-13  7:46       ` Wei, Jiangang
@ 2016-07-13  9:05         ` Baoquan He
  0 siblings, 0 replies; 13+ messages in thread
From: Baoquan He @ 2016-07-13  9:05 UTC (permalink / raw)
  To: Wei, Jiangang
  Cc: fenghua.yu@intel.com, x86@kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, mingo@redhat.com,
	ebiederm@xmission.com, hpa@zytor.com, tglx@linutronix.de,
	mingo@kernel.org

On 07/13/16 at 07:46am, Wei, Jiangang wrote:
> On Mon, 2016-07-11 at 18:28 +0800, Wei Jiangang wrote:
> > Hi , Ingo
> > 
> > On Fri, 2016-07-08 at 09:38 +0200, Ingo Molnar wrote:
> > > * Eric W. Biederman <ebiederm@xmission.com> wrote:
> > > 
> > > > Sigh.  Can we please just do the work to rip out the apic shutdown code from the 
> > > > kexec on panic code path?
> > > > 
> > > > I forgetting details but the only reason we have do any apic shutdown is bugs in 
> > > > older kernels that could not initialize a system properly if we did not shut 
> > > > down the apics.
> > > > 
> > > > I certainly don't see an issue with goofy cases like notsc not working on a 
> > > > crash capture kernel if we are not initializing the hardware properly.
> > > > 
> > > > The strategy really needs to be to only do the absolutely essential hardware 
> > > > shutdown in the crashing kernel, every adintional line of code we execute in the 
> > > > crashing kernel increases our chances of hitting a bug.
> > > 
> > > Fully agreed.
> > > 
> > > > Under that policy things like requring we don't pass boot options that inhibit 
> > > > the dump catpure kernel from initializing the hardware from a random state are 
> > > > reasonable requirements.  AKA I don't see any justification in this as to why we 
> > > > would even want to support notsc on the dump capture kernel.  Especially when 
> > > > things clearly work when that option is not specified.
> > > 
> > > So at least on the surface it appears 'surprising' that the 'notsc' option (which, 
> > > supposedly, disables TSC handling) interferes with being able to fully boot. Even 
> > > if 'notsc' is specified we are still using the local APIC, right?
> > 
> > In most case,  It's no problem that using local APIC while notsc is
> > specified.
> > But not for kdump.
> > 
> > We can get evidence, Especially from "Spurious LAPIC timer interrupt on
> > cpu 0".
> > 
> > ###serial log,
> > 
> > [    0.000000] NR_IRQS:524544 nr_irqs:256 16
> > [    0.000000] Spurious LAPIC timer interrupt on cpu 0
> > [    0.000000] Console: colour dummy device 80x25
> > [    0.000000] console [tty0] enabled
> > [    0.000000] console [ttyS0] enabled
> > [    0.000000] tsc: Fast TSC calibration using PIT
> > [    0.000000] tsc: Detected 2099.947 MHz processor
> > [    0.000000] Calibrating delay loop...
> > 
> > 
> > Due to the local apic and local apic timer hasn't been setup and enabled
> > fully, The event_handler of clock event is NULL.
> > 
> > ###codes,
> > 
> > static void local_apic_timer_interrupt(void)
> > {
> >     int cpu = smp_processor_id();
> >     struct clock_event_device *evt = &per_cpu(lapic_events, cpu);
> > 
> >     /*
> >      * Normally we should not be here till LAPIC has been initialized
> > but
> >      * in some cases like kdump, its possible that there is a pending
> > LAPIC
> >      * timer interrupt from previous kernel's context and is delivered
> > in
> >      * new kernel the moment interrupts are enabled.
> >      *
> >      * Interrupts are enabled early and LAPIC is setup much later, hence
> >      * its possible that when we get here evt->event_handler is NULL.
> >      * Check for event_handler being NULL and discard the interrupt as
> >      * spurious.
> >      */
> >     if (!evt->event_handler) {
> >         pr_warning("Spurious LAPIC timer interrupt on cpu %d\n", cpu);
> >         /* Switch it off */
> >         lapic_timer_shutdown(evt);
> >         return;
> >     }
> > 
> >      .............
> > }
> > 
> > 
> > IMHO, 
> > If we specify notsc, the dump-capture kernel waits for jiffies being
> > updated early and LAPIC and timer are setup much later, which causes no
> > timer interrupts is passed to BSP. as following,
> > 
> > start_kernel  -->
> > 1)-> calibrate_delay()  -> calibrate_delay_converge()  # hang and wait
> > for jiffies changed
> >                    
> > 2)-> rest_init() -> kernel_init() -> .... -> apic_bsp_setup() ->
> > setup_local_APIC()
> > 
> > -> setup_percpu_clockev().
> > 
> > the setup_percpu_clockev points setup_boot_APIC_clock() which used to
> > setup the boot APIC and timer.
> > 
> > 
> > > So it might be a good idea to find the root cause of this bootup fragility even if 
> > > 'notsc' is specified. And I fully agree that it should be fixed in the bootup path 
> > > of the dump kernel, not the crash kernel reboot path.
> > 
> 
> Can anyone give some advice or commet on the following idea?
> Thanks in advance.

You can't do this.

The reason is disable_IO_APIC not only disable io-apic by masking each
pin but setup the apic virtual wire mode.

I dig code and found somethings really interesting. lapic_shutdown()
really disable local apic by writing the APIC software enable/disable
flag in the spurious-interrupt vector register, please check intel arch
manual section 10.4.3. However disable_IO_APIC() softwae enable local
apic again though just make it in virtual wire mode. About virtual wire
mode, in x86 32 IMCR is used to control PIC mode or APIC mode system can
choose. In x86 64 it seems that system will go to virtual wire mode and
then enter into APIC mode. In virtual wire mode the IRET entry of pin which
connected i8259  will be setup as pass through, and LINTIN0 will be set
as ExtMode to get the interrupt into cpu.

This can explain why in commit 522e66464467 ("x86/apic: Disable I/O APIC
before shutdown of the local APIC") Fenghua decided to call
disable_IO_APIC firstly, because disable_IO_APIC enable apic again to go
back into  Virtual Wire compatibility mode. In commit log he said lapic
is disabled, but io-apic is active again.

And this also explains why you need call disable_IO_APIC twice to make
notsc work again. With commit 522e664 local apic is disabled completely,
so you have to call disable_IO_APIC() to go into virtual wire mode, this
software enable local apic.

Now I don't understand why specifing notsc lead to system hang. I am
digging clocksource code, got some finding, but not very clear.
Seems local_apic_timer_interrupt give me some clues, still not clear.
How about not hurry to fix this, let's get together to make this clear
and then make a final fix. At least disable_IO_APIC have to be changed
to make it more reasonable, from the name of function we can't see what
it really is and it really does.

If we know why notsc cause system hang, it should be clear. TSC is the
highest resolution clock source, actually on my system there are clock
sources like jiffies, refined_jiffies, hpet, tsc, pit. Why tsc?

Thanks
Baoquan

> 
> > Because the lapic and timer are not ready when dump-capture waits them
> > to update the jiffies value. so I suggest to put APIC in legacy mode in
> > local_apic_timer_interrupt() temporarily, which in the bootup path of
> > dump kernel. 
> > 
> > diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> > index dcb52850a28f..af3be93997ed 100644
> > --- a/arch/x86/kernel/apic/apic.c
> > +++ b/arch/x86/kernel/apic/apic.c
> > @@ -879,6 +879,7 @@ static void local_apic_timer_interrupt(void)
> >                 pr_warning("Spurious LAPIC timer interrupt on cpu %d\n",
> > cpu);
> >                 /* Switch it off */
> >                 lapic_timer_shutdown(evt);
> > +             disable_IO_APIC();
> >                 return;
> >         }
> > 
> > And the new solution can fix the problem.
> > What‘s your opinion about it?
> > 
> > Thanks,
> > wei
> > 
> > > 
> > > Thanks,
> > > 
> > > 	Ingo
> > > 
> > > 
> > 
> 
> 
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-07-13  9:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-07 10:17 [PATCH v2] kexec: Fix kdump failure with notsc Wei Jiangang
2016-07-07 17:55 ` Eric W. Biederman
2016-07-08  4:48   ` Wei, Jiangang
2016-07-08  7:38   ` Ingo Molnar
2016-07-11 10:30     ` Wei, Jiangang
2016-07-13  7:46       ` Wei, Jiangang
2016-07-13  9:05         ` Baoquan He
2016-07-08  8:21 ` Nikolay Borisov
2016-07-12  6:52 ` Xunlei Pang
2016-07-12  8:21   ` Baoquan He
2016-07-12  9:10     ` Wei, Jiangang
2016-07-12  9:09   ` Wei, Jiangang
2016-07-12  9:29     ` Baoquan He

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).