* [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
From: HATAYAMA Daisuke @ 2012-04-16 2:21 UTC
To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi
Currently, booting the 2nd kernel with multiple CPUs fails in most
cases: if the crash happens on an AP, the 2nd kernel is entered on
that AP, and the problem is sending the startup IPI from the AP to
the BSP. The typical result I saw is the machine hanging during the
2nd kernel boot.
To solve this issue, always enter the 2nd kernel on the BSP. To do
this, I modify the logic for shooting down CPUs. The mechanism uses
only simple existing logic and does not complicate the crash path to
machine_kexec().
I ran about 100 stress tests in total on the processors below:
Intel(R) Xeon(R) CPU E7- 4820 @ 2.00GHz
Socket x 4, Core x 8, Thread x 16 (64 LCPUS in total)
Intel(R) Xeon(R) CPU E7- 8870 @ 2.40GHz
Socket x 8, Core x 10, Thread x 20 (160 LCPUS in total)
* Motivation of enabling multiple CPUs on the 2nd kernel
This patch set is aimed at doing parallel compression on the 2nd
kernel. A machine with more than a terabyte of memory requires
several hours to generate a crash dump.
There are several ways to reduce crash dump generation time, but
they have different pros and cons:
Fast I/O devices
pros
- Can obtain high speed consistently
cons
- Big financial cost for good performance I/O devices. It's
difficult financially to prepare these for all environments as
dump devices.
Filtering
pros
- No financial cost.
- Large reduction of crash dump size
cons
- Some data is definitely lost. So, we cannot use this in some
situations:
1) High availability configuration where application triggers
OS to crash and users want to debug the application later by
retrieving the application's user process image from the
system's crash dump.
2) KVM virtualization configuration where KVM host machine
contains KVM guest machine images as user processes.
3) Page cache is needed for debugging filesystem related bugs.
Compression
pros
- No financial cost.
- No data lost.
cons
- Compression doesn't always reduce crash dump size.
- Takes heavy CPU time. Slow if the CPU is slow.
Machines with large memory tend to have a lot of CPUs, and
compression is suitable for parallel processing. My goal is to make
compression as close to free as possible.
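
As a rough illustration of why compression parallelizes well, the
sketch below splits a dump's pages into contiguous ranges, one per
worker CPU, so that each worker can compress its range independently.
The names split_pages() and struct page_range are hypothetical; they
are not taken from this patch set or from any existing dump tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: half-open page range [start, end). */
struct page_range {
	size_t start;
	size_t end;
};

/*
 * Split total_pages as evenly as possible across nworkers.
 * The first (total_pages % nworkers) workers get one extra page.
 */
static struct page_range split_pages(size_t total_pages, size_t nworkers,
				     size_t worker)
{
	struct page_range r;
	size_t base, extra;

	assert(nworkers > 0 && worker < nworkers);
	base = total_pages / nworkers;
	extra = total_pages % nworkers;

	r.start = worker * base + (worker < extra ? worker : extra);
	r.end = r.start + base + (worker < extra ? 1 : 0);
	return r;
}
```

With 10 pages and 4 workers, the ranges come out as [0,3), [3,6),
[6,8) and [8,10); no page is shared, so the workers need no locking
on the input side.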
* TODO
- Extend 512MB limit of reserved memory size for 2nd kernel for
multiple CPUs.
- Intel microcode patch loading on the 2nd kernel is slow for the
2nd and later CPUs: a minute or more per CPU.
- There are a limited number of irq vectors for TLB flush IPI on
x86_64: 32 for recent 3.x kernels and 8 for around 2.6.x
kernels. So compression doesn't scale if a lot of page reclaim
happens when reading kernel image larger than memory. Special
handling without page cache could be applicable to parallel dump
mechanism, but more investigation is needed.
---
HATAYAMA Daisuke (2):
Enter 2nd kernel with BSP
Introduce crash ipi helpers to wait for APs to stop
arch/x86/include/asm/reboot.h | 4 +++
arch/x86/kernel/crash.c | 15 +++++++++-
arch/x86/kernel/reboot.c | 63 +++++++++++++++++++++++++++++------------
3 files changed, 62 insertions(+), 20 deletions(-)
--
HATAYAMA Daisuke
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
* [PATCH 1/2] Introduce crash ipi helpers to wait for APs to stop
From: HATAYAMA Daisuke @ 2012-04-16 2:21 UTC
To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi
Introduce crash IPI helpers so that the BSP and AP sides can share
them.
There is no functional change in this patch.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---
arch/x86/include/asm/reboot.h | 4 +++
arch/x86/kernel/reboot.c | 53 ++++++++++++++++++++++++++++++++---------
2 files changed, 45 insertions(+), 12 deletions(-)
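
The three helpers form a countdown: one CPU sets the counter to the
number of other online CPUs, each stopping CPU decrements it, and the
waiter polls it with a bound. The sketch below models that pattern in
user space with C11 atomics. It is not the kernel code: the
stop_counter_* names and the poll_hook parameter are hypothetical, the
halt is omitted, and unlike crash_ipi_wait_for_APs() the wait returns
whether it succeeded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-in for the kernel's waiting_for_crash_ipi counter. */
static atomic_int waiting_for_stop;

/* Mirrors crash_ipi_init(): every CPU except the caller must check in. */
static void stop_counter_init(int online_cpus)
{
	assert(online_cpus >= 1);
	atomic_store(&waiting_for_stop, online_cpus - 1);
}

/* Mirrors the counting half of crash_ipi_dec_and_halt(); no halt here. */
static void stop_counter_checkin(void)
{
	atomic_fetch_sub(&waiting_for_stop, 1);
}

/*
 * Mirrors crash_ipi_wait_for_APs(): poll at most max_polls times.
 * The kernel delays 1 ms per poll with mdelay(); the hypothetical
 * poll_hook callback stands in for that delay and may be NULL.
 * Returns true if every other CPU checked in, false on timeout.
 */
static bool stop_counter_wait(unsigned long max_polls, void (*poll_hook)(void))
{
	while (atomic_load(&waiting_for_stop) > 0 && max_polls) {
		if (poll_hook)
			poll_hook();
		max_polls--;
	}
	return atomic_load(&waiting_for_stop) <= 0;
}
```

The bound matters on the crash path: a CPU that never answers the NMI
must not keep the waiter from proceeding.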
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 92f29706..2f8e9e7 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -26,4 +26,8 @@ void machine_real_restart(unsigned int type);
typedef void (*nmi_shootdown_cb)(int, struct pt_regs*);
void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+void crash_ipi_init(void);
+void crash_ipi_dec_and_halt(void);
+void crash_ipi_wait_for_APs(void);
+
#endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index d840e69..6dd77a8 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -769,6 +769,31 @@ static nmi_shootdown_cb shootdown_callback;
static atomic_t waiting_for_crash_ipi;
+void crash_ipi_init(void)
+{
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+}
+
+void crash_ipi_dec_and_halt(void)
+{
+ atomic_dec(&waiting_for_crash_ipi);
+ /* Assume hlt works */
+ halt();
+ for (;;)
+ cpu_relax();
+}
+
+void crash_ipi_wait_for_APs(void)
+{
+ unsigned long msecs;
+
+ msecs = 1000; /* Wait at most a second for the other cpus to stop */
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+ mdelay(1);
+ msecs--;
+ }
+}
+
static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
{
int cpu;
@@ -785,11 +810,7 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
shootdown_callback(cpu, regs);
- atomic_dec(&waiting_for_crash_ipi);
- /* Assume hlt works */
- halt();
- for (;;)
- cpu_relax();
+ crash_ipi_dec_and_halt();
return NMI_HANDLED;
}
@@ -807,7 +828,6 @@ static void smp_send_nmi_allbutself(void)
*/
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
- unsigned long msecs;
local_irq_disable();
/* Make a note of crashing cpu. Will be used in NMI callback.*/
@@ -815,7 +835,8 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
shootdown_callback = callback;
- atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+ crash_ipi_init();
+
/* Would it be better to replace the trap vector here? */
if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
NMI_FLAG_FIRST, "crash"))
@@ -827,11 +848,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
smp_send_nmi_allbutself();
- msecs = 1000; /* Wait at most a second for the other cpus to stop */
- while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
- mdelay(1);
- msecs--;
- }
+ crash_ipi_wait_for_APs();
/* Leave the nmi callback set */
}
@@ -840,4 +857,16 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
/* No other CPUs to shoot down */
}
+
+void crash_ipi_init(void)
+{
+}
+
+void crash_ipi_dec_and_halt(void)
+{
+}
+
+void crash_ipi_wait_for_APs(void)
+{
+}
#endif
* [PATCH 2/2] Enter 2nd kernel with BSP
From: HATAYAMA Daisuke @ 2012-04-16 2:21 UTC
To: kexec, linux-kernel, ebiederm, vgoyal, kumagai-atsushi
Split the logic into the BSP's and the APs': the BSP waits for the APs
to halt.
Keep the variable crashing_cpu for debugging use; it is useful for
determining on which CPU the crash happened.
Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
---
arch/x86/kernel/crash.c | 15 ++++++++++++++-
arch/x86/kernel/reboot.c | 16 ++++++----------
2 files changed, 20 insertions(+), 11 deletions(-)
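
The split can be read as a per-CPU role decision: the CPU whose APIC
ID matches the boot CPU's waits for the others and then enters the
2nd kernel, and every other CPU, including the crashing one if it is
an AP, checks in and halts. The sketch below is a user-space model of
that decision only. The enum, crash_role_for() and simulate_shutdown()
are hypothetical names, and the APIC IDs are passed in as plain
integers rather than read from hardware.

```c
#include <assert.h>

enum crash_role {
	ROLE_WAIT_THEN_KEXEC,	/* BSP: wait for APs, then enter 2nd kernel */
	ROLE_CHECKIN_AND_HALT	/* AP: decrement the counter and halt */
};

/* Same comparison the patch makes against boot_cpu_physical_apicid. */
static enum crash_role crash_role_for(unsigned int my_apicid,
				      unsigned int boot_apicid)
{
	return my_apicid == boot_apicid ? ROLE_WAIT_THEN_KEXEC
					: ROLE_CHECKIN_AND_HALT;
}

/*
 * Apply the role decision to every CPU. Returns the index of the CPU
 * that would enter the 2nd kernel, or -1 if no CPU matches boot_apicid,
 * and stores the number of halting CPUs in *halted.
 */
static int simulate_shutdown(const unsigned int *apicids, int ncpus,
			     unsigned int boot_apicid, int *halted)
{
	int i, entering = -1;

	assert(apicids && halted && ncpus > 0);
	*halted = 0;
	for (i = 0; i < ncpus; i++) {
		if (crash_role_for(apicids[i], boot_apicid) ==
		    ROLE_WAIT_THEN_KEXEC)
			entering = i;
		else
			(*halted)++;
	}
	return entering;
}
```

Note that the result does not depend on which CPU crashed. That is
the point of the patch, and it is also why the BSP has to be reachable
by NMI for the 2nd kernel to be entered at all.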
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..c5c19fa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -83,9 +83,14 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
* In practice this means shooting down the other cpus in
* an SMP system.
*/
+
+ int cpu;
+
/* The kernel is broken so disable interrupts */
local_irq_disable();
+ crash_ipi_init();
+
kdump_nmi_shootdown_cpus();
/* Booting kdump kernel with VMX or SVM enabled won't work,
@@ -102,5 +107,13 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
#ifdef CONFIG_HPET_TIMER
hpet_disable();
#endif
- crash_save_cpu(regs, safe_smp_processor_id());
+ cpu = safe_smp_processor_id();
+ crash_save_cpu(regs, cpu);
+
+ if (cpu_physical_id(cpu) == boot_cpu_physical_apicid) {
+ crash_ipi_wait_for_APs();
+ return;
+ }
+
+ crash_ipi_dec_and_halt();
}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 6dd77a8..90354f9 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -7,6 +7,7 @@
#include <linux/sched.h>
#include <linux/tboot.h>
#include <linux/delay.h>
+#include <linux/kexec.h>
#include <acpi/reboot.h>
#include <asm/io.h>
#include <asm/apic.h>
@@ -800,16 +801,15 @@ static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
cpu = raw_smp_processor_id();
- /* Don't do anything if this handler is invoked on crashing cpu.
- * Otherwise, system will completely hang. Crashing cpu can get
- * an NMI if system was initially booted with nmi_watchdog parameter.
- */
- if (cpu == crashing_cpu)
- return NMI_HANDLED;
local_irq_disable();
shootdown_callback(cpu, regs);
+ if (cpu_physical_id(cpu) == boot_cpu_physical_apicid) {
+ crash_ipi_wait_for_APs();
+ machine_kexec(kexec_crash_image);
+ }
+
crash_ipi_dec_and_halt();
return NMI_HANDLED;
@@ -835,8 +835,6 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
shootdown_callback = callback;
- crash_ipi_init();
-
/* Would it be better to replace the trap vector here? */
if (register_nmi_handler(NMI_LOCAL, crash_nmi_callback,
NMI_FLAG_FIRST, "crash"))
@@ -848,8 +846,6 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
smp_send_nmi_allbutself();
- crash_ipi_wait_for_APs();
-
/* Leave the nmi callback set */
}
#else /* !CONFIG_SMP */
* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
From: HATAYAMA Daisuke @ 2012-04-16 6:40 UTC
To: ebiederm; +Cc: kumagai-atsushi, kexec, linux-kernel, vgoyal
From: "Eric W. Biederman" <ebiederm@xmission.com>
Subject: Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
Date: Sun, 15 Apr 2012 19:59:52 -0700
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>>Currently, booting up 2nd kernel with multiple CPUs fails in most
>>cases since it enters 2nd kernel with AP if the crash happens on the
>>AP. The problem is to signal startup IPI from AP to BSP.
>
> If so, then we need to fix our code that sends startupIPIs.
> And perhaps the code that attempts to shutdown the other cpus.
>
Currently maxcpus=1 is set by default; in that configuration the 2nd
kernel doesn't try to wake up the secondary and later CPUs. So no
startup IPI is sent on the 2nd kernel now.
> It is not ok to switch cpus during kdump (reducing the reliability) just so you can write crash dumps faster. Better would be to cope with secondary cpus not booting.
Even the current implementation uses NMI to stop the other CPUs. The
reliability you are concerned about here is the possibility that the
non-crashing BSP doesn't go into machine_kexec() due to some failure
of interrupt processing, right?
An alternative idea is:
1) try to go into the 2nd kernel with the BSP,
2) if that has not happened after some seconds, go into the 2nd
kernel with the crashing CPU. Then treat all CPUs except the crashing
CPU as abnormal, and use only the crashing CPU on the 2nd kernel.
This seems as reliable as the current one.
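
The two-step idea above can be sketched as a bounded wait with a
fallback. This is a user-space model under stated assumptions, not
kernel code: struct kexec_plan, plan_entry() and the two probe
functions are hypothetical, and the bsp_ready callback stands in for
whatever signal would tell the crashing CPU that the BSP has
responded.

```c
#include <assert.h>
#include <stdbool.h>

struct kexec_plan {
	int entry_cpu;	/* CPU that jumps into the 2nd kernel */
	bool multi_cpu;	/* may the 2nd kernel bring up other CPUs? */
};

/* Example probe: the BSP never responds. */
static bool bsp_never_ready(unsigned long elapsed_ms)
{
	(void)elapsed_ms;
	return false;
}

/* Example probe: the BSP responds once 3 ms have elapsed. */
static bool bsp_ready_after_3ms(unsigned long elapsed_ms)
{
	return elapsed_ms >= 3;
}

/*
 * Step 1: give the BSP up to timeout_ms to respond.
 * Step 2: on timeout, fall back to the crashing CPU and restrict the
 * 2nd kernel to that single CPU.
 */
static struct kexec_plan plan_entry(int bsp_cpu, int crashing_cpu,
				    unsigned long timeout_ms,
				    bool (*bsp_ready)(unsigned long))
{
	struct kexec_plan plan = { crashing_cpu, false };
	unsigned long ms;

	assert(bsp_ready);
	if (crashing_cpu == bsp_cpu) {
		/* Already on the BSP: nothing to wait for. */
		plan.multi_cpu = true;
		return plan;
	}
	for (ms = 0; ms < timeout_ms; ms++) {
		if (bsp_ready(ms)) {
			plan.entry_cpu = bsp_cpu;
			plan.multi_cpu = true;
			return plan;
		}
	}
	return plan;
}
```

In the fallback case the result is the same as today's behaviour with
maxcpus=1, which is the basis for the claim that this is as reliable
as the current code.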
>
> I do like the direction of pounding on things so we can get multiple cpus going in large configurations. Although I am surprised you are cpu bound and not disk bound in the time to write your crash dumps.
>
What do you mean by the 1st sentence? Sorry, I don't understand the
``pounding on things'' part.
For the 2nd, it depends on the data. If the data is sparse enough, its
size is significantly reduced by compression, and so the I/O size is
also reduced. If the data is random enough, compression takes much
time and the data remains the same size, resulting in CPU-bound
processing.
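
A toy example of that data dependence follows. It uses a simple
run-length encoding as a stand-in for a real compressor, which is an
assumption made only to keep the sketch self-contained; dump tools
use real compression libraries. All names here are hypothetical, and
the "dense" page is a deterministic pattern in which no two adjacent
bytes are equal, standing in for random data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* Size in bytes of a (count, value) run-length encoding of buf. */
static size_t rle_encoded_size(const uint8_t *buf, size_t len)
{
	size_t i = 0, out = 0;

	assert(buf || len == 0);
	while (i < len) {
		size_t run = 1;

		/* Extend the run while bytes repeat; cap at 255 per pair. */
		while (i + run < len && run < 255 && buf[i + run] == buf[i])
			run++;
		out += 2;	/* one count byte plus one value byte */
		i += run;
	}
	return out;
}

/* Sparse data: a page of zeros. */
static void fill_sparse(uint8_t *page)
{
	size_t i;

	for (i = 0; i < PAGE_BYTES; i++)
		page[i] = 0;
}

/* Dense stand-in: consecutive bytes always differ (step 167 mod 256). */
static void fill_dense(uint8_t *page)
{
	size_t i;

	for (i = 0; i < PAGE_BYTES; i++)
		page[i] = (uint8_t)(i * 167u + 13u);
}
```

Under this encoding the zero page shrinks from 4096 to 34 bytes, while
the dense page grows to 8192 bytes: the same CPU work is spent in both
cases, but only the first reduces the I/O.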
Thanks.
HATAYAMA, Daisuke
* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
From: Petr Tesarik @ 2013-04-18 11:41 UTC
To: HATAYAMA Daisuke
Cc: Fenghua Yu, kexec, linux-kernel, kumagai-atsushi, ebiederm,
vgoyal
On Mon, 16 Apr 2012 11:21:28 +0900
HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
> Currently, booting up 2nd kernel with multiple CPUs fails in most
> cases since it enters 2nd kernel with AP if the crash happens on the
> AP. The problem is to signal startup IPI from AP to BSP. Typical
> result of the operation I saw is the machine hanging during the 2nd
> kernel boot.
>
> To solve this issue, always enter 2nd kernel with BSP. To do this, I
> modify logic for shooting down CPUs. I use simple existing logic only
> in this mechanism, not complicating crash path to machine_kexec().
These patches looked pretty good. I seem to recall that Fenghua (from
Intel) had an alternative solution for booting from AP. Unfortunately I
can't find his mails in my kexec mailbox...
Anyway, what's the latest upstream status?
Petr
* Re: [PATCH 0/2] kdump: Enter 2nd kernel with BSP for enabling multiple CPUs
From: HATAYAMA Daisuke @ 2013-04-19 8:45 UTC
To: Petr Tesarik
Cc: Fenghua Yu, kexec, linux-kernel, kumagai-atsushi, ebiederm,
vgoyal
(2013/04/18 20:41), Petr Tesarik wrote:
> On Mon, 16 Apr 2012 11:21:28 +0900
> HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> wrote:
>
>> Currently, booting up 2nd kernel with multiple CPUs fails in most
>> cases since it enters 2nd kernel with AP if the crash happens on the
>> AP. The problem is to signal startup IPI from AP to BSP. Typical
>> result of the operation I saw is the machine hanging during the 2nd
>> kernel boot.
>>
>> To solve this issue, always enter 2nd kernel with BSP. To do this, I
>> modify logic for shooting down CPUs. I use simple existing logic only
>> in this mechanism, not complicating crash path to machine_kexec().
>
> These patches looked pretty good. I seem to recall that Fenghua (from
> Intel) had an alternative solution for booting from AP. Unfortunately I
> can't find his mails in my kexec mailbox...
>
> Anyway, what's the latest upstream status?
It's still in an experimental state.
The patch itself was nacked by Eric, since switching via NMI the CPU
that enters the 2nd kernel reduces the reliability of kdump.
In the discussion of my 2nd patch set, which tried to reset the BSP
flag at boot on the 2nd kernel, Eric suggested that the BSP flag could
be changed at runtime, that behaviour when INIT is received would then
vary, and that we should first discuss how unsetting the BSP flag
affects the system.
I'm now going in this direction and the patch I posted a month ago is:
[PATCH] x86, apic: Add unset_bsp parameter to unset BSP flag at boot time
https://lkml.org/lkml/2013/3/18/107
According to Fenghua, some firmware assumes that the BSP flag is kept
for as long as the system is running. I have yet to see any difference
in behaviour when unsetting the BSP flag with the patch applied on my
machine. I think this is system dependent, and it might be better to
let each user decide whether to unset the BSP flag or not.
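
For reference, on x86 the BSP flag is bit 8 of the IA32_APIC_BASE MSR
(address 0x1b). The sketch below only manipulates a 64-bit value that
is assumed to have been read from that MSR; it performs no MSR access
and is not the code from the unset_bsp patch. The helper names are
hypothetical, and the macro names follow the kernel's msr-index.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_APICBASE	0x1bu		/* MSR address, not accessed here */
#define MSR_IA32_APICBASE_BSP	(1ull << 8)	/* set on the bootstrap processor */

/* True if the given IA32_APIC_BASE value has the BSP flag set. */
static bool apicbase_is_bsp(uint64_t apicbase)
{
	return (apicbase & MSR_IA32_APICBASE_BSP) != 0;
}

/* Return the value with the BSP flag cleared and all other bits kept. */
static uint64_t apicbase_clear_bsp(uint64_t apicbase)
{
	return apicbase & ~MSR_IA32_APICBASE_BSP;
}
```

For example, a value of 0xfee00900 (default APIC base, APIC enabled,
BSP flag set) becomes 0xfee00800 after clearing; whether writing such
a value back is safe is exactly the firmware-dependent question
discussed above.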
BTW, Fenghua's work on software CPU hotplug for the BSP is orthogonal
to my case. His work targets systems, including firmware, that are
affected if the BSP flag is unset, and it assumes a healthy system
where cpu#0 is always the BSP. Our case, on the other hand, is the
crash kernel: we can no longer assume that cpu#0 is the BSP, and we
can no longer use NMI to wake up other CPUs, since we cannot use
logic that depends on the state of CPUs sleeping in the 1st kernel.
--
Thanks.
HATAYAMA, Daisuke