* perf hw in kexeced kernel broken in tip
@ 2010-12-01 8:00 Yinghai Lu
2010-12-01 11:27 ` Peter Zijlstra
0 siblings, 1 reply; 42+ messages in thread
From: Yinghai Lu @ 2010-12-01 8:00 UTC (permalink / raw)
To: Ingo Molnar, Jason Wessel, Peter Zijlstra, Peter Zijlstra,
Don Zickus
Cc: linux-kernel@vger.kernel.org
First kernel:
[ 1.139418] calling init_hw_perf_events+0x0/0xb77 @ 1
[ 1.159111] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU
driver.
[ 1.159567] ... version: 3
[ 1.179121] ... bit width: 48
[ 1.179353] ... generic registers: 4
[ 1.179593] ... value mask: 0000ffffffffffff
[ 1.199211] ... max period: 000000007fffffff
[ 1.199554] ... fixed-purpose events: 3
[ 1.219108] ... event mask: 000000070000000f
[ 1.219454] initcall init_hw_perf_events+0x0/0xb77 returned 0 after
11719 usecs
.....
[ 20.220997] checking TSC synchronization [CPU#0 -> CPU#11]: passed.
[ 20.260818] NMI watchdog enabled, takes one hw-pmu counter.
kexeced kernel.
[ 1.169470] calling init_hw_perf_events+0x0/0xb77 @ 1
[ 1.189265] Performance Events: PEBS fmt1+, Nehalem events, Broken
PMU hardware detected, software events only.
...
[ 21.010407] NMI watchdog failed to create perf event on cpu14:
fffffffffffffffe
caused by:
commit 33c6d6a7ad0ffab9b1b15f8e4107a2af072a05a0
Author: Don Zickus <dzickus@redhat.com>
Date: Mon Nov 22 16:55:23 2010 -0500
x86, perf, nmi: Disable perf if counters are not accessible
In a kvm virt guests, the perf counters are not emulated. Instead they
return zero on a rdmsrl. The perf nmi handler uses the fact that
crossing
a zero means the counter overflowed (for those counters that do not have
specific interrupt bits). Therefore on kvm guests, perf will swallow all
NMIs thinking the counters overflowed.
This causes problems for subsystems like kgdb which needs NMIs to do its
magic. This problem was discovered by running kgdb tests.
The solution is to write garbage into a perf counter during the
initialization and hopefully reading back the same number. On kvm
guests, the value will be read back as zero and we disable perf as
a result.
Reported-by: Jason Wessel <jason.wessel@windriver.com>
Patch-inspired-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <1290462923-30734-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
diff --git a/arch/x86/kernel/cpu/perf_event.c
b/arch/x86/kernel/cpu/perf_event.c
index ed63101..6d75b91 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -381,6 +381,20 @@ static void release_pmc_hardware(void) {}
#endif
+static bool check_hw_exists(void)
+{
+ u64 val, val_new = 0;
+ int ret = 0;
+
+ val = 0xabcdUL;
+ ret |= checking_wrmsrl(x86_pmu.perfctr, val);
+ ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new);
+ if (ret || val != val_new)
+ return false;
+
+ return true;
+}
+
static void reserve_ds_buffers(void);
static void release_ds_buffers(void);
@@ -1372,6 +1386,12 @@ void __init init_hw_perf_events(void)
pmu_check_apic();
+ /* sanity check that the hardware exists or is emulated */
+ if (!check_hw_exists()) {
+ pr_cont("Broken PMU hardware detected, software events
only.\n");
+ return;
+ }
+
pr_cont("%s PMU driver.\n", x86_pmu.name);
if (x86_pmu.quirks)
^ permalink raw reply related [flat|nested] 42+ messages in thread* Re: perf hw in kexeced kernel broken in tip 2010-12-01 8:00 perf hw in kexeced kernel broken in tip Yinghai Lu @ 2010-12-01 11:27 ` Peter Zijlstra 2010-12-01 16:06 ` Vivek Goyal 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-01 11:27 UTC (permalink / raw) To: Yinghai Lu Cc: Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Vivek Goyal, Haren Myneni On Wed, 2010-12-01 at 00:00 -0800, Yinghai Lu wrote: > First kernel: > [ 1.139418] calling init_hw_perf_events+0x0/0xb77 @ 1 > [ 1.159111] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU > driver. > [ 1.159567] ... version: 3 > [ 1.179121] ... bit width: 48 > [ 1.179353] ... generic registers: 4 > [ 1.179593] ... value mask: 0000ffffffffffff > [ 1.199211] ... max period: 000000007fffffff > [ 1.199554] ... fixed-purpose events: 3 > [ 1.219108] ... event mask: 000000070000000f > [ 1.219454] initcall init_hw_perf_events+0x0/0xb77 returned 0 after > 11719 usecs > > ..... > [ 20.220997] checking TSC synchronization [CPU#0 -> CPU#11]: passed. > [ 20.260818] NMI watchdog enabled, takes one hw-pmu counter. > > kexeced kernel. > > > [ 1.169470] calling init_hw_perf_events+0x0/0xb77 @ 1 > [ 1.189265] Performance Events: PEBS fmt1+, Nehalem events, Broken > PMU hardware detected, software events only. > ... > [ 21.010407] NMI watchdog failed to create perf event on cpu14: > fffffffffffffffe > > caused by: *sigh*, and people ask me why kexec/kdump are such bad ideas.. apparently kexec doesn't properly shut down the first kernel and leaves a counter running, then when we write and read the counter value they don't match because its still running and voila, crap happens. I've CC'ed the kexec people, maybe they got clue as to how to sort this. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 11:27 ` Peter Zijlstra @ 2010-12-01 16:06 ` Vivek Goyal 2010-12-01 16:11 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-01 16:06 UTC (permalink / raw) To: Peter Zijlstra Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 01, 2010 at 12:27:47PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-01 at 00:00 -0800, Yinghai Lu wrote: > > First kernel: > > [ 1.139418] calling init_hw_perf_events+0x0/0xb77 @ 1 > > [ 1.159111] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU > > driver. > > [ 1.159567] ... version: 3 > > [ 1.179121] ... bit width: 48 > > [ 1.179353] ... generic registers: 4 > > [ 1.179593] ... value mask: 0000ffffffffffff > > [ 1.199211] ... max period: 000000007fffffff > > [ 1.199554] ... fixed-purpose events: 3 > > [ 1.219108] ... event mask: 000000070000000f > > [ 1.219454] initcall init_hw_perf_events+0x0/0xb77 returned 0 after > > 11719 usecs > > > > ..... > > [ 20.220997] checking TSC synchronization [CPU#0 -> CPU#11]: passed. > > [ 20.260818] NMI watchdog enabled, takes one hw-pmu counter. > > > > kexeced kernel. > > > > > > [ 1.169470] calling init_hw_perf_events+0x0/0xb77 @ 1 > > [ 1.189265] Performance Events: PEBS fmt1+, Nehalem events, Broken > > PMU hardware detected, software events only. > > ... > > [ 21.010407] NMI watchdog failed to create perf event on cpu14: > > fffffffffffffffe > > > > caused by: > > *sigh*, and people ask me why kexec/kdump are such bad ideas.. > > apparently kexec doesn't properly shut down the first kernel and leaves > a counter running, then when we write and read the counter value they > don't match because its still running and voila, crap happens. > > I've CC'ed the kexec people, maybe they got clue as to how to sort this. So we can shutdown counters while first kernel is going down. Is there a simple function already which I can call? Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 16:06 ` Vivek Goyal @ 2010-12-01 16:11 ` Peter Zijlstra 2010-12-01 16:23 ` Vivek Goyal 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-01 16:11 UTC (permalink / raw) To: Vivek Goyal Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-01 at 11:06 -0500, Vivek Goyal wrote: > On Wed, Dec 01, 2010 at 12:27:47PM +0100, Peter Zijlstra wrote: > > On Wed, 2010-12-01 at 00:00 -0800, Yinghai Lu wrote: > > > First kernel: > > > [ 1.139418] calling init_hw_perf_events+0x0/0xb77 @ 1 > > > [ 1.159111] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU > > > driver. > > > [ 1.159567] ... version: 3 > > > [ 1.179121] ... bit width: 48 > > > [ 1.179353] ... generic registers: 4 > > > [ 1.179593] ... value mask: 0000ffffffffffff > > > [ 1.199211] ... max period: 000000007fffffff > > > [ 1.199554] ... fixed-purpose events: 3 > > > [ 1.219108] ... event mask: 000000070000000f > > > [ 1.219454] initcall init_hw_perf_events+0x0/0xb77 returned 0 after > > > 11719 usecs > > > > > > ..... > > > [ 20.220997] checking TSC synchronization [CPU#0 -> CPU#11]: passed. > > > [ 20.260818] NMI watchdog enabled, takes one hw-pmu counter. > > > > > > kexeced kernel. > > > > > > > > > [ 1.169470] calling init_hw_perf_events+0x0/0xb77 @ 1 > > > [ 1.189265] Performance Events: PEBS fmt1+, Nehalem events, Broken > > > PMU hardware detected, software events only. > > > ... > > > [ 21.010407] NMI watchdog failed to create perf event on cpu14: > > > fffffffffffffffe > > > > > > caused by: > > > > *sigh*, and people ask me why kexec/kdump are such bad ideas.. > > > > apparently kexec doesn't properly shut down the first kernel and leaves > > a counter running, then when we write and read the counter value they > > don't match because its still running and voila, crap happens. > > > > I've CC'ed the kexec people, maybe they got clue as to how to sort this. > > So we can shutdown counters while first kernel is going down. Is there a > simple function already which I can call? Dunno, the cpu hotplug stuff should suffice I think, but then I don't think you actually unplug the boot cpu. What does kexec normally do to ensure hardware is left in a sane state? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 16:11 ` Peter Zijlstra @ 2010-12-01 16:23 ` Vivek Goyal 2010-12-01 19:38 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-01 16:23 UTC (permalink / raw) To: Peter Zijlstra Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 01, 2010 at 05:11:46PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-01 at 11:06 -0500, Vivek Goyal wrote: > > On Wed, Dec 01, 2010 at 12:27:47PM +0100, Peter Zijlstra wrote: > > > On Wed, 2010-12-01 at 00:00 -0800, Yinghai Lu wrote: > > > > First kernel: > > > > [ 1.139418] calling init_hw_perf_events+0x0/0xb77 @ 1 > > > > [ 1.159111] Performance Events: PEBS fmt1+, Nehalem events, Intel PMU > > > > driver. > > > > [ 1.159567] ... version: 3 > > > > [ 1.179121] ... bit width: 48 > > > > [ 1.179353] ... generic registers: 4 > > > > [ 1.179593] ... value mask: 0000ffffffffffff > > > > [ 1.199211] ... max period: 000000007fffffff > > > > [ 1.199554] ... fixed-purpose events: 3 > > > > [ 1.219108] ... event mask: 000000070000000f > > > > [ 1.219454] initcall init_hw_perf_events+0x0/0xb77 returned 0 after > > > > 11719 usecs > > > > > > > > ..... > > > > [ 20.220997] checking TSC synchronization [CPU#0 -> CPU#11]: passed. > > > > [ 20.260818] NMI watchdog enabled, takes one hw-pmu counter. > > > > > > > > kexeced kernel. > > > > > > > > > > > > [ 1.169470] calling init_hw_perf_events+0x0/0xb77 @ 1 > > > > [ 1.189265] Performance Events: PEBS fmt1+, Nehalem events, Broken > > > > PMU hardware detected, software events only. > > > > ... > > > > [ 21.010407] NMI watchdog failed to create perf event on cpu14: > > > > fffffffffffffffe > > > > > > > > caused by: > > > > > > *sigh*, and people ask me why kexec/kdump are such bad ideas.. > > > > > > apparently kexec doesn't properly shut down the first kernel and leaves > > > a counter running, then when we write and read the counter value they > > > don't match because its still running and voila, crap happens. > > > > > > I've CC'ed the kexec people, maybe they got clue as to how to sort this. > > > > So we can shutdown counters while first kernel is going down. Is there a > > simple function already which I can call? > > Dunno, the cpu hotplug stuff should suffice I think, but then I don't > think you actually unplug the boot cpu. > > What does kexec normally do to ensure hardware is left in a sane state? Typically calls device_shutdown() and sysdev_shutdown() from kernel_restart_prepare() to shutdown the devices. Also calls machine_shutdown() which depending on architecture can take care of various things like stopping other cpus, shutting down LAPIC, disabling IOAPIC, disabling hpet, shutting down IOMMU etc (native_machine_shutdown()). Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 16:23 ` Vivek Goyal @ 2010-12-01 19:38 ` Peter Zijlstra 2010-12-01 19:46 ` Vivek Goyal 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-01 19:38 UTC (permalink / raw) To: Vivek Goyal Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: > > What does kexec normally do to ensure hardware is left in a sane state? > > Typically calls device_shutdown() and sysdev_shutdown() from > kernel_restart_prepare() to shutdown the devices. > > Also calls machine_shutdown() which depending on architecture can take > care of various things like stopping other cpus, shutting down LAPIC, > disabling IOAPIC, disabling hpet, shutting down IOMMU etc > (native_machine_shutdown()). So basically there's no sane generic reset callout? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 19:38 ` Peter Zijlstra @ 2010-12-01 19:46 ` Vivek Goyal 2010-12-01 19:49 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-01 19:46 UTC (permalink / raw) To: Peter Zijlstra Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni, Eric W. Biederman On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: > > > What does kexec normally do to ensure hardware is left in a sane state? > > > > Typically calls device_shutdown() and sysdev_shutdown() from > > kernel_restart_prepare() to shutdown the devices. > > > > Also calls machine_shutdown() which depending on architecture can take > > care of various things like stopping other cpus, shutting down LAPIC, > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc > > (native_machine_shutdown()). > > So basically there's no sane generic reset callout? I think ->shutdown() calls are sane generic callouts. Isn't it? There seem to be few exceptions for LAPIC, IOMMU and HPET and I am not sure why they are not covered by shutdown calls. CCing Eric, he might have more insight into it. Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 19:46 ` Vivek Goyal @ 2010-12-01 19:49 ` Peter Zijlstra 2010-12-01 19:58 ` Vivek Goyal 2010-12-01 20:41 ` Eric W. Biederman 0 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-01 19:49 UTC (permalink / raw) To: Vivek Goyal Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni, Eric W. Biederman On Wed, 2010-12-01 at 14:46 -0500, Vivek Goyal wrote: > On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: > > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: > > > > What does kexec normally do to ensure hardware is left in a sane state? > > > > > > Typically calls device_shutdown() and sysdev_shutdown() from > > > kernel_restart_prepare() to shutdown the devices. > > > > > > Also calls machine_shutdown() which depending on architecture can take > > > care of various things like stopping other cpus, shutting down LAPIC, > > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc > > > (native_machine_shutdown()). > > > > So basically there's no sane generic reset callout? > > I think ->shutdown() calls are sane generic callouts. Isn't it? ->shutdown looks like it's about to reset/halt the hardware, no point in slowing down the regular shutdown/reboot path for something like this, we know the hardware will get reset to a sane state. > There seem to be few exceptions for LAPIC, IOMMU and HPET and I am not > sure why they are not covered by shutdown calls. CCing Eric, he might > have more insight into it. That's all arch specific, but even there I don't think the reset code should live outside of kexec. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 19:49 ` Peter Zijlstra @ 2010-12-01 19:58 ` Vivek Goyal 2010-12-01 20:07 ` Peter Zijlstra 2010-12-01 20:41 ` Eric W. Biederman 1 sibling, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-01 19:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni, Eric W. Biederman On Wed, Dec 01, 2010 at 08:49:49PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-01 at 14:46 -0500, Vivek Goyal wrote: > > On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: > > > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: > > > > > What does kexec normally do to ensure hardware is left in a sane state? > > > > > > > > Typically calls device_shutdown() and sysdev_shutdown() from > > > > kernel_restart_prepare() to shutdown the devices. > > > > > > > > Also calls machine_shutdown() which depending on architecture can take > > > > care of various things like stopping other cpus, shutting down LAPIC, > > > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc > > > > (native_machine_shutdown()). > > > > > > So basically there's no sane generic reset callout? > > > > I think ->shutdown() calls are sane generic callouts. Isn't it? > > ->shutdown looks like it's about to reset/halt the hardware, no point in > slowing down the regular shutdown/reboot path for something like this, > we know the hardware will get reset to a sane state. I think we already call ->shutdown() in regular reboot path. kernel_restart() kernel_restart_prepare() device_shutdown(); sysdev_shutdown(); So it should not make lot of difference if perf subsystem/counters are also shutdown using ->shutdown(). > > > There seem to be few exceptions for LAPIC, IOMMU and HPET and I am not > > sure why they are not covered by shutdown calls. CCing Eric, he might > > have more insight into it. > > That's all arch specific, but even there I don't think the reset code > should live outside of kexec. I would not know the history but I have heard stories that if you don't shutdown the hardware over restart, BIOS might not be expecting it and might get trumped. Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 19:58 ` Vivek Goyal @ 2010-12-01 20:07 ` Peter Zijlstra 2010-12-01 21:48 ` Eric W. Biederman 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-01 20:07 UTC (permalink / raw) To: Vivek Goyal Cc: Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni, Eric W. Biederman On Wed, 2010-12-01 at 14:58 -0500, Vivek Goyal wrote: > On Wed, Dec 01, 2010 at 08:49:49PM +0100, Peter Zijlstra wrote: > > On Wed, 2010-12-01 at 14:46 -0500, Vivek Goyal wrote: > > > On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: > > > > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: > > > > > > What does kexec normally do to ensure hardware is left in a sane state? > > > > > > > > > > Typically calls device_shutdown() and sysdev_shutdown() from > > > > > kernel_restart_prepare() to shutdown the devices. > > > > > > > > > > Also calls machine_shutdown() which depending on architecture can take > > > > > care of various things like stopping other cpus, shutting down LAPIC, > > > > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc > > > > > (native_machine_shutdown()). > > > > > > > > So basically there's no sane generic reset callout? > > > > > > I think ->shutdown() calls are sane generic callouts. Isn't it? > > > > ->shutdown looks like it's about to reset/halt the hardware, no point in > > slowing down the regular shutdown/reboot path for something like this, > > we know the hardware will get reset to a sane state. > > I think we already call ->shutdown() in regular reboot path. > > kernel_restart() > kernel_restart_prepare() > device_shutdown(); > sysdev_shutdown(); > > So it should not make lot of difference if perf subsystem/counters are > also shutdown using ->shutdown(). Oh, but I'm not a device or sysdev thing, I'll never get something like that. > > > > > There seem to be few exceptions for LAPIC, IOMMU and HPET and I am not > > > sure why they are not covered by shutdown calls. CCing Eric, he might > > > have more insight into it. > > > > That's all arch specific, but even there I don't think the reset code > > should live outside of kexec. > > I would not know the history but I have heard stories that if you don't > shutdown the hardware over restart, BIOS might not be expecting it and > might get trumped. Never yet had a problem with that. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 20:07 ` Peter Zijlstra @ 2010-12-01 21:48 ` Eric W. Biederman 2010-12-02 5:23 ` Don Zickus 0 siblings, 1 reply; 42+ messages in thread From: Eric W. Biederman @ 2010-12-01 21:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni Peter Zijlstra <peterz@infradead.org> writes: > On Wed, 2010-12-01 at 14:58 -0500, Vivek Goyal wrote: >> On Wed, Dec 01, 2010 at 08:49:49PM +0100, Peter Zijlstra wrote: >> > On Wed, 2010-12-01 at 14:46 -0500, Vivek Goyal wrote: >> > > On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: >> > > > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: >> > > > > > What does kexec normally do to ensure hardware is left in a sane state? >> > > > > >> > > > > Typically calls device_shutdown() and sysdev_shutdown() from >> > > > > kernel_restart_prepare() to shutdown the devices. >> > > > > >> > > > > Also calls machine_shutdown() which depending on architecture can take >> > > > > care of various things like stopping other cpus, shutting down LAPIC, >> > > > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc >> > > > > (native_machine_shutdown()). >> > > > >> > > > So basically there's no sane generic reset callout? >> > > >> > > I think ->shutdown() calls are sane generic callouts. Isn't it? >> > >> > ->shutdown looks like it's about to reset/halt the hardware, no point in >> > slowing down the regular shutdown/reboot path for something like this, >> > we know the hardware will get reset to a sane state. >> >> I think we already call ->shutdown() in regular reboot path. >> >> kernel_restart() >> kernel_restart_prepare() >> device_shutdown(); >> sysdev_shutdown(); >> >> So it should not make lot of difference if perf subsystem/counters are >> also shutdown using ->shutdown(). > > Oh, but I'm not a device or sysdev thing, I'll never get something like > that. There is also the reboot notifier, if the NMI needs to be controlled outside of device model. Sigh. The NMI handling is such a special case. >> > > There seem to be few exceptions for LAPIC, IOMMU and HPET and I am not >> > > sure why they are not covered by shutdown calls. CCing Eric, he might >> > > have more insight into it. >> > >> > That's all arch specific, but even there I don't think the reset code >> > should live outside of kexec. >> >> I would not know the history but I have heard stories that if you don't >> shutdown the hardware over restart, BIOS might not be expecting it and >> might get trumped. > > Never yet had a problem with that. I haven't personally but I have certainly heard stories and seen debugging sessions where some devices work or don't depending on the order of running linux and windows on a machine, with soft reboots in between. Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 21:48 ` Eric W. Biederman @ 2010-12-02 5:23 ` Don Zickus 2010-12-02 7:34 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-02 5:23 UTC (permalink / raw) To: Eric W. Biederman Cc: Peter Zijlstra, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 01, 2010 at 01:48:07PM -0800, Eric W. Biederman wrote: > > > > Oh, but I'm not a device or sysdev thing, I'll never get something like > > that. > > There is also the reboot notifier, if the NMI needs to be controlled > outside of device model. Sigh. The NMI handling is such a special case. I tried reboot notifiers with the nmi_watchdog and acheived some success (on a Westmere box, a P4 still failed). Kdump is still screwed, but maybe we don't care for now. Here is the quick and dirty patch I used. Cheers, Don diff --git a/kernel/watchdog.c b/kernel/watchdog.c index 792a4ed..3455cf9 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -23,6 +23,7 @@ #include <linux/notifier.h> #include <linux/module.h> #include <linux/sysctl.h> +#include <linux/reboot.h> #include <asm/irq_regs.h> #include <linux/perf_event.h> @@ -550,6 +551,18 @@ static struct notifier_block __cpuinitdata cpu_nfb = { .notifier_call = cpu_callback }; +static int __cpuinit +reboot_callback(struct notifier_block *nfb, unsigned long action, void *unused) +{ + watchdog_disable_all_cpus(); + + return notifier_from_errno(0); +} + +static struct notifier_block __cpuinitdata reboot_nfb = { + .notifier_call = reboot_callback +}; + void __init lockup_detector_init(void) { void *cpu = (void *)(long)smp_processor_id(); @@ -563,6 +576,7 @@ void __init lockup_detector_init(void) cpu_callback(&cpu_nfb, CPU_ONLINE, cpu); register_cpu_notifier(&cpu_nfb); + register_reboot_notifier(&reboot_nfb); return; } ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-02 5:23 ` Don Zickus @ 2010-12-02 7:34 ` Peter Zijlstra 2010-12-02 16:15 ` Don Zickus 2010-12-07 21:16 ` Don Zickus 0 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-02 7:34 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Thu, 2010-12-02 at 00:23 -0500, Don Zickus wrote: > On Wed, Dec 01, 2010 at 01:48:07PM -0800, Eric W. Biederman wrote: > > > > > > Oh, but I'm not a device or sysdev thing, I'll never get something like > > > that. > > > > There is also the reboot notifier, if the NMI needs to be controlled > > outside of device model. Sigh. The NMI handling is such a special case. > > I tried reboot notifiers with the nmi_watchdog and acheived some success > (on a Westmere box, a P4 still failed). Kdump is still screwed, but maybe > we don't care for now. > > Here is the quick and dirty patch I used. > diff --git a/kernel/watchdog.c b/kernel/watchdog.c > index 792a4ed..3455cf9 100644 > --- a/kernel/watchdog.c > +++ b/kernel/watchdog.c > @@ -23,6 +23,7 @@ > #include <linux/notifier.h> > #include <linux/module.h> > #include <linux/sysctl.h> > +#include <linux/reboot.h> > > #include <asm/irq_regs.h> > #include <linux/perf_event.h> > @@ -550,6 +551,18 @@ static struct notifier_block __cpuinitdata cpu_nfb = { > .notifier_call = cpu_callback > }; > > +static int __cpuinit > +reboot_callback(struct notifier_block *nfb, unsigned long action, void *unused) > +{ > + watchdog_disable_all_cpus(); > + > + return notifier_from_errno(0); > +} > + > +static struct notifier_block __cpuinitdata reboot_nfb = { > + .notifier_call = reboot_callback > +}; > + > void __init lockup_detector_init(void) > { > void *cpu = (void *)(long)smp_processor_id(); > @@ -563,6 +576,7 @@ void __init lockup_detector_init(void) > > cpu_callback(&cpu_nfb, CPU_ONLINE, cpu); > register_cpu_notifier(&cpu_nfb); > + register_reboot_notifier(&reboot_nfb); > > return; > } We'd really want a perf_event.c callback there to do as the hot-unplug code does and detach all running counters from the cpu. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-02 7:34 ` Peter Zijlstra @ 2010-12-02 16:15 ` Don Zickus 2010-12-07 23:30 ` Peter Zijlstra 2010-12-07 21:16 ` Don Zickus 1 sibling, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-02 16:15 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Thu, Dec 02, 2010 at 08:34:30AM +0100, Peter Zijlstra wrote: > On Thu, 2010-12-02 at 00:23 -0500, Don Zickus wrote: > > On Wed, Dec 01, 2010 at 01:48:07PM -0800, Eric W. Biederman wrote: > > > > > > > > Oh, but I'm not a device or sysdev thing, I'll never get something like > > > > that. > > > > > > There is also the reboot notifier, if the NMI needs to be controlled > > > outside of device model. Sigh. The NMI handling is such a special case. > > > > I tried reboot notifiers with the nmi_watchdog and acheived some success > > (on a Westmere box, a P4 still failed). Kdump is still screwed, but maybe > > we don't care for now. > > > > Here is the quick and dirty patch I used. > > > We'd really want a perf_event.c callback there to do as the hot-unplug > code does and detach all running counters from the cpu. Ok, I moved the reboot notifier stuff from kernel/watchdog.c to kernel/perf_event.c. Things still worked fine from a kexec perspective. Vivek suggested to me this morning that I should just blantantly disable the perf counter during init when running my test. Looking through the code I don't think I can do this using disable_all because some routines look for the active bit to be set and some arches have different disable registers than others. Thoughts? Cheers, Don ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-02 16:15 ` Don Zickus @ 2010-12-07 23:30 ` Peter Zijlstra 2010-12-08 14:01 ` Don Zickus 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-07 23:30 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Thu, 2010-12-02 at 11:15 -0500, Don Zickus wrote: > Vivek suggested to me this morning that I should just blantantly disable the > perf counter during init when running my test. Nah, we should actively scan for that during the bring-up and kill hw-perf when we find an enable bit set, some BIOSes actively use the PMU, this is something that should be discouraged. --- arch/x86/kernel/cpu/perf_event.c | 30 +++++++++++++++++++++++++++--- 1 files changed, 27 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 817d2b1..7f92833 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -375,15 +375,40 @@ static void release_pmc_hardware(void) {} static bool check_hw_exists(void) { u64 val, val_new = 0; - int ret = 0; + int i, reg, ret = 0; val = 0xabcdUL; ret |= checking_wrmsrl(x86_pmu.perfctr, val); ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); - if (ret || val != val_new) + if (ret || val != val_new) { + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); return false; + } + + /* + * Check to see if the BIOS enabled any of the counters, if so + * complain and bail. + */ + for (i = 0; i < x86_pmu.num_counters; i++) { + reg = x86_pmu.eventsel + i; + rdmsrl(reg, val); + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) + goto bios_fail; + } + + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; + rdmsrl(reg, val); + if (val & (0x03 << i*4)) + goto bios_fail; + } return true; + +bios_fail: + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); + printk(KERN_ERR FW_BUG "invalid MSR %x=%Lx\n", reg, val); + return false; } static void reserve_ds_buffers(void); @@ -1379,7 +1404,6 @@ int __init init_hw_perf_events(void) /* sanity check that the hardware exists or is emulated */ if (!check_hw_exists()) { - pr_cont("Broken PMU hardware detected, software events only.\n"); return 0; } > Looking through the code I > don't think I can do this using disable_all because some routines look for > the active bit to be set and some arches have different disable registers > than others. Thoughts? Something like the below, preferably I'd key that off of SYS_KEXEC, but looking through the existing notifiers adding a state requires touching all of them :/ --- kernel/perf_event.c | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/kernel/perf_event.c b/kernel/perf_event.c index 195393c..7abcd8d 100644 --- a/kernel/perf_event.c +++ b/kernel/perf_event.c @@ -21,6 +21,7 @@ #include <linux/dcache.h> #include <linux/percpu.h> #include <linux/ptrace.h> +#include <linux/reboot.h> #include <linux/vmstat.h> #include <linux/vmalloc.h> #include <linux/hardirq.h> @@ -6383,6 +6384,25 @@ static void perf_event_exit_cpu(int cpu) static inline void perf_event_exit_cpu(int cpu) { } #endif +static int +perf_reboot(struct notifier_block *notifier, unsigned long val, void *v) +{ + /* + * XXX this relies on hotplug, does kexec do too? + */ + perf_event_exit_cpu(0); + return NOTIFY_OK; +} + +/* + * Run the perf reboot notifier at the very last possible moment so that + * the generic watchdog code runs as long as possible. + */ +static struct notifier_block perf_reboot_notifier = { + .notifier_call = perf_reboot, + .priority = INT_MIN, +}; + static int __cpuinit perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { @@ -6418,6 +6438,7 @@ void __init perf_event_init(void) perf_pmu_register(&perf_task_clock); perf_tp_register(); perf_cpu_notifier(perf_cpu_notify); + register_reboot_notifier(&perf_reboot_notifier); ret = init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-07 23:30 ` Peter Zijlstra @ 2010-12-08 14:01 ` Don Zickus 2010-12-08 14:20 ` Peter Zijlstra ` (2 more replies) 0 siblings, 3 replies; 42+ messages in thread From: Don Zickus @ 2010-12-08 14:01 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 12:30:20AM +0100, Peter Zijlstra wrote: > On Thu, 2010-12-02 at 11:15 -0500, Don Zickus wrote: > > > Vivek suggested to me this morning that I should just blantantly disable the > > perf counter during init when running my test. > > Nah, we should actively scan for that during the bring-up and kill > hw-perf when we find an enable bit set, some BIOSes actively use the > PMU, this is something that should be discouraged. Ok, the reboot notifier addresses the kexec problem but doesn't fix it though (I have to test to confirm that, comments below). The bios check should catch those situations (ironically I stumbled upon a machine with this problem, so I will test your patch with it, though it only uses perf counter 0). The kdump problem will still exist, not sure if we care and perhaps we should document in the changelog that we know kdump is still broken (unless we do care). > > --- > arch/x86/kernel/cpu/perf_event.c | 30 +++++++++++++++++++++++++++--- > 1 files changed, 27 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > index 817d2b1..7f92833 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -375,15 +375,40 @@ static void release_pmc_hardware(void) {} > static bool check_hw_exists(void) > { > u64 val, val_new = 0; > - int ret = 0; > + int i, reg, ret = 0; > > val = 0xabcdUL; > ret |= checking_wrmsrl(x86_pmu.perfctr, val); > ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > - if (ret || val != val_new) > + if (ret || val != val_new) { > + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); > return false; > + } > + > + /* > + * Check to see if the BIOS enabled any of the counters, if so > + * complain and bail. > + */ > + for (i = 0; i < x86_pmu.num_counters; i++) { > + reg = x86_pmu.eventsel + i; > + rdmsrl(reg, val); > + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) > + goto bios_fail; > + } > + > + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { > + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; > + rdmsrl(reg, val); > + if (val & (0x03 << i*4)) > + goto bios_fail; > + } I wonder if you should reverse these checks. If the bios has the perf counter enabled, there might be a high chance that it fails the first check and never gets to the actually bios checks. > > return true; > + > +bios_fail: > + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); > + printk(KERN_ERR FW_BUG "invalid MSR %x=%Lx\n", reg, val); > + return false; > } > > static void reserve_ds_buffers(void); > @@ -1379,7 +1404,6 @@ int __init init_hw_perf_events(void) > > /* sanity check that the hardware exists or is emulated */ > if (!check_hw_exists()) { > - pr_cont("Broken PMU hardware detected, software events only.\n"); > return 0; > } nitpick - you can probably remove the curly braces, no? > > > > > > Looking through the code I > > don't think I can do this using disable_all because some routines look for > > the active bit to be set and some arches have different disable registers > > than others. Thoughts? > > Something like the below, preferably I'd key that off of SYS_KEXEC, but > looking through the existing notifiers adding a state requires touching > all of them :/ > > --- > kernel/perf_event.c | 21 +++++++++++++++++++++ > 1 files changed, 21 insertions(+), 0 deletions(-) > > diff --git a/kernel/perf_event.c b/kernel/perf_event.c > index 195393c..7abcd8d 100644 > --- a/kernel/perf_event.c > +++ b/kernel/perf_event.c > @@ -21,6 +21,7 @@ > #include <linux/dcache.h> > #include <linux/percpu.h> > #include <linux/ptrace.h> > +#include <linux/reboot.h> > #include <linux/vmstat.h> > #include <linux/vmalloc.h> > #include <linux/hardirq.h> > @@ -6383,6 +6384,25 @@ static void perf_event_exit_cpu(int cpu) > static inline void perf_event_exit_cpu(int cpu) { } > #endif > > +static int > +perf_reboot(struct notifier_block *notifier, unsigned long val, void *v) > +{ > + /* > + * XXX this relies on hotplug, does kexec do too? > + */ > + perf_event_exit_cpu(0); > + return NOTIFY_OK; Ok, so this shuts down the perf counters on cpu0, but the other cpus are still running and will fail your new bios check, no? Privately, I used the above wrapped with for_each_online_cpu(cpu) and it worked fine for me. Cheers, Don ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:01 ` Don Zickus @ 2010-12-08 14:20 ` Peter Zijlstra 2010-12-08 14:42 ` Vivek Goyal 2010-12-08 14:59 ` Peter Zijlstra 2010-12-08 14:33 ` Peter Zijlstra 2010-12-08 14:39 ` Vivek Goyal 2 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 14:20 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 09:01 -0500, Don Zickus wrote: > On Wed, Dec 08, 2010 at 12:30:20AM +0100, Peter Zijlstra wrote: > > On Thu, 2010-12-02 at 11:15 -0500, Don Zickus wrote: > > > > > Vivek suggested to me this morning that I should just blantantly disable the > > > perf counter during init when running my test. > > > > Nah, we should actively scan for that during the bring-up and kill > > hw-perf when we find an enable bit set, some BIOSes actively use the > > PMU, this is something that should be discouraged. > > Ok, the reboot notifier addresses the kexec problem but doesn't fix it > though (I have to test to confirm that, comments below). > The bios check > should catch those situations (ironically I stumbled upon a machine with > this problem, so I will test your patch with it, though it only uses perf > counter 0). Right, they usually only steal one or two counters, but the fact that they're using them at all is insane and should be punished. > The kdump problem will still exist, not sure if we care and > perhaps we should document in the changelog that we know kdump is still > broken (unless we do care). You mean even if we cure the kexec reboot notifier patch thing kdump is still borken? > > --- > > arch/x86/kernel/cpu/perf_event.c | 30 +++++++++++++++++++++++++++--- > > 1 files changed, 27 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > > index 817d2b1..7f92833 100644 > > --- a/arch/x86/kernel/cpu/perf_event.c > > +++ b/arch/x86/kernel/cpu/perf_event.c > > @@ -375,15 +375,40 @@ static void release_pmc_hardware(void) {} > > static bool check_hw_exists(void) > > { > > u64 val, val_new = 0; > > - int ret = 0; > > + int i, reg, ret = 0; > > > > val = 0xabcdUL; > > ret |= checking_wrmsrl(x86_pmu.perfctr, val); > > ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > > - if (ret || val != val_new) > > + if (ret || val != val_new) { > > + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); > > return false; > > + } > > + > > + /* > > + * Check to see if the BIOS enabled any of the counters, if so > > + * complain and bail. > > + */ > > + for (i = 0; i < x86_pmu.num_counters; i++) { > > + reg = x86_pmu.eventsel + i; > > + rdmsrl(reg, val); > > + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) > > + goto bios_fail; > > + } > > + > > + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { > > + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; > > + rdmsrl(reg, val); > > + if (val & (0x03 << i*4)) > > + goto bios_fail; > > + } > > I wonder if you should reverse these checks. If the bios has the perf > counter enabled, there might be a high chance that it fails the first > check and never gets to the actually bios checks. Ah, good point. > > > > return true; > > + > > +bios_fail: > > + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); > > + printk(KERN_ERR FW_BUG "invalid MSR %x=%Lx\n", reg, val); > > + return false; > > } > > > > static void reserve_ds_buffers(void); > > @@ -1379,7 +1404,6 @@ int __init init_hw_perf_events(void) > > > > /* sanity check that the hardware exists or is emulated */ > > if (!check_hw_exists()) { > > - pr_cont("Broken PMU hardware detected, software events only.\n"); > > return 0; > > } > > nitpick - you can probably remove the curly braces, no? Quite so. > > @@ -6383,6 +6384,25 @@ static void perf_event_exit_cpu(int cpu) > > static inline void perf_event_exit_cpu(int cpu) { } > > #endif > > > > +static int > > +perf_reboot(struct notifier_block *notifier, unsigned long val, void *v) > > +{ > > + /* > > + * XXX this relies on hotplug, does kexec do too? > > + */ > > + perf_event_exit_cpu(0); > > + return NOTIFY_OK; > > Ok, so this shuts down the perf counters on cpu0, but the other cpus are > still running and will fail your new bios check, no? > > Privately, I used the above wrapped with for_each_online_cpu(cpu) and it > worked fine for me. Oh, so reboot doesn't actually stop the non-boot cpus? I was unsure of that (see my XXX there), so yeah, if it doesn't then I guess the for_each_possible_cpu() thing is the way out. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:20 ` Peter Zijlstra @ 2010-12-08 14:42 ` Vivek Goyal 2010-12-08 14:48 ` Peter Zijlstra 2010-12-08 14:59 ` Peter Zijlstra 1 sibling, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-08 14:42 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 03:20:05PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 09:01 -0500, Don Zickus wrote: > > On Wed, Dec 08, 2010 at 12:30:20AM +0100, Peter Zijlstra wrote: > > > On Thu, 2010-12-02 at 11:15 -0500, Don Zickus wrote: > > > > > > > Vivek suggested to me this morning that I should just blantantly disable the > > > > perf counter during init when running my test. > > > > > > Nah, we should actively scan for that during the bring-up and kill > > > hw-perf when we find an enable bit set, some BIOSes actively use the > > > PMU, this is something that should be discouraged. > > > > Ok, the reboot notifier addresses the kexec problem but doesn't fix it > > though (I have to test to confirm that, comments below). > > > > The bios check > > should catch those situations (ironically I stumbled upon a machine with > > this problem, so I will test your patch with it, though it only uses perf > > counter 0). > > Right, they usually only steal one or two counters, but the fact that > they're using them at all is insane and should be punished. > > > The kdump problem will still exist, not sure if we care and > > perhaps we should document in the changelog that we know kdump is still > > broken (unless we do care). > > You mean even if we cure the kexec reboot notifier patch thing kdump is > still borken? > Yes. reboot notifier notifications are not sent in kdump path. In this path we know kernel has crashed and we just try to do bare minimal things to boot into second kernel. If some hardware is left in inconsistent state we try to recover from that situation by resetting the device when second kernel is booting. Either driver itself can detect that device is in inconsistent state and reset it otherwise we also pass a command line parameter "reset_devices" to second kernel to explicitly tell kernel that devices might be in bad state, reset these during initialization. If we want to use these perf counters in kdump kernel, we shall have to do something similar. Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:42 ` Vivek Goyal @ 2010-12-08 14:48 ` Peter Zijlstra 2010-12-08 15:02 ` Vivek Goyal 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 14:48 UTC (permalink / raw) To: Vivek Goyal Cc: Don Zickus, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 09:42 -0500, Vivek Goyal wrote: > > Either driver itself can detect that device is in inconsistent state and > reset it otherwise we also pass a command line parameter "reset_devices" to > second kernel to explicitly tell kernel that devices might be in bad state, > reset these during initialization. If we want to use these perf counters in > kdump kernel, we shall have to do something similar. Right, so I'm perfectly fine with leaving the kdump kernel broken for now and if people really do need hardware events we can try and reset the hardware when we find that reset_devices command line parameter. Not sure how that interacts with these broken BIOSes, but its kdump so its mostly broken by design anyway ;-) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:48 ` Peter Zijlstra @ 2010-12-08 15:02 ` Vivek Goyal 2010-12-08 15:15 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Vivek Goyal @ 2010-12-08 15:02 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 03:48:04PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 09:42 -0500, Vivek Goyal wrote: > > > > Either driver itself can detect that device is in inconsistent state and > > reset it otherwise we also pass a command line parameter "reset_devices" to > > second kernel to explicitly tell kernel that devices might be in bad state, > > reset these during initialization. If we want to use these perf counters in > > kdump kernel, we shall have to do something similar. > > Right, so I'm perfectly fine with leaving the kdump kernel broken for > now and if people really do need hardware events we can try and reset > the hardware when we find that reset_devices command line parameter. > > Not sure how that interacts with these broken BIOSes, reset_devices was meant to be dual purpose so that it can handle broken BIOSes also. So if BIOS is broken then one can pass "reset_devices" to kernel and driver can attempt to reset the device and boot the kernel. >but its kdump so > its mostly broken by design anyway ;-) Kdump has its share of problems especially with the fact that kernel/drivers find devices in bad state and are not hardened enough to deal with that. But on bare metal what's the better way of capturing kernel crash dump? Trying to do anything post crash in the kernel is also not very reliable either. I think the way we fix kernel for boot problems on newer hardware, for broken BIOses, we need to keep on fixing it in kdump path also to make sure new devices/drivers can cope up with this scenario. Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 15:02 ` Vivek Goyal @ 2010-12-08 15:15 ` Peter Zijlstra 2010-12-08 15:22 ` Vivek Goyal 2010-12-08 21:16 ` Eric W. Biederman 0 siblings, 2 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 15:15 UTC (permalink / raw) To: Vivek Goyal Cc: Don Zickus, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 10:02 -0500, Vivek Goyal wrote: > >but its kdump so its mostly broken by design anyway ;-) > > Kdump has its share of problems especially with the fact that > kernel/drivers find devices in bad state and are not hardened enough > to deal with that. But on bare metal what's the better way of capturing > kernel crash dump? Trying to do anything post crash in the kernel is > also not very reliable either. /me <3 RS-232 I haven't found anything better than that... And poking at the RS-232 requires less of the kernel to be functional than booting into a new kernel (whose image might have been corrupted by the dying kernel, etc..) ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 15:15 ` Peter Zijlstra @ 2010-12-08 15:22 ` Vivek Goyal 2010-12-08 21:16 ` Eric W. Biederman 1 sibling, 0 replies; 42+ messages in thread From: Vivek Goyal @ 2010-12-08 15:22 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 04:15:07PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 10:02 -0500, Vivek Goyal wrote: > > > >but its kdump so its mostly broken by design anyway ;-) > > > > Kdump has its share of problems especially with the fact that > > kernel/drivers find devices in bad state and are not hardened enough > > to deal with that. But on bare metal what's the better way of capturing > > kernel crash dump? Trying to do anything post crash in the kernel is > > also not very reliable either. > > /me <3 RS-232 > > I haven't found anything better than that... Serial is good for getting the oops out. But for the big vmcore? Secondly, people want the flexibility of sending the vmcore over various targets like over network to some remote server. Booting into second kernel opens up all those options and now one can do intelligent filtering and send the vmcore to any kind of destination. > > And poking at the RS-232 requires less of the kernel to be functional > than booting into a new kernel (whose image might have been corrupted by > the dying kernel, etc..) New kernel image being corrupted problem can be solved up to great extent by write protecting that memory location. So those who are happy with RS-232, they don't have to configure kdump. Just connect serial console and get the oops message out. Thanks Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 15:15 ` Peter Zijlstra 2010-12-08 15:22 ` Vivek Goyal @ 2010-12-08 21:16 ` Eric W. Biederman 1 sibling, 0 replies; 42+ messages in thread From: Eric W. Biederman @ 2010-12-08 21:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Vivek Goyal, Don Zickus, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni Peter Zijlstra <peterz@infradead.org> writes: > On Wed, 2010-12-08 at 10:02 -0500, Vivek Goyal wrote: > >> >but its kdump so its mostly broken by design anyway ;-) >> >> Kdump has its share of problems especially with the fact that >> kernel/drivers find devices in bad state and are not hardened enough >> to deal with that. But on bare metal what's the better way of capturing >> kernel crash dump? Trying to do anything post crash in the kernel is >> also not very reliable either. > > /me <3 RS-232 > > I haven't found anything better than that... True. But it can be a pain to operate RS-232 at production scale, or to convince customers to hook up RS-232 just in case your released software happens to crash. > And poking at the RS-232 requires less of the kernel to be functional > than booting into a new kernel (whose image might have been corrupted by > the dying kernel, etc..) For debugging a reproducible failure RS-232 wins. For everything else there is kdump. It sucks but it is at least fixable. And really the kdump kernel should be running a minimalistic hardware config so you only have to get the chunks of hardware you really care about working. As for corruption the kdump kernel lives in an area of memory that we never DMA to in the primary kernel, and we check a sha256 hash before we start booting the kdump kernel. In general kdump fails safe. That is if it can't makes things work it fails to boot and does nothing to your system. Definitely not perfect but if you don't have RS-232 it is the best I have seen. Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:20 ` Peter Zijlstra 2010-12-08 14:42 ` Vivek Goyal @ 2010-12-08 14:59 ` Peter Zijlstra 2010-12-08 18:43 ` Yinghai Lu ` (2 more replies) 1 sibling, 3 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 14:59 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: > > I wonder if you should reverse these checks. If the bios has the perf > > counter enabled, there might be a high chance that it fails the first > > check and never gets to the actually bios checks. > > Ah, good point. Something like so.. --- Subject: perf, x86: Detect broken BIOSes From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Wed Dec 08 15:56:23 CET 2010 Some BIOSes use PMU resources, this is a bug. Try to detect this, warn about it, and further refuse to touch the PMU ourselves. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c =================================================================== --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c @@ -375,15 +375,51 @@ static void release_pmc_hardware(void) { static bool check_hw_exists(void) { u64 val, val_new = 0; - int ret = 0; + int i, reg, ret = 0; + /* + * Check to see if the BIOS enabled any of the counters, if so + * complain and bail. + */ + for (i = 0; i < x86_pmu.num_counters; i++) { + reg = x86_pmu.eventsel + i; + ret = rdmsrl_safe(reg, &val); + if (ret) + goto msr_fail; + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) + goto bios_fail; + } + + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; + ret = rdmsrl_safe(reg, &val); + if (ret) + goto msr_fail; + if (val & (0x03 << i*4)) + goto bios_fail; + } + + /* + * Now write a value and read it back to see if it matches, + * this is needed to detect certain hardware emulators (qemu/kvm) + * that don't trap on the MSR access and always return 0s. + */ val = 0xabcdUL; - ret |= checking_wrmsrl(x86_pmu.perfctr, val); + ret = checking_wrmsrl(x86_pmu.perfctr, val); ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); if (ret || val != val_new) - return false; + goto msr_fail; return true; + +bios_fail: + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); + printk(KERN_ERR FW_BUG "invalid MSR: %x=%Lx\n", reg, val); + return false; + +msr_fail: + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); + return false; } static void reserve_ds_buffers(void); @@ -1378,10 +1414,8 @@ int __init init_hw_perf_events(void) pmu_check_apic(); /* sanity check that the hardware exists or is emulated */ - if (!check_hw_exists()) { - pr_cont("Broken PMU hardware detected, software events only.\n"); + if (!check_hw_exists()) return 0; - } pr_cont("%s PMU driver.\n", x86_pmu.name); ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:59 ` Peter Zijlstra @ 2010-12-08 18:43 ` Yinghai Lu 2010-12-08 19:01 ` Don Zickus 2010-12-08 19:06 ` Peter Zijlstra 2010-12-08 22:37 ` Don Zickus 2010-12-09 20:20 ` Don Zickus 2 siblings, 2 replies; 42+ messages in thread From: Yinghai Lu @ 2010-12-08 18:43 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On 12/08/2010 06:59 AM, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: > >>> I wonder if you should reverse these checks. If the bios has the perf >>> counter enabled, there might be a high chance that it fails the first >>> check and never gets to the actually bios checks. >> >> Ah, good point. > > Something like so.. > > --- > Subject: perf, x86: Detect broken BIOSes > From: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Wed Dec 08 15:56:23 CET 2010 > > Some BIOSes use PMU resources, this is a bug. > > Try to detect this, warn about it, and further refuse to touch the > PMU ourselves. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c > +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c > @@ -375,15 +375,51 @@ static void release_pmc_hardware(void) { > static bool check_hw_exists(void) > { > u64 val, val_new = 0; > - int ret = 0; > + int i, reg, ret = 0; > > + /* > + * Check to see if the BIOS enabled any of the counters, if so > + * complain and bail. > + */ > + for (i = 0; i < x86_pmu.num_counters; i++) { > + reg = x86_pmu.eventsel + i; > + ret = rdmsrl_safe(reg, &val); > + if (ret) > + goto msr_fail; > + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) > + goto bios_fail; > + } > + > + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { > + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; > + ret = rdmsrl_safe(reg, &val); > + if (ret) > + goto msr_fail; > + if (val & (0x03 << i*4)) > + goto bios_fail; > + } > + > + /* > + * Now write a value and read it back to see if it matches, > + * this is needed to detect certain hardware emulators (qemu/kvm) > + * that don't trap on the MSR access and always return 0s. > + */ > val = 0xabcdUL; > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > + ret = checking_wrmsrl(x86_pmu.perfctr, val); > ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > if (ret || val != val_new) > - return false; > + goto msr_fail; > > return true; > + > +bios_fail: > + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); > + printk(KERN_ERR FW_BUG "invalid MSR: %x=%Lx\n", reg, val); > + return false; > + > +msr_fail: > + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); > + return false; > } can you add sth force_... in command line to take over ownership of perf from BIOS or previous kernel ? then still can use perf etc after we kexec from RHEL or SLES kernel to later kernel ( from 2.6.37) Thanks Yinghai > > static void reserve_ds_buffers(void); > @@ -1378,10 +1414,8 @@ int __init init_hw_perf_events(void) > pmu_check_apic(); > > /* sanity check that the hardware exists or is emulated */ > - if (!check_hw_exists()) { > - pr_cont("Broken PMU hardware detected, software events only.\n"); > + if (!check_hw_exists()) > return 0; > - } > > pr_cont("%s PMU driver.\n", x86_pmu.name); > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 18:43 ` Yinghai Lu @ 2010-12-08 19:01 ` Don Zickus 2010-12-08 19:05 ` Yinghai Lu 2010-12-08 19:06 ` Peter Zijlstra 1 sibling, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-08 19:01 UTC (permalink / raw) To: Yinghai Lu Cc: Peter Zijlstra, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 10:43:14AM -0800, Yinghai Lu wrote: > > can you add sth force_... in command line to take over ownership of perf from BIOS or previous kernel ? > > then still can use perf etc after we kexec from RHEL or SLES kernel to later kernel ( from 2.6.37) My understand is that you can't because the BIOS is actively using it behind the scenes of the kernel (well during an SMI). I have a machine where I tried to force take it but it still stopped triggering interrupts. Cheers, Don ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 19:01 ` Don Zickus @ 2010-12-08 19:05 ` Yinghai Lu 2010-12-08 19:17 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Yinghai Lu @ 2010-12-08 19:05 UTC (permalink / raw) To: Don Zickus Cc: Peter Zijlstra, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On 12/08/2010 11:01 AM, Don Zickus wrote: > On Wed, Dec 08, 2010 at 10:43:14AM -0800, Yinghai Lu wrote: >> >> can you add sth force_... in command line to take over ownership of perf from BIOS or previous kernel ? >> >> then still can use perf etc after we kexec from RHEL or SLES kernel to later kernel ( from 2.6.37) > > My understand is that you can't because the BIOS is actively using it > behind the scenes of the kernel (well during an SMI). I have a machine > where I tried to force take it but it still stopped triggering interrupts. how about second case: kexec from RHEL 6 stock kernel to upstream kernel ? Thanks Yinghai ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 19:05 ` Yinghai Lu @ 2010-12-08 19:17 ` Peter Zijlstra 2010-12-08 19:20 ` Yinghai Lu 0 siblings, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 19:17 UTC (permalink / raw) To: Yinghai Lu Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 11:05 -0800, Yinghai Lu wrote: > On 12/08/2010 11:01 AM, Don Zickus wrote: > > On Wed, Dec 08, 2010 at 10:43:14AM -0800, Yinghai Lu wrote: > >> > >> can you add sth force_... in command line to take over ownership of perf from BIOS or previous kernel ? > >> > >> then still can use perf etc after we kexec from RHEL or SLES kernel to later kernel ( from 2.6.37) > > > > My understand is that you can't because the BIOS is actively using it > > behind the scenes of the kernel (well during an SMI). I have a machine > > where I tried to force take it but it still stopped triggering interrupts. > > how about second case: kexec from RHEL 6 stock kernel to upstream kernel ? Its impossible to distinguish between a BIOS having claimed a counter and a previous kernel not having shut things down properly. The best we can do is allow a force parameter and let the user keep all pieces when he uses it. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 19:17 ` Peter Zijlstra @ 2010-12-08 19:20 ` Yinghai Lu 0 siblings, 0 replies; 42+ messages in thread From: Yinghai Lu @ 2010-12-08 19:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On 12/08/2010 11:17 AM, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 11:05 -0800, Yinghai Lu wrote: >> On 12/08/2010 11:01 AM, Don Zickus wrote: >>> On Wed, Dec 08, 2010 at 10:43:14AM -0800, Yinghai Lu wrote: >>>> >>>> can you add sth force_... in command line to take over ownership of perf from BIOS or previous kernel ? >>>> >>>> then still can use perf etc after we kexec from RHEL or SLES kernel to later kernel ( from 2.6.37) >>> >>> My understand is that you can't because the BIOS is actively using it >>> behind the scenes of the kernel (well during an SMI). I have a machine >>> where I tried to force take it but it still stopped triggering interrupts. >> >> how about second case: kexec from RHEL 6 stock kernel to upstream kernel ? > > Its impossible to distinguish between a BIOS having claimed a counter > and a previous kernel not having shut things down properly. > > The best we can do is allow a force parameter and let the user keep all > pieces when he uses it. ok, thanks. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 18:43 ` Yinghai Lu 2010-12-08 19:01 ` Don Zickus @ 2010-12-08 19:06 ` Peter Zijlstra 2010-12-08 19:20 ` Yinghai Lu 1 sibling, 1 reply; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 19:06 UTC (permalink / raw) To: Yinghai Lu Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 10:43 -0800, Yinghai Lu wrote: > can you add sth force_... in command line to take over ownership of > perf from BIOS or previous kernel ? The problem is, you cannot steal the thing from the BIOS, you'll trample on its settings and the next time it runs it will simply re-instate it. And aside from probing the EN bit on boot there is no way of determining this. So forcing the state might get you an ill-functioning system. > then still can use perf etc after we kexec from RHEL or SLES kernel to > later kernel ( from 2.6.37) I'm not sure why people would do that, but yeah I guess we can do something like that. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 19:06 ` Peter Zijlstra @ 2010-12-08 19:20 ` Yinghai Lu 0 siblings, 0 replies; 42+ messages in thread From: Yinghai Lu @ 2010-12-08 19:20 UTC (permalink / raw) To: Peter Zijlstra Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On 12/08/2010 11:06 AM, Peter Zijlstra wrote: >> then still can use perf etc after we kexec from RHEL or SLES kernel to >> later kernel ( from 2.6.37) > > I'm not sure why people would do that, but yeah I guess we can do > something like that. more problem: system with linuxbios and have kernel in flash as bootloader. they may kexec to final production kernel. and they may need to update that embedded kernel to shutdown perf.... Yinghai ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:59 ` Peter Zijlstra 2010-12-08 18:43 ` Yinghai Lu @ 2010-12-08 22:37 ` Don Zickus 2010-12-08 23:20 ` Eric W. Biederman 2010-12-09 20:20 ` Don Zickus 2 siblings, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-08 22:37 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 03:59:16PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: > > > > I wonder if you should reverse these checks. If the bios has the perf > > > counter enabled, there might be a high chance that it fails the first > > > check and never gets to the actually bios checks. > > > > Ah, good point. > > Something like so.. This seems to work correctly on my Nehalem and broken bios machines during boot and kexec. As expected it fails during kdump. My p4 box failed during kexec for some reason. But p4 has other issues. Cheers, Don > > --- > Subject: perf, x86: Detect broken BIOSes > From: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Wed Dec 08 15:56:23 CET 2010 > > Some BIOSes use PMU resources, this is a bug. > > Try to detect this, warn about it, and further refuse to touch the > PMU ourselves. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > --- > Index: linux-2.6/arch/x86/kernel/cpu/perf_event.c > =================================================================== > --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event.c > +++ linux-2.6/arch/x86/kernel/cpu/perf_event.c > @@ -375,15 +375,51 @@ static void release_pmc_hardware(void) { > static bool check_hw_exists(void) > { > u64 val, val_new = 0; > - int ret = 0; > + int i, reg, ret = 0; > > + /* > + * Check to see if the BIOS enabled any of the counters, if so > + * complain and bail. > + */ > + for (i = 0; i < x86_pmu.num_counters; i++) { > + reg = x86_pmu.eventsel + i; > + ret = rdmsrl_safe(reg, &val); > + if (ret) > + goto msr_fail; > + if (val & ARCH_PERFMON_EVENTSEL_ENABLE) > + goto bios_fail; > + } > + > + for (i = 0; i < x86_pmu.num_counters_fixed; i++) { > + reg = MSR_ARCH_PERFMON_FIXED_CTR_CTRL; > + ret = rdmsrl_safe(reg, &val); > + if (ret) > + goto msr_fail; > + if (val & (0x03 << i*4)) > + goto bios_fail; > + } > + > + /* > + * Now write a value and read it back to see if it matches, > + * this is needed to detect certain hardware emulators (qemu/kvm) > + * that don't trap on the MSR access and always return 0s. > + */ > val = 0xabcdUL; > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > + ret = checking_wrmsrl(x86_pmu.perfctr, val); > ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > if (ret || val != val_new) > - return false; > + goto msr_fail; > > return true; > + > +bios_fail: > + printk(KERN_CONT "Broken BIOS detected, software events only.\n"); > + printk(KERN_ERR FW_BUG "invalid MSR: %x=%Lx\n", reg, val); > + return false; > + > +msr_fail: > + printk(KERN_CONT "Broken PMU hardware detected, software events only.\n"); > + return false; > } > > static void reserve_ds_buffers(void); > @@ -1378,10 +1414,8 @@ int __init init_hw_perf_events(void) > pmu_check_apic(); > > /* sanity check that the hardware exists or is emulated */ > - if (!check_hw_exists()) { > - pr_cont("Broken PMU hardware detected, software events only.\n"); > + if (!check_hw_exists()) > return 0; > - } > > pr_cont("%s PMU driver.\n", x86_pmu.name); > > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 22:37 ` Don Zickus @ 2010-12-08 23:20 ` Eric W. Biederman 2010-12-09 4:34 ` Don Zickus 0 siblings, 1 reply; 42+ messages in thread From: Eric W. Biederman @ 2010-12-08 23:20 UTC (permalink / raw) To: Don Zickus Cc: Peter Zijlstra, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni Don Zickus <dzickus@redhat.com> writes: > On Wed, Dec 08, 2010 at 03:59:16PM +0100, Peter Zijlstra wrote: >> On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: >> >> > > I wonder if you should reverse these checks. If the bios has the perf >> > > counter enabled, there might be a high chance that it fails the first >> > > check and never gets to the actually bios checks. >> > >> > Ah, good point. >> >> Something like so.. > > This seems to work correctly on my Nehalem and broken bios machines during > boot and kexec. As expected it fails during kdump. My p4 box failed > during kexec for some reason. But p4 has other issues. Does the kdump kernel still boot? It looks like it should I just want to double check. Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 23:20 ` Eric W. Biederman @ 2010-12-09 4:34 ` Don Zickus 0 siblings, 0 replies; 42+ messages in thread From: Don Zickus @ 2010-12-09 4:34 UTC (permalink / raw) To: Eric W. Biederman Cc: Peter Zijlstra, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 03:20:37PM -0800, Eric W. Biederman wrote: > Don Zickus <dzickus@redhat.com> writes: > > > On Wed, Dec 08, 2010 at 03:59:16PM +0100, Peter Zijlstra wrote: > >> On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: > >> > >> > > I wonder if you should reverse these checks. If the bios has the perf > >> > > counter enabled, there might be a high chance that it fails the first > >> > > check and never gets to the actually bios checks. > >> > > >> > Ah, good point. > >> > >> Something like so.. > > > > This seems to work correctly on my Nehalem and broken bios machines during > > boot and kexec. As expected it fails during kdump. My p4 box failed > > during kexec for some reason. But p4 has other issues. > > Does the kdump kernel still boot? > > It looks like it should I just want to double check. Yeah, sorry for not being clear. It definitely boots and does it thing. perf init (and thus nmi watchdog) fail with 'BIOS broken' because the perf counters were not shutdown prior to executing kdump. Cheers, Don ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:59 ` Peter Zijlstra 2010-12-08 18:43 ` Yinghai Lu 2010-12-08 22:37 ` Don Zickus @ 2010-12-09 20:20 ` Don Zickus 2010-12-09 20:44 ` Cyrill Gorcunov 2 siblings, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-09 20:20 UTC (permalink / raw) To: Peter Zijlstra, gorcunov Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 03:59:16PM +0100, Peter Zijlstra wrote: > On Wed, 2010-12-08 at 15:20 +0100, Peter Zijlstra wrote: > > > > I wonder if you should reverse these checks. If the bios has the perf > > > counter enabled, there might be a high chance that it fails the first > > > check and never gets to the actually bios checks. > > > > Ah, good point. > > Something like so.. Getting closer... Pentium4s are special they need the double write, so... > + /* > + * Now write a value and read it back to see if it matches, > + * this is needed to detect certain hardware emulators (qemu/kvm) > + * that don't trap on the MSR access and always return 0s. > + */ > val = 0xabcdUL; > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > + ret = checking_wrmsrl(x86_pmu.perfctr, val); if (x86_pmu.perfctr_second_write) ret |= checking_wrmsrl(x86_pmu.perfctr, val); solved my p4 problems for kexec. > ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > if (ret || val != val_new) > - return false; > + goto msr_fail; > > return true; Cheers, Don ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-09 20:20 ` Don Zickus @ 2010-12-09 20:44 ` Cyrill Gorcunov 0 siblings, 0 replies; 42+ messages in thread From: Cyrill Gorcunov @ 2010-12-09 20:44 UTC (permalink / raw) To: Don Zickus Cc: Peter Zijlstra, Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Thu, Dec 09, 2010 at 03:20:08PM -0500, Don Zickus wrote: ... > > Getting closer... > > Pentium4s are special they need the double write, so... > > > + /* > > + * Now write a value and read it back to see if it matches, > > + * this is needed to detect certain hardware emulators (qemu/kvm) > > + * that don't trap on the MSR access and always return 0s. > > + */ > > val = 0xabcdUL; > > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > > + ret = checking_wrmsrl(x86_pmu.perfctr, val); > > if (x86_pmu.perfctr_second_write) > ret |= checking_wrmsrl(x86_pmu.perfctr, val); > > solved my p4 problems for kexec. > ... yeah, thanks! would you push a patch upstream? Cyrill ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:01 ` Don Zickus 2010-12-08 14:20 ` Peter Zijlstra @ 2010-12-08 14:33 ` Peter Zijlstra 2010-12-08 14:39 ` Vivek Goyal 2 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 14:33 UTC (permalink / raw) To: Don Zickus Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, 2010-12-08 at 09:01 -0500, Don Zickus wrote: > Privately, I used the above wrapped with for_each_online_cpu(cpu) and it > worked fine for me. Something like so then? --- Subject: perf: Stop all counters on reboot From: Peter Zijlstra <a.p.zijlstra@chello.nl> Date: Wed Dec 08 15:29:02 CET 2010 Use the reboot notifier to detach all running counters on reboot, this solves a problem with kexec where the new kernel doesn't expect running counters (rightly so). It will however decrease the coverage of the NMI watchdog. Making a kexec specific reboot notifier callback would be best, however that would require touching all notifier callback handlers as they are not properly structured to deal with new state. As a compromise, place the perf reboot notifier at the very last position in the list. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> --- Index: linux-2.6/kernel/perf_event.c =================================================================== --- linux-2.6.orig/kernel/perf_event.c +++ linux-2.6/kernel/perf_event.c @@ -21,6 +21,7 @@ #include <linux/dcache.h> #include <linux/percpu.h> #include <linux/ptrace.h> +#include <linux/reboot.h> #include <linux/vmstat.h> #include <linux/vmalloc.h> #include <linux/hardirq.h> @@ -6329,7 +6330,7 @@ static void __cpuinit perf_event_init_cp mutex_unlock(&swhash->hlist_mutex); } -#ifdef CONFIG_HOTPLUG_CPU +#if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC static void perf_pmu_rotate_stop(struct pmu *pmu) { struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); @@ -6383,6 +6384,26 @@ static void perf_event_exit_cpu(int cpu) static inline void perf_event_exit_cpu(int cpu) { } #endif +static int +perf_reboot(struct notifier_block *notifier, unsigned long val, void *v) +{ + int cpu; + + for_each_online_cpu(cpu) + perf_event_exit_cpu(cpu); + + return NOTIFY_OK; +} + +/* + * Run the perf reboot notifier at the very last possible moment so that + * the generic watchdog code runs as long as possible. + */ +static struct notifier_block perf_reboot_notifier = { + .notifier_call = perf_reboot, + .priority = INT_MIN, +}; + static int __cpuinit perf_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu) { @@ -6418,6 +6439,7 @@ void __init perf_event_init(void) perf_pmu_register(&perf_task_clock); perf_tp_register(); perf_cpu_notifier(perf_cpu_notify); + register_reboot_notifier(&perf_reboot_notifier); ret = init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 14:01 ` Don Zickus 2010-12-08 14:20 ` Peter Zijlstra 2010-12-08 14:33 ` Peter Zijlstra @ 2010-12-08 14:39 ` Vivek Goyal 2 siblings, 0 replies; 42+ messages in thread From: Vivek Goyal @ 2010-12-08 14:39 UTC (permalink / raw) To: Don Zickus Cc: Peter Zijlstra, Eric W. Biederman, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Wed, Dec 08, 2010 at 09:01:03AM -0500, Don Zickus wrote: > On Wed, Dec 08, 2010 at 12:30:20AM +0100, Peter Zijlstra wrote: > > On Thu, 2010-12-02 at 11:15 -0500, Don Zickus wrote: > > > > > Vivek suggested to me this morning that I should just blantantly disable the > > > perf counter during init when running my test. > > > > Nah, we should actively scan for that during the bring-up and kill > > hw-perf when we find an enable bit set, some BIOSes actively use the > > PMU, this is something that should be discouraged. > > Ok, the reboot notifier addresses the kexec problem but doesn't fix it > though (I have to test to confirm that, comments below). The bios check > should catch those situations (ironically I stumbled upon a machine with > this problem, so I will test your patch with it, though it only uses perf > counter 0). The kdump problem will still exist, not sure if we care and > perhaps we should document in the changelog that we know kdump is still > broken (unless we do care). Can't think why would somebody like to use performance counters in kdump kernel. So that probably should not be a concern. Vivek ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-02 7:34 ` Peter Zijlstra 2010-12-02 16:15 ` Don Zickus @ 2010-12-07 21:16 ` Don Zickus 2010-12-08 0:26 ` Yinghai Lu 1 sibling, 1 reply; 42+ messages in thread From: Don Zickus @ 2010-12-07 21:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Eric W. Biederman, Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Thu, Dec 02, 2010 at 08:34:30AM +0100, Peter Zijlstra wrote: > > void __init lockup_detector_init(void) > > { > > void *cpu = (void *)(long)smp_processor_id(); > > @@ -563,6 +576,7 @@ void __init lockup_detector_init(void) > > > > cpu_callback(&cpu_nfb, CPU_ONLINE, cpu); > > register_cpu_notifier(&cpu_nfb); > > + register_reboot_notifier(&reboot_nfb); > > > > return; > > } > > We'd really want a perf_event.c callback there to do as the hot-unplug > code does and detach all running counters from the cpu. Ok, here is a simpler patch for now. --------------------------------8<-------- From: Don Zickus <dzickus@redhat.com> Date: Tue, 7 Dec 2010 16:06:59 -0500 Subject: [PATCH] perf: Use event select bits for hardware check The counter registers can continue to increment if left enabled across a kexec or a kdump. The makes the perf hardware check accidentally return false when the hardware really does exist. Change the check to use the first bits of event selection. Those bits should be safe as they are used to program the type of events to use. And more importantly, they won't increment across kexec/kdump. Signed-off-by: Don Zickus <dzickus@redhat.com> --- arch/x86/kernel/cpu/perf_event.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index 7b91396..7d869c0 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -377,10 +377,10 @@ static bool check_hw_exists(void) u64 val, val_new = 0; int ret = 0; - val = 0xabcdUL; - ret |= checking_wrmsrl(x86_pmu.perfctr, val); - ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); - if (ret || val != val_new) + val = 0xabUL; + ret |= checking_wrmsrl(x86_pmu.eventsel, val); + ret |= rdmsrl_safe(x86_pmu.eventsel, &val_new); + if (ret || val != (val_new & 0xFF)) return false; return true; -- 1.7.3.2 ^ permalink raw reply related [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-07 21:16 ` Don Zickus @ 2010-12-08 0:26 ` Yinghai Lu 2010-12-08 10:39 ` Peter Zijlstra 0 siblings, 1 reply; 42+ messages in thread From: Yinghai Lu @ 2010-12-08 0:26 UTC (permalink / raw) To: Don Zickus Cc: Peter Zijlstra, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On 12/07/2010 01:16 PM, Don Zickus wrote: > On Thu, Dec 02, 2010 at 08:34:30AM +0100, Peter Zijlstra wrote: >>> void __init lockup_detector_init(void) >>> { >>> void *cpu = (void *)(long)smp_processor_id(); >>> @@ -563,6 +576,7 @@ void __init lockup_detector_init(void) >>> >>> cpu_callback(&cpu_nfb, CPU_ONLINE, cpu); >>> register_cpu_notifier(&cpu_nfb); >>> + register_reboot_notifier(&reboot_nfb); >>> >>> return; >>> } >> >> We'd really want a perf_event.c callback there to do as the hot-unplug >> code does and detach all running counters from the cpu. > > Ok, here is a simpler patch for now. > > --------------------------------8<-------- > From: Don Zickus <dzickus@redhat.com> > Date: Tue, 7 Dec 2010 16:06:59 -0500 > Subject: [PATCH] perf: Use event select bits for hardware check > > The counter registers can continue to increment if left enabled > across a kexec or a kdump. The makes the perf hardware check > accidentally return false when the hardware really does exist. > > Change the check to use the first bits of event selection. Those > bits should be safe as they are used to program the type of events > to use. And more importantly, they won't increment across kexec/kdump. > > Signed-off-by: Don Zickus <dzickus@redhat.com> > --- > arch/x86/kernel/cpu/perf_event.c | 8 ++++---- > 1 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c > index 7b91396..7d869c0 100644 > --- a/arch/x86/kernel/cpu/perf_event.c > +++ b/arch/x86/kernel/cpu/perf_event.c > @@ -377,10 +377,10 @@ static bool check_hw_exists(void) > u64 val, val_new = 0; > int ret = 0; > > - val = 0xabcdUL; > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > - ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > - if (ret || val != val_new) > + val = 0xabUL; > + ret |= checking_wrmsrl(x86_pmu.eventsel, val); > + ret |= rdmsrl_safe(x86_pmu.eventsel, &val_new); > + if (ret || val != (val_new & 0xFF)) > return false; > > return true; Thanks. it fixes the problem. Yinghai ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-08 0:26 ` Yinghai Lu @ 2010-12-08 10:39 ` Peter Zijlstra 0 siblings, 0 replies; 42+ messages in thread From: Peter Zijlstra @ 2010-12-08 10:39 UTC (permalink / raw) To: Yinghai Lu Cc: Don Zickus, Eric W. Biederman, Vivek Goyal, Ingo Molnar, Jason Wessel, linux-kernel@vger.kernel.org, Haren Myneni On Tue, 2010-12-07 at 16:26 -0800, Yinghai Lu wrote: > > +++ b/arch/x86/kernel/cpu/perf_event.c > > @@ -377,10 +377,10 @@ static bool check_hw_exists(void) > > u64 val, val_new = 0; > > int ret = 0; > > > > - val = 0xabcdUL; > > - ret |= checking_wrmsrl(x86_pmu.perfctr, val); > > - ret |= rdmsrl_safe(x86_pmu.perfctr, &val_new); > > - if (ret || val != val_new) > > + val = 0xabUL; > > + ret |= checking_wrmsrl(x86_pmu.eventsel, val); > > + ret |= rdmsrl_safe(x86_pmu.eventsel, &val_new); > > + if (ret || val != (val_new & 0xFF)) > > return false; > > > > return true; > > Thanks. it fixes the problem. > Won't merge it though, I think it stinks.. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: perf hw in kexeced kernel broken in tip 2010-12-01 19:49 ` Peter Zijlstra 2010-12-01 19:58 ` Vivek Goyal @ 2010-12-01 20:41 ` Eric W. Biederman 1 sibling, 0 replies; 42+ messages in thread From: Eric W. Biederman @ 2010-12-01 20:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Vivek Goyal, Yinghai Lu, Ingo Molnar, Jason Wessel, Don Zickus, linux-kernel@vger.kernel.org, Haren Myneni Peter Zijlstra <peterz@infradead.org> writes: > On Wed, 2010-12-01 at 14:46 -0500, Vivek Goyal wrote: >> On Wed, Dec 01, 2010 at 08:38:12PM +0100, Peter Zijlstra wrote: >> > On Wed, 2010-12-01 at 11:23 -0500, Vivek Goyal wrote: >> > > > What does kexec normally do to ensure hardware is left in a sane state? >> > > >> > > Typically calls device_shutdown() and sysdev_shutdown() from >> > > kernel_restart_prepare() to shutdown the devices. >> > > >> > > Also calls machine_shutdown() which depending on architecture can take >> > > care of various things like stopping other cpus, shutting down LAPIC, >> > > disabling IOAPIC, disabling hpet, shutting down IOMMU etc >> > > (native_machine_shutdown()). >> > >> > So basically there's no sane generic reset callout? >> >> I think ->shutdown() calls are sane generic callouts. Isn't it? > > ->shutdown looks like it's about to reset/halt the hardware, no point in > slowing down the regular shutdown/reboot path for something like this, > we know the hardware will get reset to a sane state. No you don't! Most BIOSen implement a board level reset there, but it isn't required. Just doing a software only reinitialization is allowed, and on some arches is the only thing you can do. Speed during reboot is not a reason to avoid anything. reboot is not a fast path, and we are talking about things in human tersm. The only argument I have heard that holds the least amount of sense is to keep what we do to a minimum, to increase the chances that we can do a reboot even after a kernel oops. All of that said. What insane start are we leaving the hardware in that we think it is going to be slow in human terms to remove? Eric ^ permalink raw reply [flat|nested] 42+ messages in thread
end of thread, other threads:[~2010-12-09 20:44 UTC | newest] Thread overview: 42+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-12-01 8:00 perf hw in kexeced kernel broken in tip Yinghai Lu 2010-12-01 11:27 ` Peter Zijlstra 2010-12-01 16:06 ` Vivek Goyal 2010-12-01 16:11 ` Peter Zijlstra 2010-12-01 16:23 ` Vivek Goyal 2010-12-01 19:38 ` Peter Zijlstra 2010-12-01 19:46 ` Vivek Goyal 2010-12-01 19:49 ` Peter Zijlstra 2010-12-01 19:58 ` Vivek Goyal 2010-12-01 20:07 ` Peter Zijlstra 2010-12-01 21:48 ` Eric W. Biederman 2010-12-02 5:23 ` Don Zickus 2010-12-02 7:34 ` Peter Zijlstra 2010-12-02 16:15 ` Don Zickus 2010-12-07 23:30 ` Peter Zijlstra 2010-12-08 14:01 ` Don Zickus 2010-12-08 14:20 ` Peter Zijlstra 2010-12-08 14:42 ` Vivek Goyal 2010-12-08 14:48 ` Peter Zijlstra 2010-12-08 15:02 ` Vivek Goyal 2010-12-08 15:15 ` Peter Zijlstra 2010-12-08 15:22 ` Vivek Goyal 2010-12-08 21:16 ` Eric W. Biederman 2010-12-08 14:59 ` Peter Zijlstra 2010-12-08 18:43 ` Yinghai Lu 2010-12-08 19:01 ` Don Zickus 2010-12-08 19:05 ` Yinghai Lu 2010-12-08 19:17 ` Peter Zijlstra 2010-12-08 19:20 ` Yinghai Lu 2010-12-08 19:06 ` Peter Zijlstra 2010-12-08 19:20 ` Yinghai Lu 2010-12-08 22:37 ` Don Zickus 2010-12-08 23:20 ` Eric W. Biederman 2010-12-09 4:34 ` Don Zickus 2010-12-09 20:20 ` Don Zickus 2010-12-09 20:44 ` Cyrill Gorcunov 2010-12-08 14:33 ` Peter Zijlstra 2010-12-08 14:39 ` Vivek Goyal 2010-12-07 21:16 ` Don Zickus 2010-12-08 0:26 ` Yinghai Lu 2010-12-08 10:39 ` Peter Zijlstra 2010-12-01 20:41 ` Eric W. Biederman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).