* nmi_watchdog=2 - Oops with 2.6.8
@ 2004-08-25 1:42 Zarakin
2004-08-25 6:12 ` Philippe Elie
2004-08-25 16:01 ` Philippe Elie
0 siblings, 2 replies; 8+ messages in thread
From: Zarakin @ 2004-08-25 1:42 UTC (permalink / raw)
To: linux-kernel
Hi,
My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
oops at clear_msr_range.
Handwritten oops Info:
CPU 0
EIP: 0060: [<0xc0110d4b>] Not tainted
EIP is at clear_msr_range+0x18/0x25
eax: 0 ebx:1f ecx: 3ba edx: 0
esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
ds: 7b es: 7b ss: 68
Dump of assembler code for function clear_msr_range:
0xc0110d33 <clear_msr_range+0>: push %edi
0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
0xc0110d36 <clear_msr_range+3>: push %esi
0xc0110d37 <clear_msr_range+4>: push %ebx
0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
0xc0110d4b <clear_msr_range+24>: wrmsr
0xc0110d4d <clear_msr_range+26>: add $0x1,%edi
0xc0110d50 <clear_msr_range+29>: cmp %ebx,%edi
0xc0110d52 <clear_msr_range+31>: jb 0xc0110d46
0xc0110d54 <clear_msr_range+33>: pop %ebx
0xc0110d55 <clear_msr_range+34>: pop %esi
0xc0110d56 <clear_msr_range+35>: pop %edi
0xc0110d57 <clear_msr_range+36>: ret
HW Info:
* Shuttle ST61G4 Box -
http://www.shuttle.com/hq/product/barebone/specification.asp?B_id=28
* Chipsets: North bridge:ATI RS300 South bridge:ATI IXP150
* Intel P4 2.8E GHz (Prescott)
cat /proc/version:
Linux version 2.6.8-gentoo (root@tux) (gcc version 3.3.3 20040412 (Gentoo
Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)) #1 Mon Aug 23 21:09:40 PDT 2004
cat /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2794.263
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor
ds_cpl cid
bogomips : 5554.17
Processor section from my .config
#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_SMP is not set
CONFIG_PREEMPT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_MICROCODE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 6:12 UTC (permalink / raw)
To: Zarakin; +Cc: linux-kernel
On Tue, 24 Aug 2004 at 18:42 +0000, Zarakin wrote:
> Hi,
>
> My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
> oops at clear_msr_range.
>
> Handwritten oops Info:
> CPU 0
> EIP: 0060: [<0xc0110d4b>] Not tainted
> EIP is at clear_msr_range+0x18/0x25
> eax: 0 ebx:1f ecx: 3ba edx: 0
> esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
> ds: 7b es: 7b ss: 68
>
> Dump of assembler code for function clear_msr_range:
> 0xc0110d33 <clear_msr_range+0>: push %edi
> 0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
> 0xc0110d36 <clear_msr_range+3>: push %esi
> 0xc0110d37 <clear_msr_range+4>: push %ebx
> 0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
> 0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
> 0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
> 0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
> 0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
> 0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
> 0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
> 0xc0110d4b <clear_msr_range+24>: wrmsr
Intel removed MSRs 0x3ba/0x3bb (MSR_IQ_ESCR0 and MSR_IQ_ESCR1) in the Prescott
processor (family 15, model 3). I'm going to sleep; if nobody beats me to it
I'll try to provide a patch. See nmi.c:setup_p4_watchdog() --> clear_msr_range(0x3A0, 31);
This probably breaks oprofile too; that patch will be a bit less obvious.
regards,
phe
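The register dump in the oops is consistent with this diagnosis: esi holds the base (0x3a0), edi the loop index (0x1a), and ecx their sum (0x3ba), which is the MSR the wrmsr faulted on. A minimal user-space sketch of that arithmetic, assuming only the values quoted in the thread; first_removed_index is a hypothetical helper, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* MSR numbers from the thread: IQ_ESCR0/1 are reported removed on Prescott. */
#define IQ_ESCR0 0x3baU
#define IQ_ESCR1 0x3bbU

/*
 * Hypothetical helper: walk the same addresses that
 * clear_msr_range(base, n) would write, and return the first loop
 * index that lands on a removed MSR, or -1 if the range avoids both.
 */
static int first_removed_index(uint32_t base, uint32_t n)
{
    assert(n <= 0x1000); /* sanity bound for this illustration only */
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t msr = base + i;
        if (msr == IQ_ESCR0 || msr == IQ_ESCR1)
            return (int)i;
    }
    return -1;
}
```

For the call in setup_p4_watchdog(), first_removed_index(0x3a0, 31) gives 0x1a, matching edi in the dump.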
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Mikael Pettersson @ 2004-08-25 9:26 UTC (permalink / raw)
To: Philippe Elie; +Cc: Zarakin, linux-kernel
Philippe Elie writes:
> > EIP: 0060: [<0xc0110d4b>] Not tainted
> > EIP is at clear_msr_range+0x18/0x25
> > eax: 0 ebx:1f ecx: 3ba edx: 0
> > esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
> > ds: 7b es: 7b ss: 68
> >
> > Dump of assembler code for function clear_msr_range:
> > 0xc0110d33 <clear_msr_range+0>: push %edi
> > 0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
> > 0xc0110d36 <clear_msr_range+3>: push %esi
> > 0xc0110d37 <clear_msr_range+4>: push %ebx
> > 0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
> > 0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
> > 0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
> > 0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
> > 0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
> > 0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
> > 0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
> > 0xc0110d4b <clear_msr_range+24>: wrmsr
>
> Intel removed MSRs 0x3ba/0x3bb (MSR_IQ_ESCR0 and MSR_IQ_ESCR1) in the Prescott
> processor (family 15, model 3). I'm going to sleep; if nobody beats me to it
> I'll try to provide a patch. See nmi.c:setup_p4_watchdog() --> clear_msr_range(0x3A0, 31);
I figured that too. Strangely enough, perfctr has been run
successfully on two CPUID 0xF3x machines, and it didn't hit
this problem. I have no idea why, yet. Maybe they haven't
removed IQ_ESCR{0,1} from the Nocona?
I don't have physical access to either a Prescott or a Nocona,
but it shouldn't be difficult to test. Just set up IQ_ESCR{0,1}
with a clock-like event and see what happens.
/Mikael
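A less invasive first check than programming an event might be to read the register through the msr driver, which the posted .config enables (CONFIG_X86_MSR=y). The sketch below assumes the conventional /dev/cpu/0/msr device node, root privileges, and that a read of a non-existent MSR is reported as an error rather than a value; probe_msr is a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700 /* for pread() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one MSR through the msr character device (e.g. /dev/cpu/0/msr).
 * The file offset selects the MSR number and each read returns 8 bytes.
 * Returns 0 and stores the value on success, or a negative errno.
 */
static int probe_msr(const char *dev, uint32_t msr, uint64_t *value)
{
    assert(dev != NULL && value != NULL);

    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -errno;

    ssize_t got = pread(fd, value, sizeof(*value), (off_t)msr);
    /* Capture the result before close() can disturb errno. */
    int err = (got == (ssize_t)sizeof(*value)) ? 0 : (got < 0 ? -errno : -EIO);

    close(fd);
    return err;
}
```

Calling probe_msr("/dev/cpu/0/msr", 0x3ba, &v) would be expected to fail where IQ_ESCR0 is really gone and to succeed where it still exists; that expectation has not been verified on a Prescott or a Nocona.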
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 16:01 UTC (permalink / raw)
To: Zarakin; +Cc: linux-kernel
On Tue, 24 Aug 2004 at 18:42 +0000, Zarakin wrote:
> Hi,
>
> My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
> oops at clear_msr_range.
>
> Handwritten oops Info:
> CPU 0
> EIP: 0060: [<0xc0110d4b>] Not tainted
> EIP is at clear_msr_range+0x18/0x25
> eax: 0 ebx:1f ecx: 3ba edx: 0
> esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
> ds: 7b es: 7b ss: 68
> 0xc0110d4b <clear_msr_range+24>: wrmsr
> model : 3
> model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
try this patch please.
--- linux-2.5/arch/i386/kernel/nmi.c~ 2004-06-15 10:52:00.000000000 +0200
+++ linux-2.5/arch/i386/kernel/nmi.c 2004-08-25 17:33:45.000000000 +0200
@@ -376,7 +376,13 @@
clear_msr_range(0x3F1, 2);
/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
docs doesn't fully define it, so leave it alone for now. */
- clear_msr_range(0x3A0, 31);
+ if (boot_cpu_data.x86_model >= 0x3) {
+ /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
+ clear_msr_range(0x3A0, 26);
+ clear_msr_range(0x3BC, 3);
+ } else {
+ clear_msr_range(0x3A0, 31);
+ }
clear_msr_range(0x3C0, 6);
clear_msr_range(0x3C8, 6);
clear_msr_range(0x3E0, 2);
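The counts 26 and 3 in the patch can be checked mechanically: the two new calls should cover everything the old call covered except 0x3ba and 0x3bb. A small sketch of that check in user-space C, using only the ranges from the patch; range_mask, old_mask and new_mask are hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Mark base..base+n-1 in a 64-bit mask whose bit 0 represents MSR 0x3A0. */
static uint64_t range_mask(uint32_t base, uint32_t n)
{
    uint64_t mask = 0;
    assert(base >= 0x3A0 && base + n <= 0x3A0 + 64);
    for (uint32_t i = 0; i < n; ++i)
        mask |= 1ULL << (base + i - 0x3A0);
    return mask;
}

/* MSRs cleared by the original single call. */
static uint64_t old_mask(void)
{
    return range_mask(0x3A0, 31);
}

/* MSRs cleared by the two calls taken when x86_model >= 3. */
static uint64_t new_mask(void)
{
    return range_mask(0x3A0, 26) | range_mask(0x3BC, 3);
}
```

The difference old_mask() & ~new_mask() has exactly bits 0x1a and 0x1b set, i.e. MSRs 0x3ba and 0x3bb, and the new calls touch nothing outside the old range.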
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 17:05 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: Zarakin, linux-kernel
On Wed, 25 Aug 2004 at 11:26 +0000, Mikael Pettersson wrote:
> I figured that too. Strangely enough, perfctr has been run
> successfully on two CPUID 0xF3x machines, and it didn't hit
> this problem. I have no idea why, yet. Maybe they haven't
> removed IQ_ESCR{0,1} from the Nocona?
I'm surprised too; without more information I'll strictly follow
the documentation.
> I don't have physical access to either a Prescott or a Nocona,
> but it shouldn't be difficult to test. Just set up IQ_ESCR{0,1}
> with a clock-like event and see what happens.
Loading the oprofile driver must fault without IQ_ESCR0/1, but we don't use
these MSRs in oprofile, so it can't be used to check whether they are
functional on Nocona (and I have no hardware to test it on either).
regards,
Phe
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Zarakin @ 2004-08-26 3:22 UTC (permalink / raw)
To: linux-kernel
> try this patch please.
>
> --- linux-2.5/arch/i386/kernel/nmi.c~ 2004-06-15 10:52:00.000000000 +0200
> +++ linux-2.5/arch/i386/kernel/nmi.c 2004-08-25 17:33:45.000000000 +0200
> @@ -376,7 +376,13 @@
> clear_msr_range(0x3F1, 2);
> /* MSR 0x3F0 seems to have a default value of 0xFC00, but current
> docs doesn't fully define it, so leave it alone for now. */
> - clear_msr_range(0x3A0, 31);
> + if (boot_cpu_data.x86_model >= 0x3) {
> + /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
> + clear_msr_range(0x3A0, 26);
> + clear_msr_range(0x3BC, 3);
> + } else {
> + clear_msr_range(0x3A0, 31);
> + }
> clear_msr_range(0x3C0, 6);
> clear_msr_range(0x3C8, 6);
> clear_msr_range(0x3E0, 2);
It worked, my machine boots now fine with nmi_watchdog=2.
I can also confirm that oprofile is broken due to the missing
MSR_P4_IQ_ESCR0/1:
http://marc.theaimsgroup.com/?l=oprofile-list&m=109323108114060&w=2
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-26 17:02 UTC (permalink / raw)
To: Zarakin; +Cc: linux-kernel
On Wed, 25 Aug 2004 at 20:22 +0000, Zarakin wrote:
> > try this patch please.
> It worked, my machine boots now fine with nmi_watchdog=2.
>
> I can also confirm that oprofile is broken due to the missing
> MSR_P4_IQ_ESCR0/1:
> http://marc.theaimsgroup.com/?l=oprofile-list&m=109323108114060&w=2
>
gahh, I missed this mail...
Try this patch for oprofile. I'd appreciate it a lot if you could test with
both a UP kernel and an SMP kernel with HT enabled. I tested the patch only
on a model 2 P4 with an HT kernel. I don't see any other user of these
two MSRs in oprofile or elsewhere in the kernel.
regards, phe.
--- linux-2.5/arch/i386/oprofile/op_model_p4.c.old 2004-08-25 20:00:56.000000000 +0200
+++ linux-2.5/arch/i386/oprofile/op_model_p4.c 2004-08-25 21:46:14.000000000 +0200
@@ -419,9 +419,28 @@
msrs->controls[i].addr = addr;
}
- /* 43 ESCR registers in three discontiguous group */
+ /* 43 ESCR registers in three or four discontiguous groups */
for (addr = MSR_P4_BSU_ESCR0 + stag;
- addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* no IQ_ESCR0/1 on some models, we save BSU_ESCR0/1 a second time
+ * to avoid a special case in nmi_{save|restore}_registers() */
+ if (boot_cpu_data.x86_model >= 0x3) {
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ } else {
+ for (addr = MSR_P4_IQ_ESCR0 + stag;
+ addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
msrs->controls[i].addr = addr;
}
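The point of saving BSU_ESCR0/1 a second time is that the controls table keeps the same number of entries on every model, so code that walks it needs no model check. A user-space sketch of the three loops, under these assumptions: 0x3ba/0x3bb come from the thread, the other addresses are inferred from the contiguous 0x3A0..0x3BE block that clear_msr_range(0x3A0, 31) covers, and the stag/step parameters stand in for the stag variable and addr_increment() in the patch. fill_escr_group is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IQ_ESCR0/1 are given in the thread; the remaining values are assumed. */
enum {
    BSU_ESCR0 = 0x3a0, BSU_ESCR1 = 0x3a1,
    IQ_ESCR0  = 0x3ba, IQ_ESCR1  = 0x3bb,
    RAT_ESCR0 = 0x3bc, SSU_ESCR0 = 0x3be
};

/*
 * Fill out[] with the ESCR addresses of the first group, following the
 * three loops in the patch. stag is 0 or 1; step is 1 (no HT) or 2 (HT).
 * Returns the number of entries written (at most cap).
 */
static size_t fill_escr_group(unsigned model, unsigned stag, unsigned step,
                              uint32_t *out, size_t cap)
{
    size_t i = 0;
    uint32_t addr;
    /* Model >= 3 has no IQ_ESCR0/1, so BSU_ESCR0/1 fill those slots. */
    uint32_t mid_lo = (model >= 3) ? BSU_ESCR0 : IQ_ESCR0;
    uint32_t mid_hi = (model >= 3) ? BSU_ESCR1 : IQ_ESCR1;

    assert(out != NULL && stag <= 1 && (step == 1 || step == 2));

    for (addr = BSU_ESCR0 + stag; addr < IQ_ESCR0 && i < cap; addr += step)
        out[i++] = addr;
    for (addr = mid_lo + stag; addr <= mid_hi && i < cap; addr += step)
        out[i++] = addr;
    for (addr = RAT_ESCR0 + stag; addr <= SSU_ESCR0 && i < cap; addr += step)
        out[i++] = addr;
    return i;
}
```

With stag 0 and step 1 this yields 31 entries for both model 2 and model 3, and the model 3 list never contains 0x3ba or 0x3bb.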
* Re: nmi_watchdog=2 and oprofile - Oops with 2.6.8
From: Philippe Elie @ 2004-08-27 16:52 UTC (permalink / raw)
To: Zarakin; +Cc: Andrew Morton, linux-kernel
On Thu, 26 Aug 2004 at 22:59 +0000, Zarakin wrote:
> Oprofile oops'd this time at p4_setup_ctrs - there seems to be one more loop
> that tries to access MSR_IQ_ESCR0/1. I did a small change myself and it seems
> to work fine, on both SMP&HT and UP kernels. I was able to start oprofile and
> get profile sample data.
Thanks, I missed this one.
I redid the diff (whitespace was mangled in the last patch) and added the
nmi.c:clear_msr_range() fix. This needs to go into 2.6.9 imho; Andrew,
please apply.
-- phe
--- linux-2.5/arch/i386/kernel/nmi.c~ 2004-06-15 10:52:00.000000000 +0200
+++ linux-2.5/arch/i386/kernel/nmi.c 2004-08-25 17:33:45.000000000 +0200
@@ -376,7 +376,13 @@
clear_msr_range(0x3F1, 2);
/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
docs doesn't fully define it, so leave it alone for now. */
- clear_msr_range(0x3A0, 31);
+ if (boot_cpu_data.x86_model >= 0x3) {
+ /* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
+ clear_msr_range(0x3A0, 26);
+ clear_msr_range(0x3BC, 3);
+ } else {
+ clear_msr_range(0x3A0, 31);
+ }
clear_msr_range(0x3C0, 6);
clear_msr_range(0x3C8, 6);
clear_msr_range(0x3E0, 2);
--- linux-2.5/arch/i386/oprofile/op_model_p4.c.old 2004-08-25 20:00:56.000000000 +0200
+++ linux-2.5/arch/i386/oprofile/op_model_p4.c 2004-08-27 18:35:16.000000000 +0200
@@ -419,9 +419,28 @@
msrs->controls[i].addr = addr;
}
- /* 43 ESCR registers in three discontiguous group */
+ /* 43 ESCR registers in three or four discontiguous groups */
for (addr = MSR_P4_BSU_ESCR0 + stag;
- addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+ addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+
+ /* no IQ_ESCR0/1 on some models, we save BSU_ESCR0/1 a second time
+ * to avoid a special case in nmi_{save|restore}_registers() */
+ if (boot_cpu_data.x86_model >= 0x3) {
+ for (addr = MSR_P4_BSU_ESCR0 + stag;
+ addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ } else {
+ for (addr = MSR_P4_IQ_ESCR0 + stag;
+ addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+ msrs->controls[i].addr = addr;
+ }
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
msrs->controls[i].addr = addr;
}
@@ -553,7 +572,18 @@
/* clear all escrs (including those outside our concern) */
for (addr = MSR_P4_BSU_ESCR0 + stag;
- addr <= MSR_P4_SSU_ESCR0; addr += addr_increment()) {
+ addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) {
+ wrmsr(addr, 0, 0);
+ }
+
+ /* On older models clear also MSR_P4_IQ_ESCR0/1 */
+ if (boot_cpu_data.x86_model < 0x3) {
+ wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
+ wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
+ }
+
+ for (addr = MSR_P4_RAT_ESCR0 + stag;
+ addr <= MSR_P4_SSU_ESCR0; addr += addr_increment()) {
wrmsr(addr, 0, 0);
}