* nmi_watchdog=2 - Oops with 2.6.8
From: Zarakin @ 2004-08-25 1:42 UTC
To: linux-kernel

Hi,

My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
oops at clear_msr_range.

Handwritten oops Info:
CPU 0
EIP: 0060: [<0xc0110d4b>] Not tainted
EIP is at clear_msr_range+0x18/0x25
eax: 0 ebx:1f ecx: 3ba edx: 0
esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
ds: 7b es: 7b ss: 68

Dump of assembler code for function clear_msr_range:
0xc0110d33 <clear_msr_range+0>: push %edi
0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
0xc0110d36 <clear_msr_range+3>: push %esi
0xc0110d37 <clear_msr_range+4>: push %ebx
0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
0xc0110d4b <clear_msr_range+24>: wrmsr
0xc0110d4d <clear_msr_range+26>: add $0x1,%edi
0xc0110d50 <clear_msr_range+29>: cmp %ebx,%edi
0xc0110d52 <clear_msr_range+31>: jb 0xc0110d46
0xc0110d54 <clear_msr_range+33>: pop %ebx
0xc0110d55 <clear_msr_range+34>: pop %esi
0xc0110d56 <clear_msr_range+35>: pop %edi
0xc0110d57 <clear_msr_range+36>: ret

HW Info:
* Shuttle ST61G4 Box -
  http://www.shuttle.com/hq/product/barebone/specification.asp?B_id=28
* Chipsets: North bridge:ATI RS300 South bridge:ATI IXP150
* Intel P4 2.8E GHz (Prescott)

cat /proc/version:
Linux version 2.6.8-gentoo (root@tux) (gcc version 3.3.3 20040412 (Gentoo Linux 3.3.3-r6, ssp-3.3.2-2, pie-8.7.6)) #1 Mon Aug 23 21:09:40 PDT 2004

cat /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 3
model name : Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping : 3
cpu MHz : 2794.263
cache size : 1024 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 5
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni monitor ds_cpl cid
bogomips : 5554.17

Processor section from my .config
#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
CONFIG_MPENTIUM4=y
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
CONFIG_X86_GENERIC=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=7
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_SMP is not set
CONFIG_PREEMPT=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_TSC=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_NONFATAL=y
CONFIG_X86_MCE_P4THERMAL=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
CONFIG_MICROCODE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 6:12 UTC
To: Zarakin; +Cc: linux-kernel

On Tue, 24 Aug 2004 at 18:42 +0000, Zarakin wrote:

> Hi,
>
> My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
> oops at clear_msr_range.
>
> Handwritten oops Info:
> CPU 0
> EIP: 0060: [<0xc0110d4b>] Not tainted
> EIP is at clear_msr_range+0x18/0x25
> eax: 0 ebx:1f ecx: 3ba edx: 0
> esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
> ds: 7b es: 7b ss: 68
>
> Dump of assembler code for function clear_msr_range:
> 0xc0110d33 <clear_msr_range+0>: push %edi
> 0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
> 0xc0110d36 <clear_msr_range+3>: push %esi
> 0xc0110d37 <clear_msr_range+4>: push %ebx
> 0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
> 0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
> 0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
> 0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
> 0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
> 0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
> 0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
> 0xc0110d4b <clear_msr_range+24>: wrmsr

Intel removed MSR 0x3ba/0x3bb (MSR_IQ_ESCR0 and 1) in the Prescott processor
(family 15 model 3). I'm going to sleep; if nobody beats me to it I'll try to
provide a patch, see nmi.c:setup_p4_watchdog() --> clear_msr_range(0x3A0, 31);

This probably breaks oprofile too; the patch will be a bit less obvious.

regards,
phe
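The register dump can be read directly against the disassembly: %esi holds the base (0x3a0), %ebx the count (0x1f = 31), %edi the loop index, and %ecx = base + index is the MSR number passed to wrmsr. The following user-space sketch replays that loop to show why the fault lands at edi = 0x1a. The helper names are hypothetical, and the "absent MSR" predicate simply encodes the claim made in the message above; it is not a model of the real hardware.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hardware: per the message above, 0x3ba and
 * 0x3bb are treated as absent on family 15 model 3. */
static bool msr_exists_on_model3(unsigned int msr)
{
	return msr != 0x3ba && msr != 0x3bb;
}

/* Mirrors the loop in the disassembly: base is %esi, n is %ebx, i is %edi,
 * and base + i is the %ecx value handed to wrmsr. Returns the index of the
 * first write that would fault, or -1 if every MSR in the range exists. */
static int first_faulting_index(unsigned int base, unsigned int n)
{
	for (unsigned int i = 0; i < n; ++i) {
		if (!msr_exists_on_model3(base + i))
			return (int)i;
	}
	return -1;
}
```

Calling `first_faulting_index(0x3a0, 31)` gives 0x1a, which matches the edi value in the handwritten oops, and 0x3a0 + 0x1a = 0x3ba matches ecx.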
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Mikael Pettersson @ 2004-08-25 9:26 UTC
To: Philippe Elie; +Cc: Zarakin, linux-kernel

Philippe Elie writes:
 > > EIP: 0060: [<0xc0110d4b>] Not tainted
 > > EIP is at clear_msr_range+0x18/0x25
 > > eax: 0 ebx:1f ecx: 3ba edx: 0
 > > esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
 > > ds: 7b es: 7b ss: 68
 > >
 > > Dump of assembler code for function clear_msr_range:
 > > 0xc0110d33 <clear_msr_range+0>: push %edi
 > > 0xc0110d34 <clear_msr_range+1>: xor %edi,%edi
 > > 0xc0110d36 <clear_msr_range+3>: push %esi
 > > 0xc0110d37 <clear_msr_range+4>: push %ebx
 > > 0xc0110d38 <clear_msr_range+5>: mov 0x14(%esp,1),%ebx
 > > 0xc0110d3c <clear_msr_range+9>: mov 0x10(%esp,1),%esi
 > > 0xc0110d40 <clear_msr_range+13>: cmp %ebx,%edi
 > > 0xc0110d42 <clear_msr_range+15>: jae 0xc0110d54
 > > 0xc0110d44 <clear_msr_range+17>: xor %eax,%eax
 > > 0xc0110d46 <clear_msr_range+19>: lea (%edi,%esi,1),%ecx
 > > 0xc0110d49 <clear_msr_range+22>: mov %eax,%edx
 > > 0xc0110d4b <clear_msr_range+24>: wrmsr
 >
 > Intel removed MSR 0x3ba/0x3bb (MSR_IQ_ESCR0 and 1) in the Prescott processor
 > (family 15 model 3). I'm going to sleep; if nobody beats me to it I'll try to
 > provide a patch, see nmi.c:setup_p4_watchdog() --> clear_msr_range(0x3A0, 31);

I figured that too. Strangely enough, perfctr has been run
successfully on two CPUID 0xF3x machines, and it didn't hit
this problem. I have no idea why, yet. Maybe they haven't
removed IQ_ESCR{0,1} from the Nocona?

I don't have physical access to either a Prescott or a Nocona,
but it shouldn't be difficult to test. Just set up IQ_ESCR{0,1}
with a clock-like event and see what happens.

/Mikael
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 17:05 UTC
To: Mikael Pettersson; +Cc: Zarakin, linux-kernel

On Wed, 25 Aug 2004 at 11:26 +0000, Mikael Pettersson wrote:

> I figured that too. Strangely enough, perfctr has been run
> successfully on two CPUID 0xF3x machines, and it didn't hit
> this problem. I have no idea why, yet. Maybe they haven't
> removed IQ_ESCR{0,1} from the Nocona?

I'm surprised too; w/o more information I'll strictly follow the
documentation.

> I don't have physical access to either a Prescott or a Nocona,
> but it shouldn't be difficult to test. Just set up IQ_ESCR{0,1}
> with a clock-like event and see what happens.

Loading the oprofile driver must segfault w/o IQ_ESCR0/1, but we don't use
these MSRs in oprofile, so it can't be used to check whether they are
functional on Nocona (and I have no HW to test it either).

regards,
Phe
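For anyone who does have the hardware, a weaker but cheaper check than programming an event is to read the register through the msr character device (the reporter's config has CONFIG_X86_MSR=y). This is a swapped-in technique, not the test Mikael describes: a successful read only shows the register exists, not that it counts anything. The sketch below assumes the usual /dev/cpu/N/msr interface, where the file offset selects the MSR and a faulting access is reported as an error; `read_msr` is a hypothetical helper name.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Reads one MSR through an msr character device such as /dev/cpu/0/msr.
 * Returns 0 and stores the value on success, or -1 with errno set.
 * Assumption: the driver reports a faulting rdmsr as a failed read (EIO). */
int read_msr(const char *dev_path, uint32_t msr, uint64_t *value)
{
	int fd = open(dev_path, O_RDONLY);
	if (fd < 0)
		return -1;

	int rc = -1;
	/* The file offset selects the MSR number. */
	if (lseek(fd, (off_t)msr, SEEK_SET) != (off_t)-1) {
		ssize_t got = read(fd, value, sizeof(*value));
		if (got == (ssize_t)sizeof(*value))
			rc = 0;
		else if (got >= 0)
			errno = EIO;	/* short read: treat as an I/O error */
	}

	/* Preserve the errno of the failing call across close(). */
	int saved = errno;
	close(fd);
	errno = saved;
	return rc;
}
```

Run as root, `read_msr("/dev/cpu/0/msr", 0x3ba, &v)` would be expected to fail on a CPU without IQ_ESCR0 and succeed on one that still has it.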
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-25 16:01 UTC
To: Zarakin; +Cc: linux-kernel

On Tue, 24 Aug 2004 at 18:42 +0000, Zarakin wrote:

> Hi,
>
> My Gentoo machine will not boot with nmi_watchdog=2 parameter - I get an
> oops at clear_msr_range.
>
> Handwritten oops Info:
> CPU 0
> EIP: 0060: [<0xc0110d4b>] Not tainted
> EIP is at clear_msr_range+0x18/0x25
> eax: 0 ebx:1f ecx: 3ba edx: 0
> esi: 3a0 edi: 1a ebp:0 esp: d7d83f74
> ds: 7b es: 7b ss: 68

> 0xc0110d4b <clear_msr_range+24>: wrmsr

> model : 3
> model name : Intel(R) Pentium(R) 4 CPU 2.80GHz

try this patch please.

--- linux-2.5/arch/i386/kernel/nmi.c~	2004-06-15 10:52:00.000000000 +0200
+++ linux-2.5/arch/i386/kernel/nmi.c	2004-08-25 17:33:45.000000000 +0200
@@ -376,7 +376,13 @@
 	clear_msr_range(0x3F1, 2);
 	/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
 	   docs doesn't fully define it, so leave it alone for now. */
-	clear_msr_range(0x3A0, 31);
+	if (boot_cpu_data.x86_model >= 0x3) {
+		/* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
+		clear_msr_range(0x3A0, 26);
+		clear_msr_range(0x3BC, 3);
+	} else {
+		clear_msr_range(0x3A0, 31);
+	}
 	clear_msr_range(0x3C0, 6);
 	clear_msr_range(0x3C8, 6);
 	clear_msr_range(0x3E0, 2);
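The arithmetic behind the split is easy to get off by one, so here is a small user-space check that the two new ranges, (0x3A0, 26) and (0x3BC, 3), cover exactly what (0x3A0, 31) covered minus 0x3ba and 0x3bb. The struct and function names are hypothetical and exist only for this sketch; the only assumption about clear_msr_range() is that it touches base through base + n - 1, as the disassembly earlier in the thread shows.

```c
#include <stdbool.h>

/* A (base, count) pair as passed to clear_msr_range(). */
struct msr_range {
	unsigned int base;
	unsigned int n;
};

/* True if msr lies in [base, base + n). */
static bool range_contains(struct msr_range r, unsigned int msr)
{
	return msr >= r.base && msr < r.base + r.n;
}

/* True if the patched ranges cover the original range minus 0x3ba/0x3bb,
 * and nothing else, over a window wider than the original range. */
static bool split_matches_original(void)
{
	const struct msr_range orig = { 0x3A0, 31 };
	const struct msr_range low = { 0x3A0, 26 };
	const struct msr_range high = { 0x3BC, 3 };

	for (unsigned int msr = 0x390; msr <= 0x3D0; ++msr) {
		bool skipped = (msr == 0x3ba || msr == 0x3bb);
		bool want = range_contains(orig, msr) && !skipped;
		bool got = range_contains(low, msr) || range_contains(high, msr);
		if (want != got)
			return false;
	}
	return true;
}
```

The first range ends at 0x3B9, the second covers 0x3BC through 0x3BE, and the original ended at 0x3BE, so `split_matches_original()` holds.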
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Zarakin @ 2004-08-26 3:22 UTC
To: linux-kernel

> try this patch please.
>
> --- linux-2.5/arch/i386/kernel/nmi.c~	2004-06-15 10:52:00.000000000 +0200
> +++ linux-2.5/arch/i386/kernel/nmi.c	2004-08-25 17:33:45.000000000 +0200
> @@ -376,7 +376,13 @@
>  	clear_msr_range(0x3F1, 2);
>  	/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
>  	   docs doesn't fully define it, so leave it alone for now. */
> -	clear_msr_range(0x3A0, 31);
> +	if (boot_cpu_data.x86_model >= 0x3) {
> +		/* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
> +		clear_msr_range(0x3A0, 26);
> +		clear_msr_range(0x3BC, 3);
> +	} else {
> +		clear_msr_range(0x3A0, 31);
> +	}
>  	clear_msr_range(0x3C0, 6);
>  	clear_msr_range(0x3C8, 6);
>  	clear_msr_range(0x3E0, 2);

It worked; my machine now boots fine with nmi_watchdog=2.

I can also confirm that oprofile is broken due to the missing
MSR_P4_IQ_ESCR0/1:
http://marc.theaimsgroup.com/?l=oprofile-list&m=109323108114060&w=2
* Re: nmi_watchdog=2 - Oops with 2.6.8
From: Philippe Elie @ 2004-08-26 17:02 UTC
To: Zarakin; +Cc: linux-kernel

On Wed, 25 Aug 2004 at 20:22 +0000, Zarakin wrote:

> > try this patch please.

> It worked, my machine boots now fine with nmi_watchdog=2.
>
> I can also confirm that oprofile is broken due to the missing
> MSR_P4_IQ_ESCR0/1:
> http://marc.theaimsgroup.com/?l=oprofile-list&m=109323108114060&w=2
>

gahh, I missed this mail... try this patch for oprofile. I'd appreciate it
a lot if you could test with a UP kernel and an SMP kernel with HT enabled.
I tested the patch only on a model = 2 P4 with an HT kernel. I don't see any
other user of these two MSRs in oprofile nor in the kernel.

regards,
phe.

--- linux-2.5/arch/i386/oprofile/op_model_p4.c.old	2004-08-25 20:00:56.000000000 +0200
+++ linux-2.5/arch/i386/oprofile/op_model_p4.c	2004-08-25 21:46:14.000000000 +0200
@@ -419,9 +419,28 @@
 		msrs->controls[i].addr = addr;
 	}
 
-	/* 43 ESCR registers in three discontiguous group */
+	/* 43 ESCR registers in three or four discontiguous group */
 	for (addr = MSR_P4_BSU_ESCR0 + stag;
-	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+	     addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+		msrs->controls[i].addr = addr;
+	}
+
+	/* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
+	 * to avoid special case in nmi_{save|restore}_registers() */
+	if (boot_cpu_data.x86_model >= 0x3) {
+		for (addr = MSR_P4_BSU_ESCR0 + stag;
+		     addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+			msrs->controls[i].addr = addr;
+		}
+	} else {
+		for (addr = MSR_P4_IQ_ESCR0 + stag;
+		     addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+			msrs->controls[i].addr = addr;
+		}
+	}
+
+	for (addr = MSR_P4_RAT_ESCR0 + stag;
+	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
 		msrs->controls[i].addr = addr;
 	}
 
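The point of saving BSU_ESCR0/1 a second time is that the number of control slots stays the same on every model, so the save/restore code needs no special case. The sketch below replays the three loops of the patch in user space to show that. The numeric MSR values are assumptions made for illustration (the thread itself only states 0x3ba/0x3bb for IQ_ESCR0/1 and uses 0x3A0 and 0x3BC as range bases), and `fill_first_group` is a hypothetical name, not a kernel function.

```c
#include <stddef.h>

/* Assumed values for this sketch only. */
enum {
	BSU_ESCR0 = 0x3a0, BSU_ESCR1 = 0x3a1,
	IQ_ESCR0 = 0x3ba, IQ_ESCR1 = 0x3bb,
	RAT_ESCR0 = 0x3bc, SSU_ESCR0 = 0x3be
};

/* Appends first, first + step, ... up to and including last; returns the
 * next free slot index. */
static size_t fill_range(unsigned int *out, size_t i, unsigned int first,
			 unsigned int last, unsigned int step)
{
	for (unsigned int addr = first; addr <= last; addr += step)
		out[i++] = addr;
	return i;
}

/* Mirrors the patched loops: stag and step play the roles of the kernel's
 * stag and addr_increment() (0 and 1 without HT). Returns the slot count. */
static size_t fill_first_group(unsigned int *out, unsigned int model,
			       unsigned int stag, unsigned int step)
{
	size_t i = 0;

	i = fill_range(out, i, BSU_ESCR0 + stag, IQ_ESCR0 - 1, step);
	if (model >= 3)
		/* IQ_ESCR0/1 absent: reuse BSU_ESCR0/1 as placeholders */
		i = fill_range(out, i, BSU_ESCR0 + stag, BSU_ESCR1, step);
	else
		i = fill_range(out, i, IQ_ESCR0 + stag, IQ_ESCR1, step);
	i = fill_range(out, i, RAT_ESCR0 + stag, SSU_ESCR0, step);
	return i;
}
```

With stag = 0 and step = 1, both model 2 and model 3 yield 31 slots; only slots 26 and 27 differ (IQ_ESCR0/1 versus the repeated BSU_ESCR0/1).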
[parent not found: <00f801c48bfa$f5ccefb0$6401a8c0@novustelecom.net>]
* Re: nmi_watchdog=2 and oprofile - Oops with 2.6.8
From: Philippe Elie @ 2004-08-27 16:52 UTC
To: Zarakin; +Cc: Andrew Morton, linux-kernel

On Thu, 26 Aug 2004 at 22:59 +0000, Zarakin wrote:

> Oprofile oops'd this time at p4_setup_ctrs - there seems to be one more
> loop that tries to access MSR_IQ_ESCR0/1. I did a small change myself and
> it seems to work fine, on both SMP&HT and UP kernels. I was able to start
> oprofile and get profile sample data.

Thanks, I missed this one. I redid the diff (whitespace was mangled in the
last patch) and added the nmi.c:clear_msr_range() fix. It must go into
2.6.9 imho; Andrew, please apply.

--
phe

--- linux-2.5/arch/i386/kernel/nmi.c~	2004-06-15 10:52:00.000000000 +0200
+++ linux-2.5/arch/i386/kernel/nmi.c	2004-08-25 17:33:45.000000000 +0200
@@ -376,7 +376,13 @@
 	clear_msr_range(0x3F1, 2);
 	/* MSR 0x3F0 seems to have a default value of 0xFC00, but current
 	   docs doesn't fully define it, so leave it alone for now. */
-	clear_msr_range(0x3A0, 31);
+	if (boot_cpu_data.x86_model >= 0x3) {
+		/* MSR_P4_IQ_ESCR0/1 (0x3ba/0x3bb) removed */
+		clear_msr_range(0x3A0, 26);
+		clear_msr_range(0x3BC, 3);
+	} else {
+		clear_msr_range(0x3A0, 31);
+	}
 	clear_msr_range(0x3C0, 6);
 	clear_msr_range(0x3C8, 6);
 	clear_msr_range(0x3E0, 2);
--- linux-2.5/arch/i386/oprofile/op_model_p4.c.old	2004-08-25 20:00:56.000000000 +0200
+++ linux-2.5/arch/i386/oprofile/op_model_p4.c	2004-08-27 18:35:16.000000000 +0200
@@ -419,9 +419,28 @@
 		msrs->controls[i].addr = addr;
 	}
 
-	/* 43 ESCR registers in three discontiguous group */
+	/* 43 ESCR registers in three or four discontiguous group */
 	for (addr = MSR_P4_BSU_ESCR0 + stag;
-	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
+	     addr < MSR_P4_IQ_ESCR0; ++i, addr += addr_increment()) {
+		msrs->controls[i].addr = addr;
+	}
+
+	/* no IQ_ESCR0/1 on some models, we save a seconde time BSU_ESCR0/1
+	 * to avoid special case in nmi_{save|restore}_registers() */
+	if (boot_cpu_data.x86_model >= 0x3) {
+		for (addr = MSR_P4_BSU_ESCR0 + stag;
+		     addr <= MSR_P4_BSU_ESCR1; ++i, addr += addr_increment()) {
+			msrs->controls[i].addr = addr;
+		}
+	} else {
+		for (addr = MSR_P4_IQ_ESCR0 + stag;
+		     addr <= MSR_P4_IQ_ESCR1; ++i, addr += addr_increment()) {
+			msrs->controls[i].addr = addr;
+		}
+	}
+
+	for (addr = MSR_P4_RAT_ESCR0 + stag;
+	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
 		msrs->controls[i].addr = addr;
 	}
 
@@ -553,7 +572,18 @@
 
 	/* clear all escrs (including those outside our concern) */
 	for (addr = MSR_P4_BSU_ESCR0 + stag;
-	     addr <= MSR_P4_SSU_ESCR0; addr += addr_increment()) {
+	     addr < MSR_P4_IQ_ESCR0; addr += addr_increment()) {
+		wrmsr(addr, 0, 0);
+	}
+
+	/* On older models clear also MSR_P4_IQ_ESCR0/1 */
+	if (boot_cpu_data.x86_model < 0x3) {
+		wrmsr(MSR_P4_IQ_ESCR0, 0, 0);
+		wrmsr(MSR_P4_IQ_ESCR1, 0, 0);
+	}
+
+	for (addr = MSR_P4_RAT_ESCR0 + stag;
+	     addr <= MSR_P4_SSU_ESCR0; ++i, addr += addr_increment()) {
 		wrmsr(addr, 0, 0);
 	}
 