Re: [discuss] [OOPS] powernow on smp dual core amd64

public inbox for linux-kernel@vger.kernel.org

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-09 23:46 Tom Duffy
@ 2005-06-10 16:53 ` Andi Kleen
  2005-06-10 18:46   ` Tom Duffy
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2005-06-10 16:53 UTC (permalink / raw)
  To: Tom Duffy; +Cc: linux-kernel, discuss

On Thu, Jun 09, 2005 at 04:46:19PM -0700, Tom Duffy wrote:
> Got this panic when I recently upgraded my BIOS so that it supports k8
> powernow on SMP dual-core.

2.6.12-rc has a dual core aware powernow k8 driver.

-Andi

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-10 16:53 ` [discuss] " Andi Kleen
@ 2005-06-10 18:46   ` Tom Duffy
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Duffy @ 2005-06-10 18:46 UTC (permalink / raw)
  To: Andi Kleen; +Cc: discuss, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 3564 bytes --]

On Fri, 2005-06-10 at 18:53 +0200, Andi Kleen wrote:
> On Thu, Jun 09, 2005 at 04:46:19PM -0700, Tom Duffy wrote:
> > Got this panic when I recently upgraded my BIOS so that it supports k8
> > powernow on SMP dual-core.
> 
> 2.6.12-rc has a dual core aware powernow k8 driver.

Despite the name of the kernel, it is based on 2.6.12-rc6.

Here is the panic on bootup.

Unable to handle kernel NULL pointer dereference at 0000000000000024 RIP:
<ffffffff8011dadc>{query_current_values_with_pending_wait+60}
PGD 3f255067 PUD 7fe7e067 PMD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: mptscsih(U) mptbase(U) sd_mod scsi_mod
Pid: 33, comm: events/7 Not tainted 2.6.11-1.1381_FC5smp
RIP: 0010:[<ffffffff8011dadc>]  sdb1 sdb2
<ffffffff8011dadc>{query_current_values_with_pending_wait+60}
RSP: 0018:ffff81007fd9fdc8  EFLAGS: 00010202
RAX: 000000000000000e RBX: 0000000000000000 RCX: 00000000c0010042
RDX: 0000000000000008 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff81007fd9e000 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000080
R13: 0000000000000000 R14: 0000000000000292 R15: ffffffff80112950
FS:  00000000005a5858(0000) GS:ffffffff8050ec00(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000024 CR3: 000000007fd76000 CR4: 00000000000006e0
Process events/7 (pid: 33, threadinfo ffff81007fd9e000, task ffff81007fd43070)
Stack: 0000000000000000 ffffffff8011e0b1 0000000000000001 ffff81007fa02d10
       ffff81007fa02d40 ffffffff802e6f23 0000000000000000 0000000000000003
       0000000000000001 0000000000000020
Call Trace:<ffffffff8011e0b1>{powernowk8_get+129} <ffffffff802e6f23>{cpufreq_get+115}
       <ffffffff8011298a>{handle_cpufreq_delayed_get+58} <ffffffff8014b9dc>{worker_thread+476}
       <ffffffff80134710>{default_wake_function+0} <ffffffff801326a3>{__wake_up_common+67}
       <ffffffff8014b800>{worker_thread+0} <ffffffff80150469>{kthread+217}
       <ffffffff80135c90>{schedule_tail+64} <ffffffff8010f76b>{child_rip+8}
       <ffffffff8011d680>{flat_send_IPI_mask+0} <ffffffff80150390>{kthread+0}
       <ffffffff8010f763>{child_rip+0}

Code: 89 47 24 89 57 20 31 c0 48 83 c4 08 c3 66 66 66 90 66 66 90
RIP <ffffffff8011dadc>{query_current_values_with_pending_wait+60} RSP <ffff81007fd9fdc8>
CR2: 0000000000000024
 <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace:<ffffffff8013abc5>{profile_task_exit+21} <ffffffff8013bfe2>{do_exit+34}
       <ffffffff80267378>{do_unblank_screen+40} <ffffffff80124286>{do_page_fault+1926}
       <ffffffff8035c032>{thread_return+0} <ffffffff8035c084>{thread_return+82}
       <ffffffff8013433d>{activate_task+141} <ffffffff80112950>{handle_cpufreq_delayed_get+0}
       <ffffffff8010f5b5>{error_exit+0} <ffffffff80112950>{handle_cpufreq_delayed_get+0}
       <ffffffff8011dadc>{query_current_values_with_pending_wait+60}
       <ffffffff8011e0b1>{powernowk8_get+129} <ffffffff802e6f23>{cpufreq_get+115}
       <ffffffff8011298a>{handle_cpufreq_delayed_get+58} <ffffffff8014b9dc>{worker_thread+476}
       <ffffffff80134710>{default_wake_function+0} <ffffffff801326a3>{__wake_up_common+67}
       <ffffffff8014b800>{worker_thread+0} <ffffffff80150469>{kthread+217}
       <ffffffff80135c90>{schedule_tail+64} <ffffffff8010f76b>{child_rip+8}
       <ffffffff8011d680>{flat_send_IPI_mask+0} <ffffffff80150390>{kthread+0}
       <ffffffff8010f763>{child_rip+0}

-tduffy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* RE: [discuss] [OOPS] powernow on smp dual core amd64
@ 2005-06-10 19:48 Langsdorf, Mark
  2005-06-10 20:01 ` Andi Kleen
  0 siblings, 1 reply; 15+ messages in thread
From: Langsdorf, Mark @ 2005-06-10 19:48 UTC (permalink / raw)
  To: Tom Duffy, Andi Kleen; +Cc: discuss, linux-kernel

It looks like the crash is caused by an invalid
pointer dereference in 
query_current_values_with_pending_wait(), which
implies that powernowk8_get() was called with an
invalid CPU number.
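
The fault details are consistent with this reading. Below is a minimal sketch that assumes the standard x86 page-fault error-code bit layout. The `PF_*` macro names and `describe_pf()` are illustrative and not taken from this kernel tree. It decodes the `Oops: 0002` value as a kernel-mode write to a non-present page. CR2 is 0x24, RDI is 0, and the faulting `Code:` bytes are `89 47 24` (`mov %eax,0x24(%rdi)`). Together, these point to a store into the field at offset 0x24 of a structure whose pointer is NULL.

```c
#include <stdio.h>

/* Bits of the x86 page-fault (#PF) error code pushed by the CPU. */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access */
#define PF_USER  (1u << 2) /* 0: kernel mode,      1: user mode */
#define PF_RSVD  (1u << 3) /* reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* fault was an instruction fetch */

/* Formats a human-readable summary of a #PF error code into buf. */
static void describe_pf(unsigned int code, char *buf, size_t len)
{
    const char *access = (code & PF_INSTR) ? "instruction fetch"
                       : (code & PF_WRITE) ? "write"
                                           : "read";

    snprintf(buf, len, "%s, %s, %s%s",
             (code & PF_PROT) ? "protection violation" : "page not present",
             access,
             (code & PF_USER) ? "user mode" : "kernel mode",
             (code & PF_RSVD) ? ", reserved bit set" : "");
}
```

The later `Oops: 0010` report decodes with the same bits as an instruction fetch from a non-present page in kernel mode. That fits the RIP of 0xff in that report.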

Andi, what will happen if you do
set_cpus_allowed(current, cpumask_of_cpu(cpu)) when
cpu isn't in the range of online CPUs?  There's
supposed to be a check to prevent an invalid
pointer access from happening but it's failing for 
some reason.

-Mark Langsdorf
AMD, Inc.

> -----Original Message-----
> From: Tom Duffy [mailto:tduffy@sun.com] 
> Sent: Friday, June 10, 2005 1:47 PM
> To: Andi Kleen
> Cc: discuss@x86-64.org; linux-kernel@vger.kernel.org
> Subject: Re: [discuss] [OOPS] powernow on smp dual core amd64


* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-10 19:48 Langsdorf, Mark
@ 2005-06-10 20:01 ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-06-10 20:01 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Tom Duffy, Andi Kleen, discuss, linux-kernel

On Fri, Jun 10, 2005 at 02:48:58PM -0500, Langsdorf, Mark wrote:
> It looks like the crash is caused by an invalid
> pointer dereference in 
> query_current_values_with_pending_wait(), which
> implies that powernowk8_get() was called with an
> invalid CPU number.
> 
> Andi, what will happen if you do
> set_cpus_allowed(current, cpumask_of_cpu(cpu)) when
> cpu isn't in the range of online CPUs?  There's

It just returns with -EINVAL.

Also, it really shouldn't happen.

> supposed to be a check to prevent an invalid
> pointer access from happening but it's failing for 
> some reason.
-Andi
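
Andi's answer can be illustrated from user space. The sketch below is an analogy, not kernel code. It uses the `sched_setaffinity(2)` system call in place of the in-kernel `set_cpus_allowed()`, and the helper `run_on_cpu()` is hypothetical. The helper saves the old mask, pins to one CPU, runs a callback, and restores the mask. It also shows the point behind Mark's question: when the target CPU is not usable, the call fails with `EINVAL`. A caller that ignores the return value simply keeps running on whatever CPU it was already on.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: pin the calling thread to 'cpu', run fn(arg) there,
 * then restore the original affinity mask. Returns 0 on success or -errno.
 * If 'cpu' is not online/permitted, sched_setaffinity() fails with EINVAL,
 * fn is never called, and the original mask is left untouched.
 */
static int run_on_cpu(int cpu, void (*fn)(void *), void *arg)
{
    cpu_set_t old, pinned;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -EINVAL;
    if (sched_getaffinity(0, sizeof old, &old) != 0)
        return -errno;

    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    if (sched_setaffinity(0, sizeof pinned, &pinned) != 0)
        return -errno;          /* e.g. -EINVAL: no usable CPU in mask */

    fn(arg);                    /* now running on 'cpu' */

    /* Put the caller's original affinity back. */
    if (sched_setaffinity(0, sizeof old, &old) != 0)
        return -errno;
    return 0;
}
```

On a machine with fewer than `CPU_SETSIZE` CPUs, `run_on_cpu(CPU_SETSIZE - 1, ...)` returns `-EINVAL` without running the callback, while pinning to the current CPU succeeds.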

* RE: [discuss] [OOPS] powernow on smp dual core amd64
       [not found] <84EA05E2CA77634C82730353CBE3A84301CFC14B@SAUSEXMB1.amd.com>
@ 2005-06-13 21:27 ` Tom Duffy
  0 siblings, 0 replies; 15+ messages in thread
From: Tom Duffy @ 2005-06-13 21:27 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Linux Kernel Mailing List, discuss

[-- Attachment #1: Type: text/plain, Size: 16580 bytes --]

OK, here is the whole dmesg on stock rc6 (with an added printout of the
CPU number in powernowk8_get()):

Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol00 console=ttyS0 console=tty1 3)
Linux version 2.6.12-rc6andro (root@androdemolin1.sfbay.sun.com) (gcc version 4.0.0 20050606 (Red Hat 4.0.0-11)) #840 SMP Mon Jun 13 11:32:51 PDT 2005
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e6000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007fffe000 (ACPI data)
 BIOS-e820: 000000007fffe000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
SRAT: PXM 0 -> APIC 0 -> Node 0
SRAT: PXM 0 -> APIC 1 -> Node 0
SRAT: PXM 1 -> APIC 2 -> Node 1
SRAT: PXM 1 -> APIC 3 -> Node 1
SRAT: PXM 2 -> APIC 4 -> Node 2
SRAT: PXM 2 -> APIC 5 -> Node 2
SRAT: PXM 3 -> APIC 6 -> Node 3
SRAT: PXM 3 -> APIC 7 -> Node 3
SRAT: Node 0 PXM 0 100000-3fffffff
SRAT: Node 1 PXM 1 40000000-7fffffff
SRAT: Node 0 PXM 0 0-3fffffff
Bootmem setup node 0 0000000000000000-000000003fffffff
Bootmem setup node 1 0000000040000000-000000007ffeffff
Nvidia board detected. Ignoring ACPI timer override.
ACPI: PM-Timer IO Port: 0x4008
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Processor #2 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Processor #3 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] enabled)
Processor #4 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] enabled)
Processor #5 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] enabled)
Processor #6 15:1 APIC version 16
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] enabled)
Processor #7 15:1 APIC version 16
ACPI: IOAPIC (id[0x08] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 8, version 17, address 0xfec00000, GSI 0-23
ACPI: IOAPIC (id[0x09] address[0xfeaff000] gsi_base[24])
IOAPIC[1]: apic_id 9, version 17, address 0xfeaff000, GSI 24-47
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: BIOS IRQ0 pin2 override ignored.
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Setting APIC routing to flat
Using ACPI (MADT) for SMP configuration information
Allocating PCI resources starting at 80000000 (gap: 80000000:7ec00000)
Checking aperture...
CPU 0: aperture @ ef54000000 size 32 MB
Aperture from northbridge cpu 0 too small (32 MB)
No AGP bridge found
Built 2 zonelists
Kernel command line: ro root=/dev/VolGroup00/LogVol00 console=ttyS0 console=tty1 3
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 3.579545 MHz PM timer.
time.c: Detected 2200.028 MHz processor.
Console: colour VGA+ 80x25
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Memory: 2054588k/2097088k available (2403k kernel code, 0k reserved, 885k data, 228k init)
Security Framework v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
selinux_register_security:  Registering secondary module capability
Capability LSM initialized as secondary
Mount-cache hash table entries: 256
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 0(2) -> Node 0 -> Core 0
Using local APIC timer interrupts.
Detected 12.500 MHz APIC timer.
Booting processor 1/1 rip 6000 rsp ffff81003ffa9f58
Initializing CPU#1
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 1(2) -> Node 0 -> Core 1
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 1: Syncing TSC to CPU 0.
Booting processor 2/2 rip 6000 rsp ffff81007ff31f58
Initializing CPU#2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 2(2) -> Node 1 -> Core 0
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 2: Syncing TSC to CPU 0.
CPU 2: synchronized TSC with CPU 0 (last diff -121 cycles, maxerr 889 cycles)
Booting processor 3/3 rip 6000 rsp ffff810041491f58
Initializing CPU#3
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 3(2) -> Node 1 -> Core 1
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 3: Syncing TSC to CPU 0.
CPU 3: synchronized TSC with CPU 0 (last diff -121 cycles, maxerr 887 cycles)
Booting processor 4/4 rip 6000 rsp ffff8100414f5f58
Initializing CPU#4
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 4(2) -> Node 1 -> Core 0
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 4: Syncing TSC to CPU 0.
CPU 4: synchronized TSC with CPU 0 (last diff -119 cycles, maxerr 886 cycles)
Booting processor 5/5 rip 6000 rsp ffff810001efff58
Initializing CPU#5
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 5(2) -> Node 0 -> Core 1
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 5: Syncing TSC to CPU 0.
CPU 5: synchronized TSC with CPU 0 (last diff -124 cycles, maxerr 889 cycles)
Booting processor 6/6 rip 6000 rsp ffff81007fe9ff58
Initializing CPU#6
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 6(2) -> Node 0 -> Core 0
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 6: Syncing TSC to CPU 0.
CPU 6: synchronized TSC with CPU 0 (last diff -161 cycles, maxerr 1549 cycles)
Booting processor 7/7 rip 6000 rsp ffff810041501f58
Initializing CPU#7
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU 7(2) -> Node 0 -> Core 1
Dual Core AMD Opteron(tm) Processor 875 stepping 00
CPU 7: Syncing TSC to CPU 0.
CPU 7: synchronized TSC with CPU 0 (last diff -162 cycles, maxerr 1549 cycles)
Brought up 8 CPUs
Disabling vsyscall due to use of PM timer
time.c: Using PM based timekeeping.
testing NMI watchdog ... OK.
CPU 1: synchronized TSC with CPU 0 (last diff 4 cycles, maxerr 869 cycles)
checking if image is initramfs... it is
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20050309
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (0000:00)
PCI: Probing PCI hardware (bus 00)
PCI: Transparent bridge - 0000:00:09.0
ACPI: PCI Root Bridge [PCIB] (0000:80)
PCI: Probing PCI hardware (bus 80)
ACPI: PCI Interrupt Link [LNKA] (IRQs 16 17 18 19) *10
ACPI: PCI Interrupt Link [LNKB] (IRQs 16 17 18 19) *5
ACPI: PCI Interrupt Link [LNKC] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNKD] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LNKE] (IRQs 16 17 18 19) *0, disabled.
ACPI: PCI Interrupt Link [LUS0] (IRQs 20 21 22) *9
ACPI: PCI Interrupt Link [LUS1] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LUS2] (IRQs 20 21 22) *11
ACPI: PCI Interrupt Link [LKLN] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LAUI] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LKMO] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LKSM] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LTID] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LTIE] (IRQs 20 21 22) *0, disabled.
ACPI: PCI Interrupt Link [LATA] (IRQs 20 21 22) *14
ACPI: PCI Interrupt Link [LN2A] (IRQs 40 41 42 43) *0, disabled.
ACPI: PCI Interrupt Link [LN2B] (IRQs 40 41 42 43) *0, disabled.
ACPI: PCI Interrupt Link [LN2C] (IRQs 40 41 42 43) *0, disabled.
ACPI: PCI Interrupt Link [LN2D] (IRQs 40 41 42 43) *0, disabled.
ACPI: PCI Interrupt Link [LK2N] (IRQs 44 45 46 47) *0, disabled.
ACPI: PCI Interrupt Link [LT3D] (IRQs 44 45 46 47) *0, disabled.
ACPI: PCI Interrupt Link [LT2E] (IRQs 44 45 46 47) *0, disabled.
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: PnP ACPI init
pnp: PnP ACPI: found 11 devices
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: If a device doesn't work, try "pci=routeirq".  If it helps, post a report
PCI-DMA: Disabling IOMMU.
pnp: 00:08: ioport range 0x680-0x6ff has been reserved
k8-bus.c: some busses have empty cpu mask
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
audit: initializing netlink socket (disabled)
audit(1118697207.709:0): initialized
Total HugeTLB memory allocated, 0
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
SELinux:  Registering netfilter hooks
Initializing Cryptographic API
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
pciehp: add_host_bridge: status 5
pciehp: add_host_bridge: status 5
pciehp: Fails to gain control of native hot-plug
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
pcie_portdrv_probe->Dev[005d:10de] has invalid IRQ. Check vendor BIOS
assign_interrupt_mode Found MSI capability
Real Time Clock Driver v1.12
Linux agpgart interface v0.101 (c) Dave Jones
[drm] Initialized drm 1.0.0 20040925
PNP: PS/2 controller doesn't have KBD irq; using default 0x1
PNP: PS/2 Controller [PNP0f12:PS2M] at 0x60,0x64 irq 1,12
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
RAMDISK driver initialized: 16 RAM disks of 16384K size 1024 blocksize
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE-CK804: IDE controller at PCI slot 0000:00:06.0
NFORCE-CK804: chipset revision 162
NFORCE-CK804: not 100% native mode: will probe irqs later
NFORCE-CK804: 0000:00:06.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0x6000-0x6007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0x6008-0x600f, BIOS settings: hdc:DMA, hdd:DMA
ide-floppy driver 0.99.newide
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.01:USB HID core driver
mice: PS/2 mouse device common for all mice
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 128Kbytes
TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 262144 bind 65536)
Initializing IPsec netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 17
powernow-k8: Found 8 AMD Athlon 64 / Opteron processors (version 1.40.2)
powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
cpu_init done, current fid 0xe, vid 0x8
powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
cpu_init done, current fid 0xe, vid 0x8
powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
cpu_init done, current fid 0xe, vid 0x8
powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
cpu_init done, current fid 0xe, vid 0x8
ACPI wakeup devices:
PS2M USB0 USB1  MAC P0P1 P0P2 P0P3 P0P4 P0P5 IO4B  MA4 BR5B BR5C BR5D BR5E PWRB
ACPI: (supports S0 S1 S5)
BIOS EDD facility v0.16 2004-Jun-25, 2 devices found
Freeing unused kernel memory: 228k freed
SCSI subsystem initialized
Fusion MPT base driver 3.02.18
Copyright (c) 1999-2005 LSI Logic Corporation
PCI: Enabling device 0000:04:01.0 (0116 -> 0117)
ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 19
ACPI: PCI Interrupt 0000:04:01.0[A] -> Link [LNKB] -> GSI 19 (level, low) -> IRQ 233
mptbase: Initiating ioc0 bringup
ioc0: SAS1064: Capabilities={Initiator}
Fusion MPT SCSI Host driver 3.02.18
scsi0 : ioc0: LSISAS1064, FwRev=00031400h, Ports=1, MaxQ=203, IRQ=233
  Vendor: SEAGATE   Model: ST973401LSUN72G   Rev: 0356
  Type:   Direct-Access                      ANSI SCSI revision: 03
SCSI device sda: 143374738 512-byte hdwr sectors (73408 MB)
SCSI device sda: drive cache: write through
SCSI device sda: 143374738 512-byte hdwr sectors (73408 MB)
SCSI device sda: drive cache: write through
 sda: sda1 sda2
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
  Vendor: ATA       Model: FUJITSU MHT2080B  Rev: 0043
  Type:   Direct-Access                      ANSI SCSI revision: 05
powernowk8_get() cpu is 0
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sdb: drive cache: write back
 sdb:<3>powernowk8_get() cpu is 1
Unable to handle kernel NULL pointer dereference at 0000000000000024 RIP:
<ffffffff8011d94c>{query_current_values_with_pending_wait+60} sdb1 sdb2

PGD 3fe74067 PUD 3fd28067 PMD 0
Oops: 0002 [1] SMP
CPU 1
Modules linked in: mptscsih mptbase sd_mod scsi_mod
Pid: 25, comm: events/7 Not tainted 2.6.12-rc6andro
RIP: 0010:[<ffffffff8011d94c>] <ffffffff8011d94c>{query_current_values_with_pending_wait+60}
RSP: 0000:ffff81007fddbdc8  EFLAGS: 00010202
RAX: 000000000000000e RBX: 0000000000000000 RCX: 00000000c0010042
RDX: 0000000000000008 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 0000000000000080 R08: ffff81007fdda000 R09: ffff81003fd421f0
R10: 000000000000001c R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000283 R15: ffffffff80112630
FS:  000000000057d850(0000) GS:ffffffff80498180(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000024 CR3: 000000003f415000 CR4: 00000000000006e0
Process events/7 (pid: 25, threadinfo ffff81007fdda000, task ffff81003fd421f0)
Stack: 0000000000000080 ffffffff8011dea1 0000000000000001 ffff81003fd91430
       ffff81003fd91400 ffffffff802e2b90 0000000000000000 0000000000000003
       ffff81007fddbe48 0000000000000001
Call Trace:<ffffffff8011dea1>{powernowk8_get+145} <ffffffff802e2b90>{cpufreq_get+96}
       <ffffffff8011266a>{handle_cpufreq_delayed_get+58} <ffffffff80148eec>{worker_thread+476}
       <ffffffff801326d0>{default_wake_function+0} <ffffffff80130733>{__wake_up_common+67}
       <ffffffff80148d10>{worker_thread+0} <ffffffff8014d7a9>{kthread+217}
       <ffffffff80133be0>{schedule_tail+64} <ffffffff8010f5b7>{child_rip+8}
       <ffffffff8011d4f0>{flat_send_IPI_mask+0} <ffffffff8014d6d0>{kthread+0}
       <ffffffff8010f5af>{child_rip+0}

Code: 89 47 24 89 57 20 31 c0 48 83 c4 08 c3 66 66 66 90 66 66 90
RIP <ffffffff8011d94c>{query_current_values_with_pending_wait+60} RSP <ffff81007fddbdc8>
CR2: 0000000000000024
 <5>Attached scsi disk sdb at scsi0, channel 0, id 1, lun 0
libata version 1.11 loaded.
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* RE: [discuss] [OOPS] powernow on smp dual core amd64
@ 2005-06-13 21:47 Langsdorf, Mark
  2005-06-13 22:20 ` Tom Duffy
  0 siblings, 1 reply; 15+ messages in thread
From: Langsdorf, Mark @ 2005-06-13 21:47 UTC (permalink / raw)
  To: Tom Duffy; +Cc: Linux Kernel Mailing List, discuss

[-- Attachment #1: Type: text/plain, Size: 3660 bytes --]

> powernow-k8: Found 8 AMD Athlon 64 / Opteron processors (version 1.40.2)
> powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
> powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
> powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
> cpu_init done, current fid 0xe, vid 0x8
> powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
> powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
> powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
> cpu_init done, current fid 0xe, vid 0x8
> powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
> powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
> powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
> cpu_init done, current fid 0xe, vid 0x8
> powernow-k8:    0 : fid 0xe (2200 MHz), vid 0x8 (1350 mV)
> powernow-k8:    1 : fid 0xc (2000 MHz), vid 0xa (1300 mV)
> powernow-k8:    2 : fid 0xa (1800 MHz), vid 0xc (1250 mV)
> cpu_init done, current fid 0xe, vid 0x8

>  sdb:<3>powernowk8_get() cpu is 1
> Unable to handle kernel NULL pointer dereference at 0000000000000024 RIP:
> <ffffffff8011d94c>{query_current_values_with_pending_wait+60} sdb1 sdb2

Okay, I think I have figured this out.  During initialization,
the cpufreq infrastructure only initializes the first core of
each processor.  When a request comes in for the second core,
its data structure is uninitialized and we get the NULL pointer
dereference.

The solution is to assign the pointer to the data structure for
the first core to all the other cores.

Tom, could you try this patch and see if it helps?

-Mark Langsdorf
AMD, Inc.

[-- Attachment #2: jhpn-2.6.12-rc6.patch --]
[-- Type: application/octet-stream, Size: 837 bytes --]

--- linux-2.6.12-rc6/arch/i386/kernel/cpu/cpufreq/powernow-k8.c.old	2005-06-12 17:41:55.123651184 -0500
+++ linux-2.6.12-rc6/arch/i386/kernel/cpu/cpufreq/powernow-k8.c	2005-06-12 17:46:32.780440936 -0500
@@ -44,7 +44,7 @@
 
 #define PFX "powernow-k8: "
 #define BFX PFX "BIOS error: "
-#define VERSION "version 1.40.2"
+#define VERSION "version 1.40.4"
 #include "powernow-k8.h"
 
 /* serialize freq changes  */
@@ -978,7 +978,7 @@
 {
 	struct powernow_k8_data *data;
 	cpumask_t oldmask = CPU_MASK_ALL;
-	int rc;
+	int rc, i;
 
 	if (!check_supported_cpu(pol->cpu))
 		return -ENODEV;
@@ -1064,7 +1064,9 @@
 	printk("cpu_init done, current fid 0x%x, vid 0x%x\n",
 	       data->currfid, data->currvid);
 
-	powernow_data[pol->cpu] = data;
+	for_each_cpu_mask(i, cpu_core_map[pol->cpu]) {
+		powernow_data[i] = data;
+	}
 
 	return 0;
 

* RE: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 21:47 [discuss] [OOPS] powernow on smp dual core amd64 Langsdorf, Mark
@ 2005-06-13 22:20 ` Tom Duffy
  2005-06-13 22:38   ` Andi Kleen
  2005-06-13 23:17   ` Zachary Amsden
  0 siblings, 2 replies; 15+ messages in thread
From: Tom Duffy @ 2005-06-13 22:20 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: discuss, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 2999 bytes --]

On Mon, 2005-06-13 at 16:47 -0500, Langsdorf, Mark wrote:
> Okay, I think I have figured this out.  During initialization,
> the cpufreq infrastructure only initializes the first core of
> each processor.  When a request comes into the second core,
> its data structure is uninitialized and we get the NULL pointer
> dereference.
> 
> The solution is to assign the pointer to the data structure for
> the first core to all the other cores.
> 
> Tom, could you try this patch and see if it helps?

Yes!  It fixed the panic.  I get much further.

Thanks!

Unfortunately, after starting cpuspeed daemon, I get this:

Starting cpuspeed: [  OK  ]
Starting pcmcia:  Starting PCMCIA services:
CPU 6: Machine Check Exception:                4 Bank 4: b200000000070f0f
TSC 4129a3d70d
Kernel panic - not syncing: Machine check
 <1>Unable to handle kernel NULL pointer dereference at 00000000000000ff RIP:
[<00000000000000ff>]
PGD 41460067 PUD 3f12c067 PMD 0
Oops: 0010 [1] SMP
CPU 6
Modules linked in: video container button battery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core shpchp usbnet mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata mptscsih mptbase sd_mod scsi_mod
Pid: 1672, comm: usb.agent Tainted: G   M  2.6.12-rc6andro
RIP: 0010:[<00000000000000ff>] [<00000000000000ff>]
RSP: 0000:ffff81003fe63fa0  EFLAGS: 00010006
RAX: ffff81007f5b1fd8 RBX: 0000000000000008 RCX: 0000ffff0000ffff
RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000ffff0000ffff
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000004129a3cf88 R14: ffffffff80370a7d R15: 0000000000000001
FS:  00002aaaaaae0ee0(0000) GS:ffffffff80498400(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000ff CR3: 000000003e7e5000 CR4: 00000000000006e0
Process usb.agent (pid: 1672, threadinfo ffff81007f5b0000, task ffff81007fddcff0)
Stack: ffffffff8011ac99 ffffffff80370a7d ffffffff8010f247 ffff81007fea2c88  <EOI>
       0000000000000005 0000000000000030 00000000000000fa 0000000000000000
       ffffffff8011a860 0000000000000000
Call Trace: <IRQ> <ffffffff8011ac99>{smp_call_function_interrupt+73}
       <ffffffff8010f247>{call_function_interrupt+99}  <EOI>  <#MC> <ffffffff8011a860>{smp_really_stop_cpu+0}
       <ffffffff8011aa68>{smp_send_stop+72} <ffffffff80136ebc>{panic+140}
       <ffffffff80115ce8>{print_mce+136} <ffffffff80115db9>{mce_panic+137}
       <ffffffff8011643f>{do_machine_check+767} <ffffffff8010facb>{machine_check+127}
       <ffffffff8010f24c>{apic_timer_interrupt+0}  <EOE> <ffffffff801224dc>{do_page_fault+1196}
       <ffffffff801224d9>{do_page_fault+1193} <ffffffff8018c14e>{sys_newstat+46}       <ffffffff8010f401>{error_exit+0}

Code:  Bad RIP value.
RIP [<00000000000000ff>] RSP <ffff81003fe63fa0>
CR2: 00000000000000ff
 <0>Kernel panic - not syncing: Oops

The machine resets itself after this.

-tduffy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:20 ` Tom Duffy
@ 2005-06-13 22:38   ` Andi Kleen
  2005-06-13 23:17   ` Zachary Amsden
  1 sibling, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-06-13 22:38 UTC (permalink / raw)
  To: Tom Duffy; +Cc: Langsdorf, Mark, discuss, Linux Kernel Mailing List

On Mon, Jun 13, 2005 at 03:20:45PM -0700, Tom Duffy wrote:
> On Mon, 2005-06-13 at 16:47 -0500, Langsdorf, Mark wrote:
> > Okay, I think I have figured this out.  During initialization,
> > the cpufreq infrastructure only initializes the first core of
> > each processor.  When a request comes into the second core,
> > its data structure is uninitialized and we get the NULL pointer
> > dereference.
> > 
> > The solution is to assign the pointer to the data structure for
> > the first core to all the other cores.
> > 
> > Tom, could you try this patch and see if it helps?
> 
> Yes!  It fixed the panic.  I get much further.
> 
> Thanks!
> 
> Unfortunately, after starting cpuspeed daemon, I get this:
> 
> Starting cpuspeed: [  OK  ]
> Starting pcmcia:  Starting PCMCIA services:
> CPU 6: Machine Check Exception:                4 Bank 4: b200000000070f0f
> TSC 4129a3d70d

It decodes as:

CPU 6 4 northbridge
  Northbridge Watchdog error
       bit57 = processor context corrupt
       bit61 = error uncorrected
  bus error 'generic participation, request timed out
      generic error mem transaction
      generic access, level generic'
STATUS b200000000070f0f MCGSTATUS 4

Something tried to access a physical memory address that was not
mapped in the CPU.




> Kernel panic - not syncing: Machine check
>  <1>Unable to handle kernel NULL pointer dereference at 00000000000000ff RIP:
> [<00000000000000ff>]
> PGD 41460067 PUD 3f12c067 PMD 0
> Oops: 0010 [1] SMP
> CPU 6
> Modules linked in: video container button battery ac ohci_hcd ehci_hcd i2c_nforce2 i2c_core shpchp usbnet mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod sata_nv libata mptscsih mptbase sd_mod scsi_mod
> Pid: 1672, comm: usb.agent Tainted: G   M  2.6.12-rc6andro
> RIP: 0010:[<00000000000000ff>] [<00000000000000ff>]
> RSP: 0000:ffff81003fe63fa0  EFLAGS: 00010006
> RAX: ffff81007f5b1fd8 RBX: 0000000000000008 RCX: 0000ffff0000ffff
> RDX: 00000000000000ff RSI: 0000000000000008 RDI: 0000ffff0000ffff
> RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000004129a3cf88 R14: ffffffff80370a7d R15: 0000000000000001
> FS:  00002aaaaaae0ee0(0000) GS:ffffffff80498400(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000000ff CR3: 000000003e7e5000 CR4: 00000000000006e0
> Process usb.agent (pid: 1672, threadinfo ffff81007f5b0000, task ffff81007fddcff0)
> Stack: ffffffff8011ac99 ffffffff80370a7d ffffffff8010f247 ffff81007fea2c88  <EOI>
>        0000000000000005 0000000000000030 00000000000000fa 0000000000000000
>        ffffffff8011a860 0000000000000000
> Call Trace: <IRQ> <ffffffff8011ac99>{smp_call_function_interrupt+73}
>        <ffffffff8010f247>{call_function_interrupt+99}  <EOI>  <#MC> <ffffffff8011a860>{smp_really_stop_cpu+0}
>        <ffffffff8011aa68>{smp_send_stop+72} <ffffffff80136ebc>{panic+140}

Looks like the machine was too confused after the MCE to even
run panic correctly. I will investigate.

But of course fixing that will not fix the cause of the MCE.

-Andi

* RE: [discuss] [OOPS] powernow on smp dual core amd64
@ 2005-06-13 22:44 Langsdorf, Mark
  2005-06-13 22:47 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Langsdorf, Mark @ 2005-06-13 22:44 UTC (permalink / raw)
  To: Tom Duffy; +Cc: discuss, Linux Kernel Mailing List

> On Mon, 2005-06-13 at 16:47 -0500, Langsdorf, Mark wrote:
> > Okay, I think I have figured this out.  During initialization, the 
> > cpufreq infrastructure only initializes the first core of each
> > processor.  When a request comes into the second core, its data
> > structure is uninitialized and we get the NULL pointer dereference.
> > 
> > The solution is to assign the pointer to the data structure for the 
> > first core to all the other cores.
> > 
> > Tom, could you try this patch and see if it helps?
> 
> Yes!  It fixed the panic.  I get much further.

Great, I'll test that some more then submit it.

> Unfortunately, after starting cpuspeed daemon, I get this:

It looks like it's happening sometime after cpuspeed starts.
Could you disable cpuspeed and see if the problem still
occurs? 

> Starting cpuspeed: [  OK  ]
> Starting pcmcia:  Starting PCMCIA services:
> CPU 6: Machine Check Exception:                4 Bank 4: 
> b200000000070f0f
> TSC 4129a3d70d

> Code:  Bad RIP value.
> RIP [<00000000000000ff>] RSP <ffff81003fe63fa0>
> CR2: 00000000000000ff
>  <0>Kernel panic - not syncing: Oops

Andi said that "Something tried to access a physical memory 
address that was not mapped in the CPU."  Andi, is this
related to the bug that you thought might have been fixed
in 2.6.12-rc6-git4?

-Mark Langsdorf
AMD, Inc.


* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:44 Langsdorf, Mark
@ 2005-06-13 22:47 ` Andi Kleen
  2005-06-13 22:58 ` Tom Duffy
  2005-06-14 18:19 ` Tom Duffy
  2 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-06-13 22:47 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Tom Duffy, discuss, Linux Kernel Mailing List

> > Unfortunately, after starting cpuspeed daemon, I get this:
> 
> It looks like it's happening sometime after cpuspeed starts.
> Could you disable cpuspeed and see if the problem still
> occurs? 

If anything, it must be something in the kernel (unless someone
is doing very strange things with /dev/mem).

> 
> > Starting cpuspeed: [  OK  ]
> > Starting pcmcia:  Starting PCMCIA services:
> > CPU 6: Machine Check Exception:                4 Bank 4: 
> > b200000000070f0f
> > TSC 4129a3d70d
> 
> > Code:  Bad RIP value.
> > RIP [<00000000000000ff>] RSP <ffff81003fe63fa0>
> > CR2: 00000000000000ff
> >  <0>Kernel panic - not syncing: Oops
> 
> Andi said that "Something tried to access a physical memory 
> address that was not mapped in the CPU."  Andi, is this
> related to the bug that you thought might have been fixed
> in 2.6.12-rc6-git4?

No, I don't think so. In that bug something would be mapped
twice rather than not at all. Also, nobody reported watchdog
timeouts for that one.

-Andi

* RE: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:44 Langsdorf, Mark
  2005-06-13 22:47 ` Andi Kleen
@ 2005-06-13 22:58 ` Tom Duffy
  2005-06-13 23:35   ` Andi Kleen
  2005-06-14 18:19 ` Tom Duffy
  2 siblings, 1 reply; 15+ messages in thread
From: Tom Duffy @ 2005-06-13 22:58 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Linux Kernel Mailing List, discuss

[-- Attachment #1: Type: text/plain, Size: 270 bytes --]

On Mon, 2005-06-13 at 17:44 -0500, Langsdorf, Mark wrote:
> It looks like it's happening sometime after cpuspeed starts.
> Could you disable cpuspeed and see if the problem still
> occurs? 

Yup, disabling cpuspeed will allow me to get up to multiuser.

-tduffy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:20 ` Tom Duffy
  2005-06-13 22:38   ` Andi Kleen
@ 2005-06-13 23:17   ` Zachary Amsden
  2005-06-13 23:34     ` Andi Kleen
  1 sibling, 1 reply; 15+ messages in thread
From: Zachary Amsden @ 2005-06-13 23:17 UTC (permalink / raw)
  To: Tom Duffy; +Cc: Langsdorf, Mark, discuss, Linux Kernel Mailing List

Tom Duffy wrote:

>On Mon, 2005-06-13 at 16:47 -0500, Langsdorf, Mark wrote:
>  
>
>>Okay, I think I have figured this out.  During initialization,
>>the cpufreq infrastructure only initializes the first core of
>>each processor.  When a request comes into the second core,
>>its data structure is uninitialized and we get the NULL pointer
>>dereference.
>>
>>The solution is to assign the pointer to the data structure for
>>the first core to all the other cores.
>>
>>Tom, could you try this patch and see if it helps?
>>    
>>
>
>Yes!  It fixed the panic.  I get much further.
>
>Thanks!
>
>Unfortunately, after starting cpuspeed daemon, I get this:
>
>Starting cpuspeed: [  OK  ]
>Starting pcmcia:  Starting PCMCIA services:
>CPU 6: Machine Check Exception:                4 Bank 4: b200000000070f0f
>TSC 4129a3d70d
>Kernel panic - not syncing: Machine check
> <1>Unable to handle kernel NULL pointer dereference at 00000000000000ff RIP:
>[<00000000000000ff>]
>  
>

asmlinkage void smp_call_function_interrupt(void)
{
        void (*func) (void *info) = call_data->func;
        void *info = call_data->info;
        int wait = call_data->wait;

        ack_APIC_irq();
        /*
         * Notify initiating CPU that I've grabbed the data and am
         * about to execute the function
         */
        mb();
        atomic_inc(&call_data->started);
        /*
         * At this point the info structure may be out of scope unless wait==1
         */
        irq_enter();
        (*func)(info);  <--- passed bogus data

Looks like you jumped through a bogus function pointer.  I'm guessing it 
has something to do with an uninitialized IRQ vector for the CPU speed on 
one of the cores (simply because it seems somewhat plausible):

extern u8 irq_vector[NR_IRQ_VECTORS];
#define IO_APIC_VECTOR(irq)     (irq_vector[irq])
#define AUTO_ASSIGN             -1

So irq_vector[AUTO_ASSIGN] = 0xff, which could somehow have made it into 
your function pointer.

Just a theory.

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 23:17   ` Zachary Amsden
@ 2005-06-13 23:34     ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-06-13 23:34 UTC (permalink / raw)
  To: Zachary Amsden
  Cc: Tom Duffy, Langsdorf, Mark, discuss, Linux Kernel Mailing List

> asmlinkage void smp_call_function_interrupt(void)
> {
>        void (*func) (void *info) = call_data->func;
>        void *info = call_data->info;
>        int wait = call_data->wait;
> 
>        ack_APIC_irq();
>        /*
>         * Notify initiating CPU that I've grabbed the data and am
>         * about to execute the function
>         */
>        mb();
>        atomic_inc(&call_data->started);
>        /*
>         * At this point the info structure may be out of scope unless wait==1
>         */
>        irq_enter();
>        (*func)(info);  <--- passed bogus data
> 
> Looks like you jumped through a bogus function pointer.  I'm guessing it 
> has something to do with an uninitialized IRQ vector for the CPU speed on 
> one of the cores (simply because it seems somewhat plausible):

What would an "IRQ vector for the CPU speed" even be?

> 
> extern u8 irq_vector[NR_IRQ_VECTORS];
> #define IO_APIC_VECTOR(irq)     (irq_vector[irq])
> #define AUTO_ASSIGN             -1
> 
> So irq_vector[AUTO_ASSIGN] = 0xff which could have somehow made it into 
> your function pointer.


Yes, but it is hard to see how that could happen short of massive
memory corruption. call_data is even a global variable.

However, after an MCE things can be a bit unstable. Maybe it would
be best to use a streamlined panic in this case that doesn't touch
the other CPUs.

-Andi

* Re: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:58 ` Tom Duffy
@ 2005-06-13 23:35   ` Andi Kleen
  0 siblings, 0 replies; 15+ messages in thread
From: Andi Kleen @ 2005-06-13 23:35 UTC (permalink / raw)
  To: Tom Duffy; +Cc: Langsdorf, Mark, Linux Kernel Mailing List, discuss

On Mon, Jun 13, 2005 at 03:58:46PM -0700, Tom Duffy wrote:
> On Mon, 2005-06-13 at 17:44 -0500, Langsdorf, Mark wrote:
> > It looks like it's happening sometime after cpuspeed starts.
> > Could you disable cpuspeed and see if the problem still
> > occurs? 
> 
> Yup, disabling cpuspeed will allow me to get up to multiuser.

That just means the powernow driver is not actually used.

-Andi

* RE: [discuss] [OOPS] powernow on smp dual core amd64
  2005-06-13 22:44 Langsdorf, Mark
  2005-06-13 22:47 ` Andi Kleen
  2005-06-13 22:58 ` Tom Duffy
@ 2005-06-14 18:19 ` Tom Duffy
  2 siblings, 0 replies; 15+ messages in thread
From: Tom Duffy @ 2005-06-14 18:19 UTC (permalink / raw)
  To: Langsdorf, Mark; +Cc: Linux Kernel Mailing List, discuss

[-- Attachment #1: Type: text/plain, Size: 355 bytes --]

On Mon, 2005-06-13 at 17:44 -0500, Langsdorf, Mark wrote:
> > > Tom, could you try this patch and see if it helps?
> > 
> > Yes!  It fixed the panic.  I get much further.
> 
> Great, I'll test that some more then submit it.

I would like it if this patch could make it into 2.6.12 before it is
released.

Any possibility?

Thanks,

-tduffy

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

Thread overview: 15+ messages
2005-06-13 21:47 [discuss] [OOPS] powernow on smp dual core amd64 Langsdorf, Mark
2005-06-13 22:20 ` Tom Duffy
2005-06-13 22:38   ` Andi Kleen
2005-06-13 23:17   ` Zachary Amsden
2005-06-13 23:34     ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2005-06-13 22:44 Langsdorf, Mark
2005-06-13 22:47 ` Andi Kleen
2005-06-13 22:58 ` Tom Duffy
2005-06-13 23:35   ` Andi Kleen
2005-06-14 18:19 ` Tom Duffy
     [not found] <84EA05E2CA77634C82730353CBE3A84301CFC14B@SAUSEXMB1.amd.com>
2005-06-13 21:27 ` Tom Duffy
2005-06-10 19:48 Langsdorf, Mark
2005-06-10 20:01 ` Andi Kleen
2005-06-09 23:46 Tom Duffy
2005-06-10 16:53 ` [discuss] " Andi Kleen
2005-06-10 18:46   ` Tom Duffy
