* Re: APIC error on 32-bit kernel
From: Jay Cliburn @ 2007-04-08 18:38 UTC (permalink / raw)
To: netdev; +Cc: linux-kernel
[Adding linux-kernel to the cc list, hoping for wider exposure.]
On Fri, 23 Mar 2007 20:08:17 -0500
Jay Cliburn <jacliburn@bellsouth.net> wrote:
> We're trying to track down the source of a problem that occurs
> whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4
and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.
> We can load the driver just fine, but whenever we activate the
> network, we see APIC errors (a sample of them are shown here,
> captured from a serial console):
>
> [root@hawk ~]# echo 8 > /proc/sys/kernel/printk
> [root@hawk ~]# [ 93.942012] process `sysctl' is using deprecated
> sysctl (sysc.
> [ 94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> [ 94.498887] APIC error on CPU0: 00(08)
> [ 94.498534] APIC error on CPU1: 00(08)
> [ 94.550079] APIC error on CPU0: 08(08)
> [ 94.549725] APIC error on CPU1: 08(08)
> [ 94.600915] APIC error on CPU1: 08(08)
> [ 94.601276] APIC error on CPU0: 08(08)
> [ 94.652108] APIC error on CPU1: 08(08)
> [ 94.652470] APIC error on CPU0: 08(08)
> [ 94.703659] APIC error on CPU0: 08(08)
> [ 94.703305] APIC error on CPU1: 08(08)
> [ 94.754852] APIC error on CPU0: 08(40)
> [ 94.806045] APIC error on CPU0: 40(08)
> [ 94.805692] APIC error on CPU1: 08(08)
> [ 94.857238] APIC error on CPU0: 08(08)
> [ 94.856884] APIC error on CPU1: 08(08)
> [ 94.908432] APIC error on CPU0: 08(08)
> [ 94.908078] APIC error on CPU1: 08(08)
> [snip, more of the same]
> [ 98.901156] APIC error on CPU1: 08(08)
> [ 98.952702] APIC error on CPU0: 08(08)
> [ 98.952349] APIC error on CPU1: 08(08)
> [ 99.003895] APIC error on CPU0: 08(08)
> [ 99.003542] APIC error on CPU1: 08(08)
>
> The machine hangs for about 5-10 seconds, then spontaneously reboots
> without further console output.
I can prompt an oops by pinging my router while the apic errors are
scrolling by.
>
> This is an Asus M2V (Via K8T890) motherboard.
>
> The problem does not occur on a 32-bit kernel if we boot with
> pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> motherboard.
>
> We also do not see this problem on Intel-based motherboards, with
> either 32- or 64-bit kernels.
A full raft of documentation -- including acpidump and
linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
(with apic=debug boot option), dmesg, and /proc/interrupts -- is
available at http://www.hogchain.net/m2v/apic-problem/
If this is a motherboard problem, that's fine; I'd just like to know
the details so I can tell users something more than "it's a motherboard
problem."
Thanks,
Jay
* Re: APIC error on 32-bit kernel
From: Chuck Ebbert @ 2007-04-09 17:59 UTC (permalink / raw)
To: Jay Cliburn; +Cc: netdev, linux-kernel
Jay Cliburn wrote:
> I can prompt an oops by pinging my router while the apic errors are
> scrolling by.
Where is the text of the oops?
* Re: APIC error on 32-bit kernel
From: Jay Cliburn @ 2007-04-09 19:04 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: netdev, linux-kernel
Chuck Ebbert wrote:
> Where is the text of the oops?
In one of the files on the website I referenced. Here's the text...
[ 173.584000] APIC error on CPU1: 08(08)
[ 173.665000] APIC error on CPU0: 08(08)
[ 173.665000] APIC error on CPU1: 08(08)
[ 173.746000] APIC error on CPU0: 08(08)
[ 173.746000] APIC error on CPU1: 08(08)
[ 173.827000] APIC error on CPU0: 08(08)
[ 173.827000] APIC error on CPU1: 08(08)
[ 173.908000] APIC error on CPU0: 08(08)
[ 173.908000] APIC error on CPU1: 08(08)
[ 173.989000] APIC error on CPU0: 08(08)
[ 173.989000] APIC error on CPU1: 08(08)
[I pinged my router somewhere around here.]
[ 174.069000] BUG: unable to handle kernel NULL pointer dereference<1>BUG: unable to 0
[ 174.069000] printing eip:
[ 174.069000] 00000000
[ 174.069000] *pde = 1feb8067
[ 174.069000] Oops: 0000 [#1]
[ 174.069000] SMP
[ 174.069000] Modules linked in: nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4d
[ 174.069000] CPU: 1
[ 174.069000] EIP: 0060:[<00000000>] Not tainted VLI
[ 174.069000] EFLAGS: 00010006 (2.6.21-rc5-git1 #1)
[ 174.069000] EIP is at 0x0
[ 174.069000] eax: 000000a0 ebx: dfe99f98 ecx: c07bb000 edx: c074de00
[ 174.069000] esi: 000000a0 edi: 00000000 ebp: 00000000 esp: c07bbffc
[ 174.069000] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
[ 174.069000] Process beagled-helper (pid: 3393, ti=c07bb000 task=dfe28270 task.ti=df)
[ 174.069000] Stack: c040704b
[ 174.069000] Call Trace:
[ 174.069000] [<c040704b>] do_IRQ+0xac/0xd1
[ 174.069000] [<c040580e>] common_interrupt+0x2e/0x34
[ 174.069000] =======================
[ 174.069000] Code: Bad EIP value.
[ 174.069000] EIP: [<00000000>] 0x0 SS:ESP 0068:c07bbffc
[ 174.069000] Kernel panic - not syncing: Fatal exception in interrupt
[ 174.069000] BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
[ 174.069000] [<c0417b4f>] smp_call_function+0x5c/0xc8
[ 174.069000] [<c054052e>] do_unblank_screen+0x2a/0x120
[ 174.069000] [<c0417bd6>] smp_send_stop+0x1b/0x2e
[ 174.069000] [<c04271ca>] panic+0x54/0xf2
[ 174.069000] [<c04062c5>] die+0x1f8/0x22c
[ 174.069000] [<c0623d13>] do_page_fault+0x40c/0x4df
[ 174.069000] [<c0623907>] do_page_fault+0x0/0x4df
[ 174.069000] [<c0622574>] error_code+0x7c/0x84
[ 174.069000] [<c040704b>] do_IRQ+0xac/0xd1
[ 174.069000] [<c040580e>] common_interrupt+0x2e/0x34
[ 174.069000] =======================
[ 174.069000] at virtual address 00000000
[ 174.069000] printing eip:
[ 174.069000] 00000000
[ 174.069000] *pde = 20bd3067
[ 174.069000] Oops: 0000 [#2]
[ 174.069000] SMP
[ 174.069000] Modules linked in: nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4d
[ 174.069000] CPU: 0
[ 174.069000] EIP: 0060:[<00000000>] Not tainted VLI
[ 174.069000] EFLAGS: 00010087 (2.6.21-rc5-git1 #1)
[ 174.069000] EIP is at 0x0
[ 174.069000] eax: 000000a0 ebx: c0753f74 ecx: c07ba000 edx: c074de00
[ 174.069000] esi: 000000a0 edi: 00000000 ebp: 00000000 esp: c07baffc
[ 174.069000] ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0068
[ 174.069000] Process swapper (pid: 0, ti=c07ba000 task=c07094c0 task.ti=c0753000)
[ 174.069000] Stack: c040704b
[ 174.069000] Call Trace:
[ 174.069000] [<c040704b>] do_IRQ+0xac/0xd1
[ 174.069000] [<c040580e>] common_interrupt+0x2e/0x34
[ 174.069000] [<c0403c74>] default_idle+0x3d/0x54
[ 174.069000] [<c040339b>] cpu_idle+0xa3/0xbc
[ 174.069000] [<c0758a37>] start_kernel+0x45d/0x465
[ 174.069000] [<c07581ae>] unknown_bootoption+0x0/0x202
[ 174.069000] =======================
[ 174.069000] Code: Bad EIP value.
[ 174.069000] EIP: [<00000000>] 0x0 SS:ESP 0068:c07baffc
[ 174.069000] Kernel panic - not syncing: Fatal exception in interrupt
Short hang, then spontaneous reboot.
* Re: APIC error on 32-bit kernel
From: Len Brown @ 2007-05-12 3:28 UTC (permalink / raw)
To: Jay Cliburn; +Cc: netdev, linux-kernel
> > We're trying to track down the source of a problem that occurs
> > whenever the atl1 network driver is activated on a 32-bit 2.6.21-rc4
>
> and -rc5, -rc6, 2.6.20.x, 2.6.19.3, and probably others.
>
> > We can load the driver just fine, but whenever we activate the
> > network, we see APIC errors (a sample of them are shown here,
> > captured from a serial console):
> >
> > [root@hawk ~]# echo 8 > /proc/sys/kernel/printk
> > [root@hawk ~]# [ 93.942012] process `sysctl' is using deprecated
> > sysctl (sysc.
> > [ 94.396609] atl1: eth0 link is up 1000 Mbps full duplex
> > [ 94.498887] APIC error on CPU0: 00(08)
> > [ 94.498534] APIC error on CPU1: 00(08)
> > [ 94.550079] APIC error on CPU0: 08(08)
> > [ 94.549725] APIC error on CPU1: 08(08)
> > [ 94.600915] APIC error on CPU1: 08(08)
> > [ 94.601276] APIC error on CPU0: 08(08)
> > [ 94.652108] APIC error on CPU1: 08(08)
> > [ 94.652470] APIC error on CPU0: 08(08)
> > [ 94.703659] APIC error on CPU0: 08(08)
> > [ 94.703305] APIC error on CPU1: 08(08)
> > [ 94.754852] APIC error on CPU0: 08(40)
> > [ 94.806045] APIC error on CPU0: 40(08)
/* Here is what the APIC error bits mean:
0: Send CS error
1: Receive CS error
2: Send accept error
3: Receive accept error
4: Reserved
5: Send illegal vector
6: Received illegal vector
7: Illegal register address
*/
So the 40 (bit 6) means the APIC received an illegal vector;
the much more frequent 08 (bit 3) is a receive accept error.
Certainly this is consistent with the fact that
the errors start when a specific device is being
used. I assume that device is using MSI?
Curious that it is different in 32-bit and 64-bit mode.
> > This is an Asus M2V (Via K8T890) motherboard.
> >
> > The problem does not occur on a 32-bit kernel if we boot with
> > pci=nomsi, and it doesn't occur at all on a 64-bit kernel on the same
> > motherboard.
pci=nomsi works, okay...
> > We also do not see this problem on Intel-based motherboards, with
> > either 32- or 64-bit kernels.
>
> A full raft of documentation -- including acpidump and
> linux-firmware-kit output, console capture, kernel config, lspci -vvxxx
> (with apic=debug boot option), dmesg, and /proc/interrupts -- is
> available at http://www.hogchain.net/m2v/apic-problem/
[06Dh 109 2] Boot Architecture Flags : 0003
For what it is worth, the bit in ACPI that is used to
disable MSI support (bit 3 of the Boot Architecture Flags,
"MSI Not Supported") is not set, so as far as the BIOS
is concerned, this system should support MSI.
Is it an add-in card, or lan-on-motherboard?
-Len
* Re: APIC error on 32-bit kernel
From: Jay Cliburn @ 2007-05-12 14:24 UTC (permalink / raw)
To: Len Brown; +Cc: netdev, linux-kernel
Thank you very much for looking at this, Len.
On Fri, 11 May 2007 23:28:58 -0400
Len Brown <lenb@kernel.org> wrote:
> > > [ 94.754852] APIC error on CPU0: 08(40)
> > > [ 94.806045] APIC error on CPU0: 40(08)
>
> /* Here is what the APIC error bits mean:
> 0: Send CS error
> 1: Receive CS error
> 2: Send accept error
> 3: Receive accept error
> 4: Reserved
> 5: Send illegal vector
> 6: Received illegal vector
> 7: Illegal register address
> */
>
> So the 40 means the APIC got an illegal vector.
> Certainly this is consistent with the fact that
> the errors start when a specific device is being
> used. I assume that device is using MSI?
Yes, the device is using MSI.
> Curious that it is different in 32-bit and 64-bit mode.
Agreed, although I had one user back in March report APIC errors on the
Asus M2V board while running Debian x86_64. I personally have never
encountered the problem under a 64-bit kernel, but I admit that just
might be random luck.
> > > We also do not see this problem on Intel-based motherboards, with
> > > either 32- or 64-bit kernels.
> >
> > A full raft of documentation -- including acpidump and
> > linux-firmware-kit output, console capture, kernel config, lspci
> > -vvxxx (with apic=debug boot option), dmesg, and /proc/interrupts
> > -- is available at http://www.hogchain.net/m2v/apic-problem/
>
>
> [06Dh 109 2] Boot Architecture Flags : 0003
>
> for what it is worth, the bit in ACPI that is used to
> disable MSI support is not set -- so as far as the BIOS
> is concerned, this system should support MSI.
>
> Is it an add-in card, or lan-on-motherboard?
This is a PCIe LAN-on-motherboard.
My goal is to understand whether this is a problem in the atl1 driver,
or a problem on the motherboard. If it's the former, obviously I want
to fix it. If it's the latter, then I want to disable MSI in the driver
when we discover we're running on this motherboard.
Thanks again for taking time to look at this. Any advice or hints you
provide will be greatly appreciated.
Jay