* skge: PCI error cmd=0x117 status=0x22b0
@ 2008-11-14 15:36 Jan Kasprzak
2008-11-14 16:42 ` Stephen Hemminger
From: Jan Kasprzak @ 2008-11-14 15:36 UTC
To: shemminger; +Cc: netdev
Hello,
I have an ASUS M2R32-MVP board with the skge network card. From time to time
the network freezes with the following message (this one is from
2.6.27-rc7; a second one, with timestamps, is appended at the end
of this mail):
skge 0000:03:04.0: PCI error cmd=0x117 status=0x22b0
skge 0000:03:04.0: unable to clear error (so ignoring them)
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x11b/0x1ab()
NETDEV WATCHDOG: eth0 (skge): transmit timed out
Modules linked in: ati_remote bnep rfcomm l2cap ext4dev jbd2 crc16 floppy btusb bluetooth
Pid: 6, comm: ksoftirqd/1 Not tainted 2.6.27-rc7 #1
Call Trace:
<IRQ> [<ffffffff80245558>] warn_slowpath+0xb4/0xdc
[<ffffffff8023bab2>] update_curr+0x3b/0x57
[<ffffffff8023bab2>] update_curr+0x3b/0x57
[<ffffffff8023f262>] enqueue_entity+0x1b/0xbc
[<ffffffff80344ef9>] __next_cpu+0x19/0x26
[<ffffffff8023f4ce>] tg_shares_up+0x172/0x18d
[<ffffffff8023bb38>] place_entity+0x6a/0xa3
[<ffffffff8023bf6d>] activate_task+0x29/0x3b
[<ffffffff80240d81>] try_to_wake_up+0x16c/0x17e
[<ffffffff8024f570>] signal_wake_up+0x24/0x33
[<ffffffff80250555>] send_sigqueue+0x10e/0x11d
[<ffffffff8049ba5f>] dev_watchdog+0x11b/0x1ab
[<ffffffff80256604>] posix_timer_fn+0xa0/0xab
[<ffffffff80256564>] posix_timer_fn+0x0/0xab
[<ffffffff8049b944>] dev_watchdog+0x0/0x1ab
[<ffffffff8024d5a7>] run_timer_softirq+0x151/0x1bf
[<ffffffff80249ced>] __do_softirq+0x63/0xcc
[<ffffffff80220f6c>] call_softirq+0x1c/0x28
<EOI> [<ffffffff802228fb>] do_softirq+0x2c/0x68
[<ffffffff80249900>] ksoftirqd+0x56/0xcc
[<ffffffff802498aa>] ksoftirqd+0x0/0xcc
[<ffffffff8025689e>] kthread+0x47/0x73
[<ffffffff80243247>] schedule_tail+0x27/0x5f
[<ffffffff80220c09>] child_rip+0xa/0x11
[<ffffffff80256857>] kthread+0x0/0x73
[<ffffffff80220bff>] child_rip+0x0/0x11
---[ end trace 032a20d1e1b1dec2 ]---
This usually happens when the _Tx_ network traffic is high.
It occurs once every week or two, so I have no easy way of
reproducing it. I am running 2.6.28-rc4 now, so I may soon have
a report from the newer kernel as well. I have also seen it on older
kernels.
When this happens, the network on this box is dead (no packets are
received or transmitted), but the box is still alive (incl. my X session).
I am not able to shut it down correctly, and even after I push the
reset button, it does not get past the BIOS setup. Holding the power
button for 4+ seconds makes the box switch off (except the standby power,
of course), but it still cannot boot. Only cutting off the power supply
for several seconds allows me to boot it again. So it _may_ be a hardware
problem (or the NIC needs the standby power to be cut off in order
to reset itself correctly).
The box is AMD Athlon64 X2 6400+, AMD CrossFire Xpress 3200 + SB600
chipset, 8 GB RAM, Fedora 9. The lspci output for the network card
follows:
03:04.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit Ethernet Controller (rev 13)
Subsystem: ASUSTeK Computer Inc. Marvell 88E8001 Gigabit Ethernet Controller (Asus)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64 (5750ns min, 7750ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 23
Region 0: Memory at fbffc000 (32-bit, non-prefetchable) [size=16K]
Region 1: I/O ports at e800 [size=256]
Expansion ROM at f0000000 [disabled] [size=128K]
Capabilities: [48] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [50] Vital Product Data <?>
Kernel driver in use: skge
Kernel modules: skge
Has anybody else seen this problem, or should I poke my HW vendor?
Thanks,
-Yenya
[ the second dmesg output ]
Nov 14 15:04:03 calypso kernel: skge 0000:03:04.0: PCI error cmd=0x117 status=0x22b0
Nov 14 15:04:03 calypso kernel: skge 0000:03:04.0: unable to clear error (so ignoring them)
Nov 14 15:05:09 calypso kernel: ------------[ cut here ]------------
Nov 14 15:05:09 calypso kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x11b/0x1ab()
Nov 14 15:05:09 calypso kernel: NETDEV WATCHDOG: eth0 (skge): transmit timed out
Nov 14 15:05:09 calypso kernel: Modules linked in: udf crc_itu_t ov511 bnep rfcomm l2cap ext4dev jbd2 crc16 btusb bluetooth floppy
Nov 14 15:05:09 calypso kernel: Pid: 0, comm: swapper Not tainted 2.6.27-rc7 #1
Nov 14 15:05:09 calypso kernel:
Nov 14 15:05:09 calypso kernel: Call Trace:
Nov 14 15:05:09 calypso kernel: <IRQ> [<ffffffff80245558>] warn_slowpath+0xb4/0xdc
Nov 14 15:05:09 calypso kernel: [<ffffffff8023f400>] tg_shares_up+0xa4/0x18d
Nov 14 15:05:09 calypso kernel: [<ffffffff8023bb38>] place_entity+0x6a/0xa3
Nov 14 15:05:09 calypso kernel: [<ffffffff80344ef9>] __next_cpu+0x19/0x26
Nov 14 15:05:09 calypso kernel: [<ffffffff8023db82>] find_busiest_group+0x315/0x7c3
Nov 14 15:05:09 calypso kernel: [<ffffffff8025b418>] getnstimeofday+0x38/0x92
Nov 14 15:05:09 calypso kernel: [<ffffffff8049ba5f>] dev_watchdog+0x11b/0x1ab
Nov 14 15:05:09 calypso kernel: [<ffffffff8025ab97>] sched_clock_cpu+0x123/0x12b
Nov 14 15:05:09 calypso kernel: [<ffffffff8049b944>] dev_watchdog+0x0/0x1ab
Nov 14 15:05:09 calypso kernel: [<ffffffff8024d5a7>] run_timer_softirq+0x151/0x1bf
Nov 14 15:05:09 calypso kernel: [<ffffffff802597e2>] ktime_get+0xc/0x41
Nov 14 15:05:09 calypso kernel: [<ffffffff80249ced>] __do_softirq+0x63/0xcc
Nov 14 15:05:09 calypso kernel: [<ffffffff80220f6c>] call_softirq+0x1c/0x28
Nov 14 15:05:09 calypso kernel: [<ffffffff802228fb>] do_softirq+0x2c/0x68
Nov 14 15:05:09 calypso kernel: [<ffffffff80249a33>] irq_exit+0x3f/0x85
Nov 14 15:05:09 calypso kernel: [<ffffffff8022f2f9>] smp_apic_timer_interrupt+0x8a/0xa2
Nov 14 15:05:09 calypso kernel: [<ffffffff802209b6>] apic_timer_interrupt+0x66/0x70
Nov 14 15:05:09 calypso kernel: <EOI> [<ffffffff8022ee1f>] lapic_next_event+0x0/0xe
Nov 14 15:05:09 calypso kernel: [<ffffffff80226437>] default_idle+0x27/0x3b
Nov 14 15:05:09 calypso kernel: [<ffffffff8022659e>] c1e_idle+0xd0/0xd4
Nov 14 15:05:09 calypso kernel: [<ffffffff8021ee83>] cpu_idle+0x48/0x89
Nov 14 15:05:09 calypso kernel:
Nov 14 15:05:09 calypso kernel: ---[ end trace f6cdd1d002fa89c4 ]---
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
>> If you find yourself arguing with Alan Cox, you’re _probably_ wrong. <<
>> --James Morris in "How and Why You Should Become a Kernel Hacker" <<
* Re: skge: PCI error cmd=0x117 status=0x22b0
From: Stephen Hemminger @ 2008-11-14 16:42 UTC
To: Jan Kasprzak; +Cc: netdev
On Fri, 14 Nov 2008 16:36:46 +0100
Jan Kasprzak <kas@fi.muni.cz> wrote:
> Hello,
>
> I have an ASUS M2R32-MVP board with the skge network card. From time to time
> the network freezes with the following message (this one taken from
> 2.6.27-rc7, another one added even with timing information at the end
> of this mail):
>
> skge 0000:03:04.0: PCI error cmd=0x117 status=0x22b0
> skge 0000:03:04.0: unable to clear error (so ignoring them)
The device is getting a PCI master abort on a DMA.
Either you have hardware issues, or there is a driver bug when accessing
a particular memory address. Do you have 4G or more of memory?
Some motherboards are crap and don't actually allow DMA from high memory.
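As an illustration (not part of the original mail), the two register values in the log line can be decoded by hand. The sketch below assumes the standard PCI configuration-space Command and Status register layout; the short bit names are the abbreviations lspci prints, and the helper names `decode` and `devsel` are made up for this example. Under that assumption, status=0x22b0 has the "received master abort" bit (0x2000) set, which would be consistent with the diagnosis above.

```python
# Bit masks for the 16-bit PCI Command register (assumed standard layout).
PCI_COMMAND_BITS = {
    0x0001: "I/O",
    0x0002: "Mem",
    0x0004: "BusMaster",
    0x0008: "SpecCycle",
    0x0010: "MemWINV",
    0x0020: "VGASnoop",
    0x0040: "ParErr",
    0x0080: "Stepping",
    0x0100: "SERR",
    0x0200: "FastB2B",
    0x0400: "DisINTx",
}

# Bit masks for the 16-bit PCI Status register (assumed standard layout).
# Bits 9-10 hold the DEVSEL timing and are decoded separately below.
PCI_STATUS_BITS = {
    0x0008: "INTx",
    0x0010: "Cap",
    0x0020: "66MHz",
    0x0040: "UDF",
    0x0080: "FastB2B",
    0x0100: "MasterDataParErr",
    0x0800: "SignaledTargetAbort",
    0x1000: "ReceivedTargetAbort",
    0x2000: "ReceivedMasterAbort",
    0x4000: "SignaledSERR",
    0x8000: "DetectedParErr",
}

DEVSEL_TIMING = {0: "fast", 1: "medium", 2: "slow", 3: "reserved"}


def decode(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for mask, name in sorted(table.items()) if value & mask]


def devsel(status):
    """Return the DEVSEL timing encoded in bits 9-10 of the Status register."""
    return DEVSEL_TIMING[(status >> 9) & 0x3]


# Values taken from the log line: "PCI error cmd=0x117 status=0x22b0"
cmd, status = 0x117, 0x22B0
print("command:", decode(cmd, PCI_COMMAND_BITS))
print("status: ", decode(status, PCI_STATUS_BITS), "DEVSEL=" + devsel(status))
```

The decoded command bits match the Control line in the lspci output from the first mail (I/O+ Mem+ BusMaster+ MemWINV+ SERR+), which is a quick sanity check that the table is being applied to the right register.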
* Re: skge: PCI error cmd=0x117 status=0x22b0
From: Jan Kasprzak @ 2008-11-14 17:55 UTC
To: Stephen Hemminger; +Cc: netdev
Stephen Hemminger wrote:
: On Fri, 14 Nov 2008 16:36:46 +0100
: Jan Kasprzak <kas@fi.muni.cz> wrote:
:
: > Hello,
: >
: > I have an ASUS M2R32-MVP board with the skge network card. From time to time
: > the network freezes with the following message (this one taken from
: > 2.6.27-rc7, another one added even with timing information at the end
: > of this mail):
: >
: > skge 0000:03:04.0: PCI error cmd=0x117 status=0x22b0
: > skge 0000:03:04.0: unable to clear error (so ignoring them)
:
: The device is getting a PCI master abort on a DMA.
: Either you have hardware issues, or there is a driver bug when accessing
: a particular memory address. Do you have 4G or more of memory?
Yes (I mentioned it in my previous mail): I have 8 GB of RAM.
Could non-linear memory (i.e. memory hole remapping) be a problem,
for example? IIRC, I have it set to "on" in my BIOS setup.
: Some motherboards are crap and don't actually allow DMA from high memory.
With 8 GB, only half of my RAM is below the 32-bit barrier,
so I guess if this were the problem, I would get this error more often
(every second packet on average?).
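The "every second packet" estimate can be written out as a small calculation. This is only a sketch of the arithmetic, and it rests on assumptions that the thread does not confirm: that DMA buffers are spread uniformly over physical RAM, and (for the second call) a hypothetical 1 GiB MMIO hole below 4 GiB whose displaced RAM is remapped above 4 GiB. The function name `fraction_above_4g` is made up for the example.

```python
GIB = 1 << 30


def fraction_above_4g(total_ram, pci_hole=0):
    """Fraction of RAM whose physical address is at or above 4 GiB.

    pci_hole is the size of the address window below 4 GiB reserved for
    MMIO. With memory hole remapping enabled, the RAM it displaces is
    assumed to be relocated above 4 GiB rather than lost.
    """
    below = min(total_ram, 4 * GIB - pci_hole)
    return (total_ram - below) / total_ram


# 8 GiB with no hole: half of the RAM is above the 32-bit barrier.
print(fraction_above_4g(8 * GIB))           # 0.5
# 8 GiB with an assumed 1 GiB hole remapped above 4 GiB.
print(fraction_above_4g(8 * GIB, 1 * GIB))  # 0.625
```

Either way, if high addresses were always fatal, failures would be expected far more often than once every week or two.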
OK, I will try to poke my HW vendor. Thanks,
-Yenya
* Re: skge: PCI error cmd=0x117 status=0x22b0
From: Stephen Hemminger @ 2008-11-14 18:06 UTC
To: Jan Kasprzak; +Cc: netdev
On Fri, 14 Nov 2008 18:55:07 +0100
Jan Kasprzak <kas@fi.muni.cz> wrote:
> Stephen Hemminger wrote:
> : On Fri, 14 Nov 2008 16:36:46 +0100
> : Jan Kasprzak <kas@fi.muni.cz> wrote:
> :
> : > Hello,
> : >
> : > I have an ASUS M2R32-MVP board with the skge network card. From time to time
> : > the network freezes with the following message (this one taken from
> : > 2.6.27-rc7, another one added even with timing information at the end
> : > of this mail):
> : >
> : > skge 0000:03:04.0: PCI error cmd=0x117 status=0x22b0
> : > skge 0000:03:04.0: unable to clear error (so ignoring them)
> :
> : The device is getting a PCI master abort on a DMA.
> : Either you have hardware issues, or there is a driver bug when accessing
> : a particular memory address. Do you have 4G or more of memory?
>
> Yes (I have written it in my previous mail) - I have 8 GB of RAM.
> Can - for example - a non-linear memory (i.e. the memory hole remapping)
> be a problem? IIRC, I have it set to "on" in my BIOS setup.
>
> : Some motherboards are crap and don't actually allow DMA from high memory.
>
> With 8 GB only half of my RAM is below the 32-bit barrier,
> so I guess if this were the problem, I would get this error more often
> (every second packet on average?).
>
> OK, I will try to poke my HW vendor. Thanks,
>
> -Yenya
>
You could also try adjusting the iommu option on the kernel command line
(force, off, soft, etc.). Andi Kleen might know more about this.
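When trying different settings, it helps to confirm which one the running kernel was actually booted with. The sketch below is an illustration, not part of the original mail: it pulls the `iommu=` values out of a kernel command line string. The function name `iommu_options` and the example command line (including `root=/dev/sda1`) are hypothetical; on a live Linux system the real string can be read from /proc/cmdline.

```python
def iommu_options(cmdline):
    """Return the comma-separated values of every iommu= token in cmdline."""
    opts = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "iommu" and sep:
            opts.extend(value.split(","))
    return opts


# Hypothetical command line; on a real system use:
#     with open("/proc/cmdline") as f:
#         cmdline = f.read()
example = "ro root=/dev/sda1 iommu=soft quiet"
print(iommu_options(example))  # ['soft']
```

An empty list means no iommu= option was given, so the kernel is using its default behaviour.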