* iomemory causing a data bus error
@ 2008-02-07 15:20 Jon Dufresne
2008-02-07 20:32 ` Jon Dufresne
From: Jon Dufresne @ 2008-02-07 15:20 UTC (permalink / raw)
To: linux-mips
Hi,
I am writing a Linux MIPS device driver for a PCI device. I am currently
running into a strange problem when writing a lot of data to I/O memory.
I have written a test function that looks like this:
void test_iomem(struct pci_dev *dev)
{
	void __iomem *mem;
	unsigned long start;
	unsigned long size;

	start = pci_resource_start(dev, 0);
	size = pci_resource_len(dev, 0);
	printk("start=%08lx size=%08lx\n", start, size);
	mem = ioremap(start, size);
	while (1) {
		printk("memset %p\n", mem);
		memset_io(mem, 0, 10000);
	}
}
This function is only meant to test the I/O memory. If it works, it
should keep writing zeros to the I/O memory forever (at least as I
understand it). When I run insmod on this device driver I see the
following output:
start=20000000 size=04000000
memset c0580000
memset c0580000
memset c0580000
...
...
...
And this goes on for quite some time. Eventually it stops and, moments
later, I see a kernel panic with the following output:
> memset c0580000
> memset c0580000
> memset c0580000
> Data bus error, epc == c027a64c, ra == 800aa27c
> Oops[#1]:
> Cpu 0
> $ 0 : 00000000 10008400 c0363050 00063050
> $ 4 : 00000037 82937800 8109d000 10008401
> $ 8 : 10008400 1000001f 80320000 80320000
> $12 : 80320000 00000001 83133bf8 8031c960
> $16 : 8311c600 00000080 00000001 00000037
> $20 : c02815e8 802e1ae4 c025a000 8008cde4
> $24 : 00000008 00000000
> $28 : 83132000 83133be8 c02815ac 800aa27c
> Hi : 00000000
> Lo : 00000800
> epc : c027a64c Tainted: P
> ra : 800aa27c Status: 10008403 KERNEL EXL IE
> Cause : 1080801c
> PrId : 00061200
> Modules linked in: tmman1700(P) mousedev usbhid usb_storage phStbFB(P) phStbFBRead(P) phStbVideoRenderer(P) phStbVideoRenderer_Layer(P) phStbStreamingSystem(P) phStbDraw(P) snd_usb_audio snd_usb_lib snd_rawmie
> Process insmod (pid: 790, threadinfo=83132000, task=833e1278)
> Stack : 801ecd20 801ecd20 00000000 00000037 8311c600 00000080 00000001 800aa27c
> 00000000 8008c23c 0000004f 810b9c00 802f7dc0 810c3c00 00000037 810b9c00
> 800aa398 800aa340 10008400 8008c31c 80320000 80320000 00000000 04000000
> c0280000 8006386c 80320000 c027e814 10008401 8031c570 80061310 802e1ae4
> c025a000 8008cde4 00000008 00000000 00000000 80062040 83132000 83133ca8
> ...
> Call Trace:[<801ecd20>][<801ecd20>][<800aa27c>][<8008c23c>][<800aa398>][<800aa340>][<8008c31c>][<8006386c>][<80061310>][<8008cde4>][<80062040>][<80086918>][<8019f924>][<8008cde4>][<c02746d0>][<8017039c>][<801]
>
> Code: 3c030006 34633050 00431021 <8c430000> 2402ffff 1062002e 00a08821 38620001 30420001
> Kernel panic - not syncing: Fatal exception in interrupt
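For readers following the oops above, here is a minimal user-space sketch (not kernel code) that decodes two of the values it prints: the Cause register (1080801c) and the bracketed word in the Code line (8c430000), which by oops convention is the instruction at epc. The struct and function names are made up for this illustration. The field positions follow the MIPS32 Cause register layout and the I-type instruction encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of the MIPS32 CP0 Cause register that matter for an oops. */
struct mips_cause {
    unsigned exc_code; /* bits 6..2: exception code */
    unsigned ip;       /* bits 15..8: pending interrupt lines IP7..IP0 */
    unsigned bd;       /* bit 31: exception taken in a branch delay slot */
};

struct mips_cause decode_cause(uint32_t cause)
{
    struct mips_cause c;

    c.exc_code = (cause >> 2) & 0x1f;
    c.ip = (cause >> 8) & 0xff;
    c.bd = (cause >> 31) & 0x1;
    return c;
}

/* Names for the few exception codes relevant here; the rest are "other". */
const char *exc_name(unsigned exc_code)
{
    assert(exc_code < 32);
    switch (exc_code) {
    case 0: return "Int (interrupt)";
    case 6: return "IBE (bus error, instruction fetch)";
    case 7: return "DBE (bus error, data load or store)";
    default: return "other";
    }
}

/* Fields of a MIPS I-type instruction: opcode, rs, rt, 16-bit immediate. */
struct mips_itype {
    unsigned opcode;
    unsigned rs;
    unsigned rt;
    int16_t imm;
};

struct mips_itype decode_itype(uint32_t insn)
{
    struct mips_itype i;

    i.opcode = insn >> 26;
    i.rs = (insn >> 21) & 0x1f;
    i.rt = (insn >> 16) & 0x1f;
    i.imm = (int16_t)(insn & 0xffff);
    return i;
}

/* Print a short summary for a Cause value and the faulting instruction. */
void print_oops_summary(uint32_t cause, uint32_t insn)
{
    struct mips_cause c = decode_cause(cause);
    struct mips_itype i = decode_itype(insn);

    printf("ExcCode=%u %s, IP=0x%02x, BD=%u\n",
           c.exc_code, exc_name(c.exc_code), c.ip, c.bd);
    if (i.opcode == 0x23) /* 0x23 is the LW (load word) opcode */
        printf("lw $%u, %d($%u)\n", i.rt, i.imm, i.rs);
}
```

Calling print_oops_summary(0x1080801c, 0x8c430000) reports exception code 7 (DBE) and a load through register $2. In the register dump $2 holds c0363050, which lies outside the 10000 bytes written at c0580000. That suggests the faulting access was a load by other code rather than the memset_io() store itself. This is only an inference from the dump, not something confirmed in the thread.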
I am at a complete loss as to what could be causing this. Any ideas
about why this would crash? In case it helps, this is what my PCI
configuration space looks like:
00: 31 11 06 54 02 00 90 02 00 00 80 04 00 40 00 00
10: 08 00 00 20 00 00 00 24 00 00 00 1c 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 36 11 17 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 09 18
40: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
I thought the command register and latency timer looked wrong, so I
modified them before running the above function. The configuration space
now looks like this:
00: 31 11 06 54 16 01 90 02 00 00 80 04 00 40 00 00
10: 08 00 00 20 00 00 00 24 00 00 00 1c 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 36 11 17 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 01 09 18
40: 01 00 02 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
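To make the difference between the two dumps easier to see, here is a small user-space sketch that decodes the relevant fields of the standard type 0 configuration header. The byte arrays are copied from the dumps above. The CFG_* names and helper functions are invented for this example and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Offsets in a type 0 PCI configuration header (names are hypothetical). */
enum {
    CFG_VENDOR_ID = 0x00,
    CFG_DEVICE_ID = 0x02,
    CFG_COMMAND = 0x04,
    CFG_LATENCY_TIMER = 0x0d,
    CFG_BAR0 = 0x10
};

/* First 16 bytes of the dump taken before the change. */
const uint8_t cfg_before[16] = {
    0x31, 0x11, 0x06, 0x54, 0x02, 0x00, 0x90, 0x02,
    0x00, 0x00, 0x80, 0x04, 0x00, 0x40, 0x00, 0x00
};

/* First 32 bytes of the dump taken after the change. */
const uint8_t cfg_after[32] = {
    0x31, 0x11, 0x06, 0x54, 0x16, 0x01, 0x90, 0x02,
    0x00, 0x00, 0x80, 0x04, 0x00, 0x40, 0x00, 0x00,
    0x08, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x24,
    0x00, 0x00, 0x00, 0x1c, 0x00, 0x00, 0x00, 0x00
};

/* Configuration space is little-endian; assemble values byte by byte. */
uint16_t cfg_read16(const uint8_t *cfg, unsigned off)
{
    assert(cfg != NULL);
    return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

uint32_t cfg_read32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg_read16(cfg, off) |
           ((uint32_t)cfg_read16(cfg, off + 2) << 16);
}

/* For a memory BAR, the low four bits are flags, not address bits. */
uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~0xfu;
}

/* Bit 0 clear means memory space; bit 3 set means prefetchable. */
int bar_is_prefetchable(uint32_t bar)
{
    return (bar & 0x9) == 0x8;
}

/* Print the command register bits that are set. */
void print_command(uint16_t cmd)
{
    printf("command=0x%04x:%s%s%s%s%s\n", cmd,
           (cmd & 0x001) ? " io" : "",
           (cmd & 0x002) ? " memory" : "",
           (cmd & 0x004) ? " bus-master" : "",
           (cmd & 0x010) ? " mem-write-invalidate" : "",
           (cmd & 0x100) ? " serr" : "");
}
```

Decoded this way, the command register goes from 0x0002 (memory space only) to 0x0116 (memory space, bus master, memory write and invalidate, SERR# enable). The latency timer byte at offset 0x0d reads 0x40 in both dumps. BAR0 reads 0x20000008, a prefetchable 32-bit memory BAR at 0x20000000, which matches the start=20000000 line printed by the test function.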
Still no luck. Anyone have an idea?
Thanks,
Jon
* Re: iomemory causing a data bus error
2008-02-07 15:20 iomemory causing a data bus error Jon Dufresne
@ 2008-02-07 20:32 ` Jon Dufresne
[not found] ` <DDFD17CC94A9BD49A82147DDF7D545C57986B1@exchange.ZeugmaSystems.local>
From: Jon Dufresne @ 2008-02-07 20:32 UTC (permalink / raw)
To: linux-mips
I took some time and wrote a VERY simplified driver. The driver now does
only the steps required to reproduce this error. I figured this would
localize the problem as much as possible. When I run this new driver, I
now see the following error:
ohci_hcd 0000:00:09.0: OHCI Unrecoverable Error, disabled
ohci_hcd 0000:00:09.0: HC died; cleaning up
irq 55: nobody cared (try booting with the "irqpoll" option)
Call Trace:[<8006926c>][<8006926c>][<800ab1cc>][<800ab434>][<800aa27c>][<800aa27c>][<800aa3b8>][<8007faac>][<80085e9c>][<80085e70>][<8006386c>][<80061310>][<8009e3c4>][<8019d278>][<80062040>][<80062040>][<800]
handlers:
[<801dd630>]
[<801ecce8>]
[<801ecce8>]
[<801ecce8>]
[<801ae6c8>]
Disabling IRQ #55
It looks to me like there is a problem with the USB driver. An
interesting note is that my device's interrupt is also irq 55. I have
tried this simplified driver both requesting the interrupt, in which
case it is never triggered, and not requesting the interrupt. Both cases
end the same way. After I get that message the system is still running,
but extremely slowly.
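As background on the "nobody cared" message, the following is a simplified user-space model of how a shared interrupt line can end up disabled. It is a sketch written from memory of the kernel's spurious-interrupt accounting in 2.6-era kernels, so the thresholds (more than 99900 unhandled out of 100000 interrupts) should be treated as an assumption. All names here are invented, and none of this is the kernel's actual code.

```c
#include <assert.h>
#include <stdio.h>

enum model_irqreturn { MODEL_IRQ_NONE = 0, MODEL_IRQ_HANDLED = 1 };

typedef enum model_irqreturn (*model_handler)(void *dev);

#define MODEL_MAX_HANDLERS 8

/* One shared interrupt line with its registered handlers and counters. */
struct model_irq_line {
    model_handler handlers[MODEL_MAX_HANDLERS];
    void *devs[MODEL_MAX_HANDLERS];
    int nr_handlers;
    unsigned count;     /* interrupts seen in the current window */
    unsigned unhandled; /* of those, how many no handler claimed */
    int disabled;
};

/* Register a handler on the shared line; returns 0 on success. */
int model_request_irq(struct model_irq_line *line, model_handler h, void *dev)
{
    assert(line != NULL && h != NULL);
    if (line->nr_handlers == MODEL_MAX_HANDLERS)
        return -1;
    line->handlers[line->nr_handlers] = h;
    line->devs[line->nr_handlers] = dev;
    line->nr_handlers++;
    return 0;
}

/* Deliver one interrupt: call every handler, then do the accounting. */
void model_raise_irq(struct model_irq_line *line)
{
    int i, handled = 0;

    if (line->disabled)
        return;
    for (i = 0; i < line->nr_handlers; i++)
        if (line->handlers[i](line->devs[i]) == MODEL_IRQ_HANDLED)
            handled = 1;
    if (!handled)
        line->unhandled++;
    if (++line->count < 100000)
        return;
    if (line->unhandled > 99900) {
        printf("irq: nobody cared, disabling line\n");
        line->disabled = 1;
    }
    line->count = 0;
    line->unhandled = 0;
}

/* Raise n interrupts in a row; returns whether the line got disabled. */
int model_storm(struct model_irq_line *line, unsigned n)
{
    while (n--)
        model_raise_irq(line);
    return line->disabled;
}

/* A handler whose device never asserted the line. */
enum model_irqreturn handler_never_mine(void *dev)
{
    (void)dev;
    return MODEL_IRQ_NONE;
}

/* A handler that claims every 100th interrupt; dev points to a counter. */
enum model_irqreturn handler_every_100th(void *dev)
{
    unsigned *calls = dev;

    return (++*calls % 100 == 0) ? MODEL_IRQ_HANDLED : MODEL_IRQ_NONE;
}
```

In this model, a device that keeps asserting a shared level-triggered line without any registered handler claiming the interrupt gets the whole line disabled after one window of 100000 interrupts. Every other driver on that line then stops receiving interrupts as well.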
Any ideas?
Thanks,
Jon
* RE: iomemory causing a data bus error
[not found] ` <DDFD17CC94A9BD49A82147DDF7D545C57986B1@exchange.ZeugmaSystems.local>
@ 2008-02-07 21:01 ` Jon Dufresne
[not found] ` <DDFD17CC94A9BD49A82147DDF7D545C57986CE@exchange.ZeugmaSystems.local>
From: Jon Dufresne @ 2008-02-07 21:01 UTC (permalink / raw)
To: Don Hiatt, linux-mips
> Take a look at /proc/interrupts to see if you have something firing
> that you do not expect.
I took a look and this is what I see:
# cat /proc/interrupts
           CPU0
  2:          0  PNX Level IRQ  GIC
  7:          0  PNX Level IRQ  Timer
 10:        661  PNX Level IRQ  pnx8550-1
 11:        605  PNX Level IRQ  pnx8550-2
 13:          1  PNX Level IRQ  ohci_hcd:usb2
 23:        583  PNX Level IRQ  i2c
 24:        845  PNX Level IRQ  i2c
 28:        334  PNX Level IRQ  pnx8xxx-uart
 34:          1  PNX Level IRQ  Drawing Engine
 47:          0  PNX Level IRQ  vmsp1
 49:          0  PNX Level IRQ  vmsp2
 55:      15876  PNX Level IRQ  libata, ehci_hcd:usb1, ohci_hcd:usb3, ohci_hcd:usb4, eth0
 75:         18  PNX Level IRQ  i2c
 78:        192  PNX Level IRQ  i2c
 79:      80239  PNX Level IRQ  timer
 80:         19  PNX Level IRQ  Monotonic timer
ERR:      99373
It looks like there are quite a few devices on IRQ 55 even before I load
my module. Is it at all possible that I could get my device to use a
different interrupt line? Or is this totally restricted by hardware?
Also, what does "ERR" mean? Does it keep a tally of errors? If so, does
a count of about 99,000 seem high?
> If you are sharing the same IRQ as USB, do you request the IRQ as
> shared? Does the USB as well?
My device does, yes. At this point I have to assume the USB driver does
too. But even if that were the problem, it wouldn't explain why the error
also happens when I don't request the interrupt at all.
Thanks,
Jon
* RE: iomemory causing a data bus error
[not found] ` <DDFD17CC94A9BD49A82147DDF7D545C57986CE@exchange.ZeugmaSystems.local>
@ 2008-02-07 21:40 ` Jon Dufresne
From: Jon Dufresne @ 2008-02-07 21:40 UTC (permalink / raw)
To: Don Hiatt; +Cc: linux-mips
> For ERR: http://lkml.org/lkml/2005/1/12/356
Thanks, I read through that. Seems like I could be dealing with some
imperfect hardware.
Whether or not my device is plugged into the box, I get on the order of
10^4 ERRs. Does this seem like a huge amount to anyone else?
Is there anything I can do to try to reduce this number? Or should I
even worry about it? Could this be connected to the device driver issues
I am having?
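One way to judge whether the ERR count matters is to look at how fast it grows rather than at its absolute value. The sketch below parses the ERR line out of /proc/interrupts-style text and turns two snapshots into a rate. The function names are invented for this example, and any second snapshot value and sampling interval are whatever the reader measures. Nothing here comes from the thread beyond the ERR line format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the ERR count found in /proc/interrupts-style text,
 * or -1 if no ERR line is present.
 */
long parse_err_count(const char *text)
{
    const char *line = text;

    assert(text != NULL);
    while (line != NULL && *line != '\0') {
        unsigned long n;

        /* Skip leading blanks so indented lines still match. */
        while (*line == ' ' || *line == '\t')
            line++;
        if (sscanf(line, "ERR: %lu", &n) == 1)
            return (long)n;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}

/* ERR increments per second between two snapshots taken `seconds` apart. */
double err_rate(long first, long second, double seconds)
{
    assert(seconds > 0.0);
    return (double)(second - first) / seconds;
}
```

A typical use would be to read /proc/interrupts twice a few seconds apart, pass both buffers through parse_err_count(), and compare the resulting rate with the device plugged in and unplugged. A count that is large but no longer growing would point at boot-time noise rather than an ongoing problem.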
Thanks,
Jon