* [PATCH 2.6.16] Shared interrupts sometimes lost
@ 2006-04-08 4:10 Neil Brown
2006-04-08 16:31 ` Lee Revell
2006-04-11 17:07 ` Pavel Machek
0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-08 4:10 UTC (permalink / raw)
To: linux-kernel
Hi,
I've been having enormous fun just lately playing with my new Minitar
wireless cards - based on the RaLink 2560 chip set. I have a
couple of PCI cards that seem to work well, and a PCMCIA that mostly
works, but will lockup the computer under load, which isn't nice :-(
This is using the "legacy" rt2500 driver derived from the vendor
provided driver, rather than the newer rt2x00 which I cannot get to
work at all. These are all part of rt2400.sourceforge.net.
Anyway, it turns out that my lockup problem is actually a problem in
the Linux kernel rather than with the driver (though possibly aspects
of the driver make the situation worse). I have a patch, below,
which fixes my problem for me.
The problem is that after a while, the interrupts generated by the
card stop being handled by Linux, so the ISR routine in the driver
doesn't get called, and progress of all sorts stop. The fact that
this leads to a full system lockup is the fault of the driver, not
Linux. But Linux is definitely implicated in the loss of interrupts.
I can work around the problem using "irqpoll", but that is a bit
heavy handed, and shouldn't be needed.
This is my first real introduction to the IRQ handling code in Linux,
so please forgive any little errors. I'm fairly sure the big picture
is right, partly because the patch helps so much.
To explain what I think is happening, let me start with a very simple
case. A number of PCI devices (this one included) have a number of
events which can trigger an interrupt. The events which are current
are presented as bits in a register, and are ORed together (and
possibly masked by another register) to make the IRQ line.
When 1's are written to any bits in this register, it acknowledges
the event and clears the bit.
A typical code fragment is
events = read_register(INTERRUPTS);
write_register(INTERRUPTS, events);
... handle each 1 bits in events ....
This would normally clear all pending events and cause the interrupt
line to go low (or at least to not be asserted).
However there is room for a race here. If an event occurs between
the read and the write, then this will NOT de-assert the IRQ line.
It will remain asserted throughout.
Now if the IRQ is handled as an edge-triggered line (which I believe
they are in Linux), then losing this race will mean that we don't see
any more interrupts on this line.
This is a race that would be fairly hard to hit, and is easily fixed
by the driver. After handling all of 'events', it should read the
INTERRUPTS register again and see if there is anything more. I tried
this and it didn't fix my problem (though it might have made it
happen less often, I'm not certain).
In my particular case (and probably quite commonly with PCMCIA
cards), the IRQ line is shared:
10: 66178 XT-PIC yenta, yenta, ohci_hcd:usb2, ohci_hcd:usb3, ehci_hcd:usb4, eth0
So now the race is somewhat larger.
If yenta-1 handles its interrupt events and then another event
happens for yenta-1 before eth0 gets around to clearing its
interrupts, the shared IRQ line will remain asserted after all IRQ
handlers have been called, and so another edge will never be seen.
This race - with a shared IRQ line - cannot be handled within any one
driver. There needs to be some sort of end-to-end check.
In particular, the list of 'action' handlers needs to be called
repeatedly until a full pass has been made through all, and none of
them have reported that there was anything to do. Only then can we
be sure that the IRQ line has been de-asserted.
The following patch does that, and "works for me". It included a
kernel-parameter which allows me to disable the fix so I can test
that the fix really makes a difference. It does.
I am presenting this patch for comment at the moment rather than
inclusion. If it is generally acceptable, I'll remove the __setup
stuff and put a proper change-log at the top etc.
It might be appropriate to put a loop limit in so it doesn't more
than 1000 times or something, just in case someone claims to always
handle the interrupt. Is that needed?
Looking forward to your comments,
NeilBrown
------------------------------
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./kernel/irq/handle.c | 28 ++++++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff ./kernel/irq/handle.c~current~ ./kernel/irq/handle.c
--- ./kernel/irq/handle.c~current~ 2006-04-08 13:26:46.000000000 +1000
+++ ./kernel/irq/handle.c 2006-04-08 13:27:05.000000000 +1000
@@ -73,23 +73,47 @@ irqreturn_t no_action(int cpl, void *dev
return IRQ_NONE;
}
+static int safeirq = 1;
+int __init unsafeirq_setup(char *str)
+{
+ safeirq = 0;
+ printk(KERN_INFO "IRQ safety-line removed - good luck\n");
+ return 1;
+}
+__setup("unsafeirq", unsafeirq_setup);
/*
* Have got an event to handle:
*/
fastcall int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
- struct irqaction *action)
+ struct irqaction *actionlist)
{
int ret, retval = 0, status = 0;
+ struct irqaction *action = actionlist;
+ int repeat=0;
if (!(action->flags & SA_INTERRUPT))
local_irq_enable();
do {
ret = action->handler(irq, action->dev_id, regs);
- if (ret == IRQ_HANDLED)
+ if (ret == IRQ_HANDLED) {
status |= action->flags;
+ repeat = 1;
+ }
retval |= ret;
action = action->next;
+ if (!action &&
+ repeat &&
+ safeirq &&
+ (actionlist->flags & SA_SHIRQ)) {
+ /* at least one handler on the list did something,
+ * and the interrupt is sharable, so give
+ * every handler another chance, incase a new event
+ * came in and is holding the irq line asserted.
+ */
+ action = actionlist;
+ repeat = 0;
+ }
} while (action);
if (status & SA_SAMPLE_RANDOM)
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
[not found] <5Zd5E-3vi-7@gated-at.bofh.it>
@ 2006-04-08 14:10 ` Robert Hancock
2006-04-09 9:48 ` How to correct ELCR? - was " Neil Brown
[not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
1 sibling, 1 reply; 14+ messages in thread
From: Robert Hancock @ 2006-04-08 14:10 UTC (permalink / raw)
To: linux-kernel; +Cc: Neil Brown
Neil Brown wrote:
> However there is room for a race here. If an event occurs between
> the read and the write, then this will NOT de-assert the IRQ line.
> It will remain asserted throughout.
>
> Now if the IRQ is handled as an edge-triggered line (which I believe
> they are in Linux), then losing this race will mean that we don't see
> any more interrupts on this line.
PCI interrupts should always be level triggered, not edge triggered
(except maybe in a few special cases - non-native-mode PCI IDE maybe?
and in those cases I don't think the interrupt is considered sharable).
With a level triggered interrupt the ISR will simply be triggered again
and the event handled in this case so there is no race. I think this
patch is going to double interrupt overhead and only covers up some
other problem.
I think that in cases where the interrupt is edge triggered and is
shared (for example on ISA cards that support it) the kernel already has
such logic as you describe.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-08 4:10 Neil Brown
@ 2006-04-08 16:31 ` Lee Revell
2006-04-09 6:02 ` Neil Brown
2006-04-11 17:07 ` Pavel Machek
1 sibling, 1 reply; 14+ messages in thread
From: Lee Revell @ 2006-04-08 16:31 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
>
> To explain what I think is happening, let me start with a very simple
> case. A number of PCI devices (this one included) have a number of
> events which can trigger an interrupt. The events which are current
> are presented as bits in a register, and are ORed together (and
> possibly masked by another register) to make the IRQ line.
> When 1's are written to any bits in this register, it acknowledges
> the event and clears the bit.
> A typical code fragment is
> events = read_register(INTERRUPTS);
> write_register(INTERRUPTS, events);
> ... handle each 1 bits in events ....
>
Isn't a more typical IRQ handler:
while (events = read_register(INTERRUPTS) != 0) {
...handle each bit in events and ACK it...
}
Lee
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-08 16:31 ` Lee Revell
@ 2006-04-09 6:02 ` Neil Brown
0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09 6:02 UTC (permalink / raw)
To: Lee Revell; +Cc: linux-kernel
On Saturday April 8, rlrevell@joe-job.com wrote:
> On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
> >
> > To explain what I think is happening, let me start with a very simple
> > case. A number of PCI devices (this one included) have a number of
> > events which can trigger an interrupt. The events which are current
> > are presented as bits in a register, and are ORed together (and
> > possibly masked by another register) to make the IRQ line.
> > When 1's are written to any bits in this register, it acknowledges
> > the event and clears the bit.
> > A typical code fragment is
> > events = read_register(INTERRUPTS);
> > write_register(INTERRUPTS, events);
> > ... handle each 1 bits in events ....
> >
>
> Isn't a more typical IRQ handler:
>
> while (events = read_register(INTERRUPTS) != 0) {
> ...handle each bit in events and ACK it...
> }
Maybe... I admit that I generalised from 2 examples: rt2500 and yenta.
However I don't think it makes a big difference to the problem with
shared interrupts (assuming my analysis is right).
The loop isn't really necessary if you are sure that unserviced
interrupts will be re-asserted (as level-triggered interrupts would
be) (though it may still help performance) and the loop isn't
sufficient if you need to be sure that all events get services (as
there may be more in the shared-chain).
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 14+ messages in thread
* How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-08 14:10 ` [PATCH 2.6.16] Shared interrupts sometimes lost Robert Hancock
@ 2006-04-09 9:48 ` Neil Brown
2006-04-09 15:48 ` Francois Romieu
2006-04-11 17:07 ` Pavel Machek
0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09 9:48 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
On Saturday April 8, hancockr@shaw.ca wrote:
> Neil Brown wrote:
> > However there is room for a race here. If an event occurs between
> > the read and the write, then this will NOT de-assert the IRQ line.
> > It will remain asserted throughout.
> >
> > Now if the IRQ is handled as an edge-triggered line (which I believe
> > they are in Linux), then losing this race will mean that we don't see
> > any more interrupts on this line.
>
> PCI interrupts should always be level triggered, not edge triggered
Ok... so I guess I jumped to the wrong conclusion. Thanks for
straightening me out.
But it is behaving like edge-triggered..
So I have explored about the i8259 (wikipedia helped) and discovered
the ELCR (Edge/Level Control Register). Apparently this is meant to
be set up by the BIOS to the correct values. It seems that this isn't
happening.
It seems to get the value 0x0800 which corresponds to IRQ11 being the
only level-triggered interrupt. But I need IRQ10 to be level
triggered. I hacked the code to set the 0x0400 bit, and it seems to
work OK without my other patch.
Now I just need a way to set this correctly at boot time without a
hack.
I currently have Linux compiled without ACPI support (as I don't
really want that and being an oldish notebook I gather it has a good
chance of causing problems) so that isn't fiddling with the ELCR.
So thank you for helping me a step further in understand, but now I
have a new question:
How can I make sure the ELCR is set correctly?
and I guess,
What is the correct setting?
My /proc/interrupts is below.
Thanks.
NeilBrown
CPU0
0: 505852 XT-PIC timer
1: 10 XT-PIC i8042
2: 0 XT-PIC cascade
4: 10 XT-PIC serial
8: 4 XT-PIC rtc
10: 16442 XT-PIC yenta, yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb4, eth0
11: 0 XT-PIC uhci_hcd:usb3
12: 110 XT-PIC i8042
14: 5114 XT-PIC ide0
15: 38 XT-PIC ide1
NMI: 0
ERR: 0
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-09 9:48 ` How to correct ELCR? - was " Neil Brown
@ 2006-04-09 15:48 ` Francois Romieu
2006-04-09 22:28 ` Neil Brown
2006-04-11 17:07 ` Pavel Machek
1 sibling, 1 reply; 14+ messages in thread
From: Francois Romieu @ 2006-04-09 15:48 UTC (permalink / raw)
To: Neil Brown; +Cc: Robert Hancock, linux-kernel
Neil Brown <neilb@suse.de> :
[...]
Can you send an url for the exact rt2500 sources that you are
actually using ?
Just curious :o)
--
Ueimor
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
[not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
@ 2006-04-09 18:12 ` Robert Hancock
2006-04-09 18:24 ` Lee Revell
0 siblings, 1 reply; 14+ messages in thread
From: Robert Hancock @ 2006-04-09 18:12 UTC (permalink / raw)
To: linux-kernel; +Cc: Lee Revell
Lee Revell wrote:
> On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
>> To explain what I think is happening, let me start with a very simple
>> case. A number of PCI devices (this one included) have a number of
>> events which can trigger an interrupt. The events which are current
>> are presented as bits in a register, and are ORed together (and
>> possibly masked by another register) to make the IRQ line.
>> When 1's are written to any bits in this register, it acknowledges
>> the event and clears the bit.
>> A typical code fragment is
>> events = read_register(INTERRUPTS);
>> write_register(INTERRUPTS, events);
>> ... handle each 1 bits in events ....
>>
>
> Isn't a more typical IRQ handler:
>
> while (events = read_register(INTERRUPTS) != 0) {
> ...handle each bit in events and ACK it...
> }
That would be less efficient, it would read the register twice or more
if any events have been set, and reading device registers can be
expensive. In the unlikely event the event was set while inside the ISR
the interrupt should be asserted again so there is no need to do this.
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-09 18:12 ` Robert Hancock
@ 2006-04-09 18:24 ` Lee Revell
0 siblings, 0 replies; 14+ messages in thread
From: Lee Revell @ 2006-04-09 18:24 UTC (permalink / raw)
To: Robert Hancock; +Cc: linux-kernel
On Sun, 2006-04-09 at 12:12 -0600, Robert Hancock wrote:
> Lee Revell wrote:
> > Isn't a more typical IRQ handler:
> >
> > while (events = read_register(INTERRUPTS) != 0) {
> > ...handle each bit in events and ACK it...
> > }
>
> That would be less efficient, it would read the register twice or more
> if any events have been set, and reading device registers can be
> expensive. In the unlikely event the event was set while inside the
> ISR the interrupt should be asserted again so there is no need to do
> this.
OK. FWIW I am looking at the emu10k1 driver (though I've seen this in
others). The OSS driver has this comment:
/*
** NOTE :
** We do a 'while loop' here cos on certain machines, with both
** playback and recording going on at the same time, IRQs will
** stop coming in after a while. Checking IPND indeed shows that
** there are interrupts pending but the PIC says no IRQs pending.
** I suspect that some boards need edge-triggered IRQs but are not
** getting that condition if we don't completely clear the IPND
** (make sure no more interrupts are pending).
** - Eric
*/
The ALSA driver preserves the while loop but omits the comment :-/
Lee
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-09 15:48 ` Francois Romieu
@ 2006-04-09 22:28 ` Neil Brown
0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09 22:28 UTC (permalink / raw)
To: Francois Romieu; +Cc: Robert Hancock, linux-kernel
On Sunday April 9, romieu@fr.zoreil.com wrote:
> Neil Brown <neilb@suse.de> :
> [...]
>
> Can you send an url for the exact rt2500 sources that you are
> actually using ?
>
> Just curious :o)
http://rt2x00.serialmonkey.com/rt2500-cvs-daily.tar.gz
(though that may change from day to day... I just checked, and *now*
it is the same as what I started with)
plus a bunch of printks, which I attach for your amusement :-)
NeilBrown
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .assoc.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .auth.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .auth_rsp.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .connect.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .eeprom.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .md5.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .mlme.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.ko.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.mod.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_data.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_info.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_init.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_main.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_tkip.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_wep.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .sanity.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .sync.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .tmp_versions
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .wpa.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: assoc.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: auth.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: auth_rsp.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: connect.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: eeprom.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: md5.o
diff -ur rt2500-cvs-2006040703/Module/mlme.c /root/rt-latest/rt2500-cvs-2006040703/Module/mlme.c
--- rt2500-cvs-2006040703/Module/mlme.c 2006-04-05 13:52:43.000000000 +1000
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/mlme.c 2006-04-07 21:54:14.000000000 +1000
@@ -350,12 +350,19 @@
This task guarantee only one MlmeHandler will run.
==========================================================================
*/
+extern void dump_prio(PRTMP_ADAPTER A);
VOID MlmeHandler(
IN PRTMP_ADAPTER pAd)
{
MLME_QUEUE_ELEM *Elem = NULL;
unsigned long flags;
-
+ int loops=0;
+ { int static done = 0;
+ if (!done) {
+ done = 1;
+ dump_prio(pAd);
+ }
+ }
// Only accept MLME and Frame from peer side, no other (control/data) frame should
// get into this state machine
@@ -411,6 +418,14 @@
{
printk(KERN_ERR DRV_NAME "ERROR: empty Elem in MlmeQueue\n");
}
+ loops++;
+ if (loops > 10) {
+ static int done = 0;
+ if (done) break;
+ done=1;
+ dump_prio(pAd);
+ printk("BREAK\n"); break;
+ }
}
spin_lock_irqsave(&pAd->Mlme.TaskLock,flags);
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: mlme.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.ko
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.mod.c
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.mod.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.o
diff -ur rt2500-cvs-2006040703/Module/rt2560.h /root/rt-latest/rt2500-cvs-2006040703/Module/rt2560.h
--- rt2500-cvs-2006040703/Module/rt2560.h 2005-10-06 12:53:08.000000000 +1000
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rt2560.h 2006-04-07 23:42:18.000000000 +1000
@@ -517,6 +517,15 @@
ULONG TwakeExpire:1; // Wakeup timer expired interrupt
ULONG TbcnExpire:1; // Beacon timer expired interrupt
#else
+/* 0x169
+ TbcnExpire
+ TxRingTxDone
+ PrioRingTxDone
+ RxDone
+ ExcryptionDone
+ fe14
+ 1eb
+*/
ULONG TbcnExpire:1; // Beacon timer expired interrupt
ULONG TwakeExpire:1; // Wakeup timer expired interrupt
ULONG TatimwExpire:1; // Timer of atim window expired interrupt
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2560.h.~1.4.~
diff -ur rt2500-cvs-2006040703/Module/rtmp_data.c /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_data.c
--- rt2500-cvs-2006040703/Module/rtmp_data.c 2005-11-25 22:30:47.000000000 +1100
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_data.c 2006-04-08 11:10:21.000000000 +1000
@@ -905,6 +905,25 @@
}
}
+void dump_prio(PRTMP_ADAPTER A)
+{
+ int i;
+ u32 u;
+ PTXD_STRUC T;
+ printk("Prio: Index=%d Done=%d\n", A->CurPrioIndex, A->NextPrioDoneIndex);
+ for (i=0; i<PRIO_RING_SIZE; i++) {
+ T=(PTXD_STRUC)(A->PrioRing[i].va_addr);
+ printk("%3d: size=%d FT=%d own=%d vld=%d, rty=%d\n",
+ i, A->PrioRing[i].size, A->PrioRing[i].FrameType,
+ T->Owner, T->Valid, T->RetryCount);
+ }
+ printk("\n");
+ RTMP_IO_READ32(A, CSR7, &u);
+ printk("CSR7 is 0x%x\n", u);
+ RTMP_IO_READ32(A, CSR8, &u);
+ printk("CSR8 is 0x%x\n", u);
+}
+
/*
========================================================================
@@ -947,10 +966,11 @@
pTxD = &TxD;
RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
#endif
-
+ //printk("PrioDone for %d\n", pAdapter->NextPrioDoneIndex);
// Check for the descriptor ownership
if ((pTxD->Owner == DESC_OWN_NIC) || (pTxD->Valid == FALSE))
{
+ //printk("..no\n");
#ifdef BIG_ENDIAN
RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
*pDestTxD = TxD;
@@ -2120,13 +2140,14 @@
pTxD = &TxD;
RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
#endif
-
+ printk("Queuing prio for %d\n", pAdapter->CurPrioIndex);
if (pTxD->Owner == DESC_OWN_NIC)
{
// Descriptor owned by NIC. No descriptor avaliable
// This should not happen since caller guaranteed.
// Make sure to release Prio ring resource
spin_unlock_irqrestore(&pAdapter->PrioRingLock, irqflag);
+ printk("...not mine\n");
return;
}
if (pTxD->Valid == TRUE)
@@ -2135,6 +2156,7 @@
// This should not happen since caller guaranteed.
// Make sure to release Prio ring resource
spin_unlock_irqrestore(&pAdapter->PrioRingLock, irqflag);
+ printk("...still valid\n");
return;
}
if (pBuffer == NULL)
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_data.c.~1.35.~
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_data.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_info.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_init.c.~1.27.~
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_init.o
diff -ur rt2500-cvs-2006040703/Module/rtmp_main.c /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_main.c
--- rt2500-cvs-2006040703/Module/rtmp_main.c 2006-02-25 20:45:51.000000000 +1100
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_main.c 2006-04-08 11:09:41.000000000 +1000
@@ -194,7 +194,7 @@
// register_netdev() will call dev_alloc_name() for us
// TODO: Remove the following line to keep the default eth%d name
if (ifname == NULL)
- strcpy(net_dev->name, "ra%d");
+ strcpy(net_dev->name, "eth%d");
else
strncpy(net_dev->name, ifname, IFNAMSIZ);
@@ -456,7 +456,7 @@
int ret = 0;
DBGPRINT(RT_DEBUG_INFO, "====> RTMPHandleInterrupt\n");
-
+ again:
// 1. Disable interrupt
if (RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_INTERRUPT_IN_USE) && RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_INTERRUPT_ACTIVE))
{
@@ -523,6 +523,7 @@
if (IntSource.field.PrioRingTxDone)
{
DBGPRINT(RT_DEBUG_INFO, "====> RTMPHandlePrioRingTxDoneInterrupt\n");
+ printk("Word = %d\n", IntSource.word);
RTMPHandlePrioRingTxDoneInterrupt(pAdapter);
ret = 1;
}
@@ -546,6 +547,7 @@
if (RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_RESET_IN_PROGRESS))
{
ret = 1;
+ printk("Don't enable interrupt\n");
goto out;
}
@@ -554,12 +556,19 @@
//
NICEnableInterrupt(pAdapter);
+ RTMP_IO_READ32(pAdapter, CSR7, &IntSource.word);
+ if (IntSource.word & ~0xFE14) {
+ /*printk("word now %x\n", IntSource.word); */
+ // goto again;
+ }
DBGPRINT(RT_DEBUG_INFO, "<==== RTMPHandleInterrupt\n");
out:
if(ret)
return IRQ_RETVAL(IRQ_HANDLED);
- else
+ else {
+// printk("Nothing handled: %x\n", IntSource.word);
return IRQ_RETVAL(IRQ_NONE);
+ }
}
int rt2500_set_mac_address(struct net_device *net_dev, void *addr)
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_main.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_tkip.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_wep.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: sanity.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: sync.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: wpa.o
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-08 4:10 Neil Brown
2006-04-08 16:31 ` Lee Revell
@ 2006-04-11 17:07 ` Pavel Machek
2006-04-12 0:01 ` Neil Brown
1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2006-04-11 17:07 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
Hi!
> This is my first real introduction to the IRQ handling code in Linux,
> so please forgive any little errors. I'm fairly sure the big picture
> is right, partly because the patch helps so much.
>
> To explain what I think is happening, let me start with a very simple
> case. A number of PCI devices (this one included) have a number of
> events which can trigger an interrupt. The events which are current
> are presented as bits in a register, and are ORed together (and
> possibly masked by another register) to make the IRQ line.
> When 1's are written to any bits in this register, it acknowledges
> the event and clears the bit.
> A typical code fragment is
> events = read_register(INTERRUPTS);
> write_register(INTERRUPTS, events);
> ... handle each 1 bits in events ....
>
> This would normally clear all pending events and cause the interrupt
> line to go low (or at least to not be asserted).
>
> However there is room for a race here. If an event occurs between
> the read and the write, then this will NOT de-assert the IRQ line.
> It will remain asserted throughout.
>
> Now if the IRQ is handled as an edge-triggered line (which I believe
> they are in Linux), then losing this race will mean that we don't see
> any more interrupts on this line.
I believe that
a) any shared interrupts should be level-triggered. It is not okay to
share edge-triggered interrupt
b) your patch does not fix that issue. It only makes race window
smaller.
> if (!(action->flags & SA_INTERRUPT))
> local_irq_enable();
>
> do {
> ret = action->handler(irq, action->dev_id, regs);
> - if (ret == IRQ_HANDLED)
> + if (ret == IRQ_HANDLED) {
> status |= action->flags;
> + repeat = 1;
> + }
> retval |= ret;
> action = action->next;
> + if (!action &&
> + repeat &&
> + safeirq &&
> + (actionlist->flags & SA_SHIRQ)) {
> + /* at least one handler on the list did something,
> + * and the interrupt is sharable, so give
> + * every handler another chance, incase a new event
> + * came in and is holding the irq line asserted.
> + */
> + action = actionlist;
> + repeat = 0;
> + }
> } while (action);
I think it is still racy. What if another interrupt comes here?
> if (status & SA_SAMPLE_RANDOM)
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-09 9:48 ` How to correct ELCR? - was " Neil Brown
2006-04-09 15:48 ` Francois Romieu
@ 2006-04-11 17:07 ` Pavel Machek
2006-04-12 4:01 ` Neil Brown
1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2006-04-11 17:07 UTC (permalink / raw)
To: Neil Brown; +Cc: Robert Hancock, linux-kernel
Hi!
> > > However there is room for a race here. If an event occurs between
> > > the read and the write, then this will NOT de-assert the IRQ line.
> > > It will remain asserted throughout.
> > >
> > > Now if the IRQ is handled as an edge-triggered line (which I believe
> > > they are in Linux), then losing this race will mean that we don't see
> > > any more interrupts on this line.
> >
> > PCI interrupts should always be level triggered, not edge triggered
>
> Ok... so I guess I jumped to the wrong conclusion. Thanks for
> straightening me out.
> But it is behaving like edge-triggered..
>
> So I have explored about the i8259 (wikipedia helped) and discovered
> the ELCR (Edge/Level Control Register). Apparently this is meant to
> be set up by the BIOS to the correct values. It seems that this isn't
> happening.
>
> It seems to get the value 0x0800 which corresponds to IRQ11 being the
> only level-triggered interrupt. But I need IRQ10 to be level
> triggered. I hacked the code to set the 0x0400 bit, and it seems to
> work OK without my other patch.
>
> Now I just need a way to set this correctly at boot time without a
> hack.
>
> I currently have Linux compiled without ACPI support (as I don't
> really want that and being an oldish notebook I gather it has a good
> chance of causing problems) so that isn't fiddling with the ELCR.
Can you try to enable ACPI and/or APIC? Some machines are known to
require APIC...
Pavel
--
Thanks for all the (sleeping) penguins.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-11 17:07 ` Pavel Machek
@ 2006-04-12 0:01 ` Neil Brown
2006-04-13 5:41 ` Pavel Machek
0 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2006-04-12 0:01 UTC (permalink / raw)
To: Pavel Machek; +Cc: linux-kernel
On Tuesday April 11, pavel@suse.cz wrote:
> >
> > Now if the IRQ is handled as an edge-triggered line (which I believe
> > they are in Linux), then losing this race will mean that we don't see
> > any more interrupts on this line.
>
> I believe that
>
> a) any shared interrupts should be level-triggered. It is not okay to
> share edge-triggered interrupt
I see that now, thanks.
>
> b) your patch does not fix that issue. It only makes race window
> smaller.
>
> > if (!(action->flags & SA_INTERRUPT))
> > local_irq_enable();
> >
> > do {
> > ret = action->handler(irq, action->dev_id, regs);
> > - if (ret == IRQ_HANDLED)
> > + if (ret == IRQ_HANDLED) {
> > status |= action->flags;
> > + repeat = 1;
> > + }
> > retval |= ret;
> > action = action->next;
> > + if (!action &&
> > + repeat &&
> > + safeirq &&
> > + (actionlist->flags & SA_SHIRQ)) {
> > + /* at least one handler on the list did something,
> > + * and the interrupt is sharable, so give
> > + * every handler another chance, incase a new event
> > + * came in and is holding the irq line asserted.
> > + */
> > + action = actionlist;
> > + repeat = 0;
> > + }
> > } while (action);
>
> I think it is still racy. What if another interrupt comes here?
Well... at this point we are certain that the irq line was low at some
point since the loop was first entered, otherwise we would not have
been able to pass through all actions without getting a single
IRQ_HANDLED.
So if an interrupt happens now, it will generate an edge.
However I concede that I don't know how the PIC works exactly. The
interrupt is still disabled in the PIC at this point. If this means
that the edge will be ignored, then you are right. If it just means
that the edge will be "delayed" and we will see the interrupt when
they are re-enabled at the PIC, then we should be safe.
>
> > if (status & SA_SAMPLE_RANDOM)
>
> Pavel
> --
> Thanks for all the (sleeping) penguins.
Do you have a gallery somewhere ???
Thanks for your input.
NeilBrown
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-11 17:07 ` Pavel Machek
@ 2006-04-12 4:01 ` Neil Brown
0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-12 4:01 UTC (permalink / raw)
To: Pavel Machek; +Cc: Robert Hancock, linux-kernel
On Tuesday April 11, pavel@suse.cz wrote:
> >
> > I currently have Linux compiled without ACPI support (as I don't
> > really want that and being an oldish notebook I gather it has a good
> > chance of causing problems) so that isn't fiddling with the ELCR.
>
> Can you try to enable ACPI and/or APIC? Some machines are known to
> require APIC...
Thanks for the suggestions.
I now have
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
in my .config, and it doesn't make any apparent difference.
I haven't tried ACPI yet... maybe in a couple of days.
Thanks again,
NeilBrown
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
2006-04-12 0:01 ` Neil Brown
@ 2006-04-13 5:41 ` Pavel Machek
0 siblings, 0 replies; 14+ messages in thread
From: Pavel Machek @ 2006-04-13 5:41 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-kernel
Hi!
> > > if (status & SA_SAMPLE_RANDOM)
> >
> > Pavel
> > --
> > Thanks for all the (sleeping) penguins.
>
> Do you have a gallery somewhere ???
http://atrey.karlin.mff.cuni.cz/~pavel/swsusp/
Pavel
--
Thanks, Sharp!
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-04-13 5:41 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <5Zd5E-3vi-7@gated-at.bofh.it>
2006-04-08 14:10 ` [PATCH 2.6.16] Shared interrupts sometimes lost Robert Hancock
2006-04-09 9:48 ` How to correct ELCR? - was " Neil Brown
2006-04-09 15:48 ` Francois Romieu
2006-04-09 22:28 ` Neil Brown
2006-04-11 17:07 ` Pavel Machek
2006-04-12 4:01 ` Neil Brown
[not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
2006-04-09 18:12 ` Robert Hancock
2006-04-09 18:24 ` Lee Revell
2006-04-08 4:10 Neil Brown
2006-04-08 16:31 ` Lee Revell
2006-04-09 6:02 ` Neil Brown
2006-04-11 17:07 ` Pavel Machek
2006-04-12 0:01 ` Neil Brown
2006-04-13 5:41 ` Pavel Machek
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).