linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 2.6.16] Shared interrupts sometimes lost
@ 2006-04-08  4:10 Neil Brown
  2006-04-08 16:31 ` Lee Revell
  2006-04-11 17:07 ` Pavel Machek
  0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-08  4:10 UTC (permalink / raw)
  To: linux-kernel


Hi,
 I've been having enormous fun just lately playing with my new Minitar
 wireless cards - based on the RaLink 2560 chip set.    I have a
 couple of PCI cards that seem to work well, and a PCMCIA that mostly
 works, but will lockup the computer under load, which isn't nice :-(

 This is using the "legacy" rt2500 driver derived from the vendor
 provided driver, rather than the newer rt2x00 which I cannot get to
 work at all.  These are all part of rt2400.sourceforge.net.

 Anyway, it turns out that my lockup problem is actually a problem in
 the Linux kernel rather than with the driver (though possibly aspects
 of the driver make the situation worse).  I have a patch, below,
 which fixes my problem for me.

 The problem is that after a while, the interrupts generated by the
 card stop being handled by Linux, so the ISR routine in the driver
 doesn't get called, and progress of all sorts stop.  The fact that
 this leads to a full system lockup is the fault of the driver, not
 Linux.  But Linux is definitely implicated in the loss of interrupts.
 I can work around the problem using "irqpoll", but that is a bit
 heavy handed, and shouldn't be needed.

 This is my first real introduction to the IRQ handling code in Linux,
 so please forgive any little errors.  I'm fairly sure the big picture
 is right, partly because the patch helps so much.

 To explain what I think is happening, let me start with a very simple
 case.  A number of PCI devices (this one included) have a number of
 events which can trigger an interrupt.  The events which are current
 are presented as bits in a register, and are ORed together (and
 possibly masked by another register) to make the IRQ line.
 When 1's are written to any bits in this register, it acknowledges
 the event and clears the bit.
 A typical code fragment is 
   events = read_register(INTERRUPTS);
   write_register(INTERRUPTS, events);
   ... handle each 1 bits in events ....

 This would normally clear all pending events and cause the interrupt
 line to go low (or at least to not be asserted).

 However there is room for a race here.  If an event occurs between
 the read and the write, then this will NOT de-assert the IRQ line.
 It will remain asserted throughout.

 Now if the IRQ is handled as an edge-triggered line (which I believe
 they are in Linux), then losing this race will mean that we don't see
 any more interrupts on this line.

 This is a race that would be fairly hard to hit, and is easily fixed
 by the driver.  After handling all of 'events', it should read the
 INTERRUPTS register again and see if there is anything more.  I tried
 this and it didn't fix my problem (though it might have made it
 happen less often, I'm not certain).

 In my particular case (and probably quite commonly with PCMCIA
 cards), the IRQ line is shared:

 10:      66178          XT-PIC  yenta, yenta, ohci_hcd:usb2, ohci_hcd:usb3, ehci_hcd:usb4, eth0

 So now the race is somewhat larger.
 If yenta-1 handles its interrupt events and then another event
 happens for yenta-1 before eth0 gets around to clearing its
 interrupts, the shared IRQ line will remain asserted after all IRQ
 handlers have been called, and so another edge will never be seen.

 This race - with a shared IRQ line - cannot be handled within any one
 driver.  There needs to be some sort of end-to-end check.

 In particular, the list of 'action' handlers needs to be called
 repeatedly until a full pass has been made through all, and none of
 them have reported that there was anything to do.  Only then can we
 be sure that the IRQ line has been de-asserted.

 The following patch does that, and "works for me".  It included a
 kernel-parameter which allows me to disable the fix so I can test
 that the fix really makes a difference.  It does.

 I am presenting this patch for comment at the moment rather than
 inclusion.  If it is generally acceptable, I'll remove the __setup
 stuff and put a proper change-log at the top etc.

 It might be appropriate to put a loop limit in so it doesn't more
 than 1000 times or something, just in case someone claims to always
 handle the interrupt.  Is that needed?

 Looking forward to your comments,
 NeilBrown


------------------------------
Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./kernel/irq/handle.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff ./kernel/irq/handle.c~current~ ./kernel/irq/handle.c
--- ./kernel/irq/handle.c~current~	2006-04-08 13:26:46.000000000 +1000
+++ ./kernel/irq/handle.c	2006-04-08 13:27:05.000000000 +1000
@@ -73,23 +73,47 @@ irqreturn_t no_action(int cpl, void *dev
 	return IRQ_NONE;
 }
 
+static int safeirq = 1;
+int __init unsafeirq_setup(char *str)
+{
+	safeirq = 0;
+	printk(KERN_INFO "IRQ safety-line removed - good luck\n");
+	return 1;
+}
+__setup("unsafeirq", unsafeirq_setup);
 /*
  * Have got an event to handle:
  */
 fastcall int handle_IRQ_event(unsigned int irq, struct pt_regs *regs,
-				struct irqaction *action)
+				struct irqaction *actionlist)
 {
 	int ret, retval = 0, status = 0;
+	struct irqaction *action = actionlist;
+	int repeat=0;
 
 	if (!(action->flags & SA_INTERRUPT))
 		local_irq_enable();
 
 	do {
 		ret = action->handler(irq, action->dev_id, regs);
-		if (ret == IRQ_HANDLED)
+		if (ret == IRQ_HANDLED) {
 			status |= action->flags;
+			repeat = 1;
+		}
 		retval |= ret;
 		action = action->next;
+		if (!action &&
+		    repeat &&
+		    safeirq &&
+		    (actionlist->flags & SA_SHIRQ)) {
+			/* at least one handler on the list did something,
+			 * and the interrupt is sharable, so give
+			 * every handler another chance, incase a new event
+			 * came in and is holding the irq line asserted.
+			 */
+			action = actionlist;
+			repeat = 0;
+		}
 	} while (action);
 
 	if (status & SA_SAMPLE_RANDOM)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
       [not found] <5Zd5E-3vi-7@gated-at.bofh.it>
@ 2006-04-08 14:10 ` Robert Hancock
  2006-04-09  9:48   ` How to correct ELCR? - was " Neil Brown
       [not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
  1 sibling, 1 reply; 14+ messages in thread
From: Robert Hancock @ 2006-04-08 14:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Neil Brown

Neil Brown wrote:
>  However there is room for a race here.  If an event occurs between
>  the read and the write, then this will NOT de-assert the IRQ line.
>  It will remain asserted throughout.
> 
>  Now if the IRQ is handled as an edge-triggered line (which I believe
>  they are in Linux), then losing this race will mean that we don't see
>  any more interrupts on this line.

PCI interrupts should always be level triggered, not edge triggered 
(except maybe in a few special cases - non-native-mode PCI IDE maybe? 
and in those cases I don't think the interrupt is considered sharable). 
With a level triggered interrupt the ISR will simply be triggered again 
and the event handled in this case so there is no race. I think this 
patch is going to double interrupt overhead and only covers up some 
other problem.

I think that in cases where the interrupt is edge triggered and is 
shared (for example on ISA cards that support it) the kernel already has 
such logic as you describe.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-08  4:10 Neil Brown
@ 2006-04-08 16:31 ` Lee Revell
  2006-04-09  6:02   ` Neil Brown
  2006-04-11 17:07 ` Pavel Machek
  1 sibling, 1 reply; 14+ messages in thread
From: Lee Revell @ 2006-04-08 16:31 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
> 
>  To explain what I think is happening, let me start with a very simple
>  case.  A number of PCI devices (this one included) have a number of
>  events which can trigger an interrupt.  The events which are current
>  are presented as bits in a register, and are ORed together (and
>  possibly masked by another register) to make the IRQ line.
>  When 1's are written to any bits in this register, it acknowledges
>  the event and clears the bit.
>  A typical code fragment is 
>    events = read_register(INTERRUPTS);
>    write_register(INTERRUPTS, events);
>    ... handle each 1 bits in events ....
> 

Isn't a more typical IRQ handler:

while (events = read_register(INTERRUPTS) != 0) {
	...handle each bit in events and ACK it...
}

Lee


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-08 16:31 ` Lee Revell
@ 2006-04-09  6:02   ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09  6:02 UTC (permalink / raw)
  To: Lee Revell; +Cc: linux-kernel

On Saturday April 8, rlrevell@joe-job.com wrote:
> On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
> > 
> >  To explain what I think is happening, let me start with a very simple
> >  case.  A number of PCI devices (this one included) have a number of
> >  events which can trigger an interrupt.  The events which are current
> >  are presented as bits in a register, and are ORed together (and
> >  possibly masked by another register) to make the IRQ line.
> >  When 1's are written to any bits in this register, it acknowledges
> >  the event and clears the bit.
> >  A typical code fragment is 
> >    events = read_register(INTERRUPTS);
> >    write_register(INTERRUPTS, events);
> >    ... handle each 1 bits in events ....
> > 
> 
> Isn't a more typical IRQ handler:
> 
> while (events = read_register(INTERRUPTS) != 0) {
> 	...handle each bit in events and ACK it...
> }

Maybe... I admit that I generalised from 2 examples: rt2500 and yenta.
However I don't think it makes a big difference to the problem with
shared interrupts (assuming my analysis is right).

The loop isn't really necessary if you are sure that unserviced
interrupts will be re-asserted (as level-triggered interrupts would
be) (though it may still help performance) and the loop isn't
sufficient if you need to be sure that all events get services (as
there may be more in the shared-chain).

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-08 14:10 ` [PATCH 2.6.16] Shared interrupts sometimes lost Robert Hancock
@ 2006-04-09  9:48   ` Neil Brown
  2006-04-09 15:48     ` Francois Romieu
  2006-04-11 17:07     ` Pavel Machek
  0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09  9:48 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

On Saturday April 8, hancockr@shaw.ca wrote:
> Neil Brown wrote:
> >  However there is room for a race here.  If an event occurs between
> >  the read and the write, then this will NOT de-assert the IRQ line.
> >  It will remain asserted throughout.
> > 
> >  Now if the IRQ is handled as an edge-triggered line (which I believe
> >  they are in Linux), then losing this race will mean that we don't see
> >  any more interrupts on this line.
> 
> PCI interrupts should always be level triggered, not edge triggered 

Ok... so I guess I jumped to the wrong conclusion. Thanks for
straightening me out.
But it is behaving like edge-triggered..

So I have explored about the i8259 (wikipedia helped) and discovered
the ELCR (Edge/Level Control Register).  Apparently this is meant to
be set up by the BIOS to the correct values.  It seems that this isn't
happening. 

It seems to get the value 0x0800 which corresponds to IRQ11 being the
only level-triggered interrupt.  But I need IRQ10 to be level
triggered.  I hacked the code to set the 0x0400 bit, and it seems to
work OK without my other patch.

Now I just need a way to set this correctly at boot time without a
hack.

I currently have Linux compiled without ACPI support (as I don't
really want that and being an oldish notebook I gather it has a good
chance of causing problems) so that isn't fiddling with the ELCR.

So thank you for helping me a step further in understand, but now I
have a new question:

 How can I make sure the ELCR is set correctly?
and I guess,
 What is the correct setting?

My /proc/interrupts is below.

Thanks.

NeilBrown

           CPU0       
  0:     505852          XT-PIC  timer
  1:         10          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  4:         10          XT-PIC  serial
  8:          4          XT-PIC  rtc
 10:      16442          XT-PIC  yenta, yenta, ohci_hcd:usb1, ohci_hcd:usb2, ehci_hcd:usb4, eth0
 11:          0          XT-PIC  uhci_hcd:usb3
 12:        110          XT-PIC  i8042
 14:       5114          XT-PIC  ide0
 15:         38          XT-PIC  ide1
NMI:          0 
ERR:          0

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-09  9:48   ` How to correct ELCR? - was " Neil Brown
@ 2006-04-09 15:48     ` Francois Romieu
  2006-04-09 22:28       ` Neil Brown
  2006-04-11 17:07     ` Pavel Machek
  1 sibling, 1 reply; 14+ messages in thread
From: Francois Romieu @ 2006-04-09 15:48 UTC (permalink / raw)
  To: Neil Brown; +Cc: Robert Hancock, linux-kernel

Neil Brown <neilb@suse.de> :
[...]

Can you send an url for the exact rt2500 sources that you are
actually using ?

Just curious :o)

-- 
Ueimor

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
       [not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
@ 2006-04-09 18:12   ` Robert Hancock
  2006-04-09 18:24     ` Lee Revell
  0 siblings, 1 reply; 14+ messages in thread
From: Robert Hancock @ 2006-04-09 18:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Lee Revell

Lee Revell wrote:
> On Sat, 2006-04-08 at 14:10 +1000, Neil Brown wrote:
>>  To explain what I think is happening, let me start with a very simple
>>  case.  A number of PCI devices (this one included) have a number of
>>  events which can trigger an interrupt.  The events which are current
>>  are presented as bits in a register, and are ORed together (and
>>  possibly masked by another register) to make the IRQ line.
>>  When 1's are written to any bits in this register, it acknowledges
>>  the event and clears the bit.
>>  A typical code fragment is 
>>    events = read_register(INTERRUPTS);
>>    write_register(INTERRUPTS, events);
>>    ... handle each 1 bits in events ....
>>
> 
> Isn't a more typical IRQ handler:
> 
> while (events = read_register(INTERRUPTS) != 0) {
> 	...handle each bit in events and ACK it...
> }

That would be less efficient, it would read the register twice or more 
if any events have been set, and reading device registers can be 
expensive. In the unlikely event the event was set while inside the ISR 
the interrupt should be asserted again so there is no need to do this.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-09 18:12   ` Robert Hancock
@ 2006-04-09 18:24     ` Lee Revell
  0 siblings, 0 replies; 14+ messages in thread
From: Lee Revell @ 2006-04-09 18:24 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

On Sun, 2006-04-09 at 12:12 -0600, Robert Hancock wrote:
> Lee Revell wrote: 
> > Isn't a more typical IRQ handler:
> > 
> > while (events = read_register(INTERRUPTS) != 0) {
> >       ...handle each bit in events and ACK it...
> > }
> 
> That would be less efficient, it would read the register twice or more
> if any events have been set, and reading device registers can be 
> expensive. In the unlikely event the event was set while inside the
> ISR the interrupt should be asserted again so there is no need to do
> this. 

OK.  FWIW I am looking at the emu10k1 driver (though I've seen this in
others).  The OSS driver has this comment:

    /*
     ** NOTE :
     ** We do a 'while loop' here cos on certain machines, with both
     ** playback and recording going on at the same time, IRQs will
     ** stop coming in after a while. Checking IPND indeed shows that
     ** there are interrupts pending but the PIC says no IRQs pending.
     ** I suspect that some boards need edge-triggered IRQs but are not
     ** getting that condition if we don't completely clear the IPND
     ** (make sure no more interrupts are pending).
     ** - Eric
     */

The ALSA driver preserves the while loop but omits the comment :-/

Lee




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-09 15:48     ` Francois Romieu
@ 2006-04-09 22:28       ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-09 22:28 UTC (permalink / raw)
  To: Francois Romieu; +Cc: Robert Hancock, linux-kernel

On Sunday April 9, romieu@fr.zoreil.com wrote:
> Neil Brown <neilb@suse.de> :
> [...]
> 
> Can you send an url for the exact rt2500 sources that you are
> actually using ?
> 
> Just curious :o)

http://rt2x00.serialmonkey.com/rt2500-cvs-daily.tar.gz

(though that may change from day to day... I just checked, and *now*
it is the same as what I started with)

plus a bunch of printks, which I attach for your amusement :-)

NeilBrown


Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .assoc.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .auth.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .auth_rsp.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .connect.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .eeprom.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .md5.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .mlme.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.ko.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.mod.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rt2500.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_data.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_info.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_init.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_main.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_tkip.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .rtmp_wep.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .sanity.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .sync.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .tmp_versions
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: .wpa.o.cmd
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: assoc.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: auth.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: auth_rsp.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: connect.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: eeprom.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: md5.o
diff -ur rt2500-cvs-2006040703/Module/mlme.c /root/rt-latest/rt2500-cvs-2006040703/Module/mlme.c
--- rt2500-cvs-2006040703/Module/mlme.c	2006-04-05 13:52:43.000000000 +1000
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/mlme.c	2006-04-07 21:54:14.000000000 +1000
@@ -350,12 +350,19 @@
         This task guarantee only one MlmeHandler will run. 
     ==========================================================================
  */
+extern void dump_prio(PRTMP_ADAPTER A);
 VOID MlmeHandler(
     IN PRTMP_ADAPTER pAd) 
 {
     MLME_QUEUE_ELEM        *Elem = NULL;
     unsigned long flags;
-
+    int loops=0;
+    { int static done = 0;
+    if (!done) {
+	    done = 1;
+	    dump_prio(pAd);
+    }
+    }
     // Only accept MLME and Frame from peer side, no other (control/data) frame should
     // get into this state machine
 
@@ -411,6 +418,14 @@
         {
             printk(KERN_ERR DRV_NAME "ERROR: empty Elem in MlmeQueue\n");
         }
+	loops++;
+	if (loops > 10) {
+		static int done = 0;
+		if (done) break;
+		done=1;
+		dump_prio(pAd);
+		printk("BREAK\n"); break;
+	}
     }
 
     spin_lock_irqsave(&pAd->Mlme.TaskLock,flags);
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: mlme.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.ko
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.mod.c
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.mod.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2500.o
diff -ur rt2500-cvs-2006040703/Module/rt2560.h /root/rt-latest/rt2500-cvs-2006040703/Module/rt2560.h
--- rt2500-cvs-2006040703/Module/rt2560.h	2005-10-06 12:53:08.000000000 +1000
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rt2560.h	2006-04-07 23:42:18.000000000 +1000
@@ -517,6 +517,15 @@
         ULONG		TwakeExpire:1;		// Wakeup timer expired interrupt
 		ULONG		TbcnExpire:1;		// Beacon timer expired interrupt
 #else
+/* 0x169
+ TbcnExpire
+ TxRingTxDone
+ PrioRingTxDone
+ RxDone
+ ExcryptionDone
+ fe14
+  1eb
+*/
         ULONG       TbcnExpire:1;       // Beacon timer expired interrupt
         ULONG       TwakeExpire:1;      // Wakeup timer expired interrupt
         ULONG       TatimwExpire:1;     // Timer of atim window expired interrupt
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rt2560.h.~1.4.~
diff -ur rt2500-cvs-2006040703/Module/rtmp_data.c /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_data.c
--- rt2500-cvs-2006040703/Module/rtmp_data.c	2005-11-25 22:30:47.000000000 +1100
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_data.c	2006-04-08 11:10:21.000000000 +1000
@@ -905,6 +905,25 @@
 	}
 }
 
+void dump_prio(PRTMP_ADAPTER A)
+{
+	int i;
+	u32 u;
+	PTXD_STRUC T;
+	printk("Prio: Index=%d Done=%d\n", A->CurPrioIndex, A->NextPrioDoneIndex);
+	for (i=0; i<PRIO_RING_SIZE; i++) {
+		T=(PTXD_STRUC)(A->PrioRing[i].va_addr);
+		printk("%3d: size=%d FT=%d own=%d vld=%d, rty=%d\n",
+		       i, A->PrioRing[i].size, A->PrioRing[i].FrameType,
+		       T->Owner, T->Valid, T->RetryCount);
+	}
+	printk("\n");
+	RTMP_IO_READ32(A, CSR7, &u);
+	printk("CSR7 is 0x%x\n", u);
+	RTMP_IO_READ32(A, CSR8, &u);
+	printk("CSR8 is 0x%x\n", u);
+}
+
 /*
 	========================================================================
 
@@ -947,10 +966,11 @@
         pTxD = &TxD;
         RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
 #endif
-
+	//printk("PrioDone for %d\n", pAdapter->NextPrioDoneIndex);
 		// Check for the descriptor ownership
 		if ((pTxD->Owner == DESC_OWN_NIC) || (pTxD->Valid == FALSE))
 		{
+			//printk("..no\n");
 #ifdef BIG_ENDIAN
             RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
             *pDestTxD = TxD;
@@ -2120,13 +2140,14 @@
     pTxD = &TxD;
     RTMPDescriptorEndianChange((PUCHAR)pTxD, TYPE_TXD);
 #endif
-		
+    printk("Queuing prio for %d\n", pAdapter->CurPrioIndex);
 	if (pTxD->Owner == DESC_OWN_NIC)
 	{
 		// Descriptor owned by NIC. No descriptor avaliable
 		// This should not happen since caller guaranteed.
 		// Make sure to release Prio ring resource
 		spin_unlock_irqrestore(&pAdapter->PrioRingLock, irqflag);
+		printk("...not mine\n");
 		return;
 	}
 	if (pTxD->Valid == TRUE)
@@ -2135,6 +2156,7 @@
 		// This should not happen since caller guaranteed.
 		// Make sure to release Prio ring resource
 		spin_unlock_irqrestore(&pAdapter->PrioRingLock, irqflag);
+		printk("...still valid\n");
 		return;
 	}
 	if (pBuffer == NULL)
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_data.c.~1.35.~
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_data.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_info.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_init.c.~1.27.~
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_init.o
diff -ur rt2500-cvs-2006040703/Module/rtmp_main.c /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_main.c
--- rt2500-cvs-2006040703/Module/rtmp_main.c	2006-02-25 20:45:51.000000000 +1100
+++ /root/rt-latest/rt2500-cvs-2006040703/Module/rtmp_main.c	2006-04-08 11:09:41.000000000 +1000
@@ -194,7 +194,7 @@
     // register_netdev() will call dev_alloc_name() for us
     // TODO: Remove the following line to keep the default eth%d name
     if (ifname == NULL)
-       strcpy(net_dev->name, "ra%d");
+       strcpy(net_dev->name, "eth%d");
     else
        strncpy(net_dev->name, ifname, IFNAMSIZ);
 
@@ -456,7 +456,7 @@
     int         ret = 0;
 
     DBGPRINT(RT_DEBUG_INFO, "====> RTMPHandleInterrupt\n");
-
+ again:
     // 1. Disable interrupt
 	if (RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_INTERRUPT_IN_USE) && RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_INTERRUPT_ACTIVE))
 	{
@@ -523,6 +523,7 @@
     if (IntSource.field.PrioRingTxDone)
     {
         DBGPRINT(RT_DEBUG_INFO, "====> RTMPHandlePrioRingTxDoneInterrupt\n");
+	printk("Word = %d\n", IntSource.word);
         RTMPHandlePrioRingTxDoneInterrupt(pAdapter);
         ret = 1;
     }
@@ -546,6 +547,7 @@
     if (RTMP_TEST_FLAG(pAdapter, fRTMP_ADAPTER_RESET_IN_PROGRESS))
     {
         ret = 1;
+	printk("Don't enable interrupt\n");
         goto out;
     }
 
@@ -554,12 +556,19 @@
     //
     NICEnableInterrupt(pAdapter);
 
+    RTMP_IO_READ32(pAdapter, CSR7, &IntSource.word);
+    if (IntSource.word & ~0xFE14) {
+	    /*printk("word now %x\n", IntSource.word); */
+	    // goto again;
+    }
     DBGPRINT(RT_DEBUG_INFO, "<==== RTMPHandleInterrupt\n");
 out:
 	if(ret)
 		return IRQ_RETVAL(IRQ_HANDLED);
-	else
+	else {
+//		printk("Nothing handled: %x\n", IntSource.word);
 		return IRQ_RETVAL(IRQ_NONE);
+	}
 }
 
 int rt2500_set_mac_address(struct net_device *net_dev, void *addr)
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_main.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_tkip.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: rtmp_wep.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: sanity.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: sync.o
Only in /root/rt-latest/rt2500-cvs-2006040703/Module: wpa.o


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-08  4:10 Neil Brown
  2006-04-08 16:31 ` Lee Revell
@ 2006-04-11 17:07 ` Pavel Machek
  2006-04-12  0:01   ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2006-04-11 17:07 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

Hi!

>  This is my first real introduction to the IRQ handling code in Linux,
>  so please forgive any little errors.  I'm fairly sure the big picture
>  is right, partly because the patch helps so much.
> 
>  To explain what I think is happening, let me start with a very simple
>  case.  A number of PCI devices (this one included) have a number of
>  events which can trigger an interrupt.  The events which are current
>  are presented as bits in a register, and are ORed together (and
>  possibly masked by another register) to make the IRQ line.
>  When 1's are written to any bits in this register, it acknowledges
>  the event and clears the bit.
>  A typical code fragment is 
>    events = read_register(INTERRUPTS);
>    write_register(INTERRUPTS, events);
>    ... handle each 1 bits in events ....
> 
>  This would normally clear all pending events and cause the interrupt
>  line to go low (or at least to not be asserted).
> 
>  However there is room for a race here.  If an event occurs between
>  the read and the write, then this will NOT de-assert the IRQ line.
>  It will remain asserted throughout.
> 
>  Now if the IRQ is handled as an edge-triggered line (which I believe
>  they are in Linux), then losing this race will mean that we don't see
>  any more interrupts on this line.

I believe that

a) any shared interrupts should be level-triggered. It is not okay to
share edge-triggered interrupt

b) your patch does not fix that issue. It only makes race window
smaller.

>  	if (!(action->flags & SA_INTERRUPT))
>  		local_irq_enable();
>  
>  	do {
>  		ret = action->handler(irq, action->dev_id, regs);
> -		if (ret == IRQ_HANDLED)
> +		if (ret == IRQ_HANDLED) {
>  			status |= action->flags;
> +			repeat = 1;
> +		}
>  		retval |= ret;
>  		action = action->next;
> +		if (!action &&
> +		    repeat &&
> +		    safeirq &&
> +		    (actionlist->flags & SA_SHIRQ)) {
> +			/* at least one handler on the list did something,
> +			 * and the interrupt is sharable, so give
> +			 * every handler another chance, incase a new event
> +			 * came in and is holding the irq line asserted.
> +			 */
> +			action = actionlist;
> +			repeat = 0;
> +		}
>  	} while (action);

I think it is still racy. What if another interrupt comes here?

>  	if (status & SA_SAMPLE_RANDOM)

								Pavel
-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-09  9:48   ` How to correct ELCR? - was " Neil Brown
  2006-04-09 15:48     ` Francois Romieu
@ 2006-04-11 17:07     ` Pavel Machek
  2006-04-12  4:01       ` Neil Brown
  1 sibling, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2006-04-11 17:07 UTC (permalink / raw)
  To: Neil Brown; +Cc: Robert Hancock, linux-kernel

Hi!

> > >  However there is room for a race here.  If an event occurs between
> > >  the read and the write, then this will NOT de-assert the IRQ line.
> > >  It will remain asserted throughout.
> > > 
> > >  Now if the IRQ is handled as an edge-triggered line (which I believe
> > >  they are in Linux), then losing this race will mean that we don't see
> > >  any more interrupts on this line.
> > 
> > PCI interrupts should always be level triggered, not edge triggered 
> 
> Ok... so I guess I jumped to the wrong conclusion. Thanks for
> straightening me out.
> But it is behaving like edge-triggered..
> 
> So I have explored about the i8259 (wikipedia helped) and discovered
> the ELCR (Edge/Level Control Register).  Apparently this is meant to
> be set up by the BIOS to the correct values.  It seems that this isn't
> happening. 
> 
> It seems to get the value 0x0800 which corresponds to IRQ11 being the
> only level-triggered interrupt.  But I need IRQ10 to be level
> triggered.  I hacked the code to set the 0x0400 bit, and it seems to
> work OK without my other patch.
> 
> Now I just need a way to set this correctly at boot time without a
> hack.
> 
> I currently have Linux compiled without ACPI support (as I don't
> really want that and being an oldish notebook I gather it has a good
> chance of causing problems) so that isn't fiddling with the ELCR.

Can you try to enable ACPI and/or APIC? Some machines are known to
require APIC...
								Pavel

-- 
Thanks for all the (sleeping) penguins.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-11 17:07 ` Pavel Machek
@ 2006-04-12  0:01   ` Neil Brown
  2006-04-13  5:41     ` Pavel Machek
  0 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2006-04-12  0:01 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

On Tuesday April 11, pavel@suse.cz wrote:
> > 
> >  Now if the IRQ is handled as an edge-triggered line (which I believe
> >  they are in Linux), then losing this race will mean that we don't see
> >  any more interrupts on this line.
> 
> I believe that
> 
> a) any shared interrupts should be level-triggered. It is not okay to
> share edge-triggered interrupt

I see that now, thanks.

> 
> b) your patch does not fix that issue. It only makes race window
> smaller.
> 
> >  	if (!(action->flags & SA_INTERRUPT))
> >  		local_irq_enable();
> >  
> >  	do {
> >  		ret = action->handler(irq, action->dev_id, regs);
> > -		if (ret == IRQ_HANDLED)
> > +		if (ret == IRQ_HANDLED) {
> >  			status |= action->flags;
> > +			repeat = 1;
> > +		}
> >  		retval |= ret;
> >  		action = action->next;
> > +		if (!action &&
> > +		    repeat &&
> > +		    safeirq &&
> > +		    (actionlist->flags & SA_SHIRQ)) {
> > +			/* at least one handler on the list did something,
> > +			 * and the interrupt is sharable, so give
> > +			 * every handler another chance, incase a new event
> > +			 * came in and is holding the irq line asserted.
> > +			 */
> > +			action = actionlist;
> > +			repeat = 0;
> > +		}
> >  	} while (action);
> 
> I think it is still racy. What if another interrupt comes here?

Well... at this point we are certain that the irq line was low at some
point since the loop was first entered, otherwise we would not have
been able to pass through all actions without getting a single
IRQ_HANDLED.
So if an interrupt happens now, it will generate an edge.
However I concede that I don't know how the PIC works exactly.  The
interrupt is still disabled in the PIC at this point.  If this means
that the edge will be ignored, then you are right.  If it just means
that the edge will be "delayed" and we will see the interrupt when
they are re-enabled at the PIC, then we should be safe.


> 
> >  	if (status & SA_SAMPLE_RANDOM)
> 
> 								Pavel
> -- 
> Thanks for all the (sleeping) penguins.

Do you have a gallery somewhere ???

Thanks for your input.

NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: How to correct ELCR? - was Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-11 17:07     ` Pavel Machek
@ 2006-04-12  4:01       ` Neil Brown
  0 siblings, 0 replies; 14+ messages in thread
From: Neil Brown @ 2006-04-12  4:01 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Robert Hancock, linux-kernel

On Tuesday April 11, pavel@suse.cz wrote:
> > 
> > I currently have Linux compiled without ACPI support (as I don't
> > really want that and being an oldish notebook I gather it has a good
> > chance of causing problems) so that isn't fiddling with the ELCR.
> 
> Can you try to enable ACPI and/or APIC? Some machines are known to
> require APIC...

Thanks for the suggestions.

I now have

CONFIG_X86_GOOD_APIC=y
CONFIG_X86_UP_APIC=y
CONFIG_X86_UP_IOAPIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y

in my .config, and it doesn't make any apparent difference.
I haven't tried ACPI yet... maybe in a couple of days.

Thanks again,
NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2.6.16] Shared interrupts sometimes lost
  2006-04-12  0:01   ` Neil Brown
@ 2006-04-13  5:41     ` Pavel Machek
  0 siblings, 0 replies; 14+ messages in thread
From: Pavel Machek @ 2006-04-13  5:41 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-kernel

Hi!

> > >  	if (status & SA_SAMPLE_RANDOM)
> > 
> > 								Pavel
> > -- 
> > Thanks for all the (sleeping) penguins.
> 
> Do you have a gallery somewhere ???

http://atrey.karlin.mff.cuni.cz/~pavel/swsusp/

								Pavel
-- 
Thanks, Sharp!

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2006-04-13  5:41 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <5Zd5E-3vi-7@gated-at.bofh.it>
2006-04-08 14:10 ` [PATCH 2.6.16] Shared interrupts sometimes lost Robert Hancock
2006-04-09  9:48   ` How to correct ELCR? - was " Neil Brown
2006-04-09 15:48     ` Francois Romieu
2006-04-09 22:28       ` Neil Brown
2006-04-11 17:07     ` Pavel Machek
2006-04-12  4:01       ` Neil Brown
     [not found] ` <5ZoDL-3rE-7@gated-at.bofh.it>
2006-04-09 18:12   ` Robert Hancock
2006-04-09 18:24     ` Lee Revell
2006-04-08  4:10 Neil Brown
2006-04-08 16:31 ` Lee Revell
2006-04-09  6:02   ` Neil Brown
2006-04-11 17:07 ` Pavel Machek
2006-04-12  0:01   ` Neil Brown
2006-04-13  5:41     ` Pavel Machek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).