better level triggered IRQ management needed

public inbox for linux-kernel@vger.kernel.org

* better level triggered IRQ management needed
@ 2006-04-24 18:41 Stephen Hemminger
  2006-04-24 18:59 ` linux-os (Dick Johnson)
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 18:41 UTC (permalink / raw)
  To: Andrew Morton, Linus Torvalds; +Cc: linux-kernel

I am seeing repeated problems with misconfigured systems that have shared-IRQ
devices configured for edge-triggered. Also, network devices using NAPI won't
work reliably on edge-triggered IRQs.  The kernel IRQ architecture doesn't
have sufficient information to detect this at boot time.
We should fail request_irq() if SA_SHIRQ is set but the IRQ is edge-triggered.

Right now the concept of level vs. edge triggered is buried in things like the ELCR for the old
PIC, and other stuff for the IO-APIC.  There is an IRQ_LEVEL flag in the descriptor status field
but nothing sets it or uses it.

I haven't even looked at non-i386 arches, but there is probably even more confusion there.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 18:41 better leve triggered IRQ management needed Stephen Hemminger
@ 2006-04-24 18:59 ` linux-os (Dick Johnson)
  2006-04-24 19:02 ` Linus Torvalds
  2006-04-29 21:25 ` Alan Cox
  2 siblings, 0 replies; 33+ messages in thread
From: linux-os (Dick Johnson) @ 2006-04-24 18:59 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Mon, 24 Apr 2006, Stephen Hemminger wrote:

> I am seeing repeated problems with misconfigured systems that have shared IRQ
> devices configured for edge-triggered. Also, network devices using NAPI won't
> work reliably on edge-triggered IRQ's.  The kernel IRQ architecture doesn't
> have sufficient information to detect this at boot time.
> We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.
>
> Right now the concept of level vs edge triggered is buried in things like ELCR for old
> PIC, and other stuff for IO-APIC.  There is a IRQ_LEVEL flag in the descriptor field
> but nothing sets it or uses it.
>
> Haven't even looked at non i386 arch's but probably even more confusion there.

Well ALL IRQs from the PCI are level so there is no way to misconfigure
it; or have you found some hidden method?

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

* Re: better leve triggered IRQ management needed
  2006-04-24 18:41 better leve triggered IRQ management needed Stephen Hemminger
  2006-04-24 18:59 ` linux-os (Dick Johnson)
@ 2006-04-24 19:02 ` Linus Torvalds
  2006-04-24 19:08   ` Linus Torvalds
                     ` (3 more replies)
  2006-04-29 21:25 ` Alan Cox
  2 siblings, 4 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 19:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Stephen Hemminger wrote:
>
> We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.

That would be HORRIBLE.

Edge-triggered works perfectly fine for SA_SHIRQ, as long as there is just 
one user and the driver is properly written. Making request_irq() fail 
would break existing and working setups.

If you have a driver that requires level-triggered interrupts, then your 
driver is arguably buggy. NAPI or no NAPI, doesn't matter. Edge-triggered 
interrupts are a fact of life, and deciding that you don't like them is not 
an excuse for saying "they should not work".

You can get an edge by having your driver make sure that it clears the 
interrupt source at some point where it requires an edge.

And yes, that may mean that when you're ready to start taking interrupts 
again, you are required to first read all pending packets, instead of just 
assuming that a level-triggered interrupt will "just happen", but that's 
the harsh reality for writing a driver that actually WORKS.
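
As a rough illustration of the point about reading all pending packets before
relying on the next interrupt, here is a minimal user-space sketch in C. The
fake_nic structure and both helpers are hypothetical names invented for this
example; they are not kernel or NAPI APIs, and a real driver would be talking
to device registers rather than a counter.

```c
#include <stdbool.h>

/* Hypothetical model of a NIC (assumption, not a real driver structure). */
struct fake_nic {
	int  rx_pending;   /* packets waiting in the device */
	bool irq_enabled;  /* device-side interrupt enable */
	int  handled;      /* packets the driver has processed so far */
};

/* Process at most 'budget' packets; returns how many were handled. */
static int fake_nic_rx(struct fake_nic *nic, int budget)
{
	int n = 0;

	while (n < budget && nic->rx_pending > 0) {
		nic->rx_pending--;
		nic->handled++;
		n++;
	}
	return n;
}

/*
 * Try to leave polling mode. Returns true only when interrupts are back on
 * AND the device is idle, so the next packet produces a fresh edge.
 * Returns false when the caller has to keep polling.
 */
static bool fake_nic_poll_done(struct fake_nic *nic)
{
	if (nic->rx_pending > 0)
		return false;          /* still work queued: no edge would come */

	nic->irq_enabled = true;

	/*
	 * Re-check after enabling: on real hardware a packet can arrive in
	 * this window and hold the line asserted without a new edge.
	 */
	if (nic->rx_pending > 0) {
		nic->irq_enabled = false;
		return false;
	}
	return true;
}
```

The design choice in this sketch is that the driver never assumes a
still-asserted source will interrupt again; it only stops polling once it has
seen the device idle with interrupts enabled.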

For a driver writer, there is one rule above _all_ other rules:

	"Reality sucks, deal with it"

That rule is inviolate, and no amount of "I wish", and "it _should_ work 
this way" or "..but the documentation says" matters at all.

If you can't take that rule, don't write drivers, and don't design 
infrastructure for them.

		Linus


* Re: better leve triggered IRQ management needed
  2006-04-24 19:02 ` Linus Torvalds
@ 2006-04-24 19:08   ` Linus Torvalds
  2006-04-24 19:53     ` Arjan van de Ven
  2006-04-24 19:15   ` Russell King
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 19:08 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Linus Torvalds wrote:
> 
> You can get an edge by having your driver make sure that it clears the 
> interrupt source at some point where it requires an edge.

Btw, this is why we do end up saying that having _two_ devices share 
an edge-triggered setup really is something we cannot necessarily 
fix. That said, it is better to limp along and work as well as you can 
than to just throw up your hands.

So even then, we should at least give the user the _chance_ of being able 
to log in and give a bug-report, rather than "oops, the harddisk won't 
work, because the BIOS sets it up to share an edge-triggered interrupt 
with the network".

However, I'm all for printing out a honking huge warning if we have two 
devices sharing the same edge-triggered interrupt. But a single device 
should work, or the driver should be considered broken.

[ Btw, the "disable_irq()/enable_irq()" subsystem has been written so that 
  when you disable an edge-triggered interrupt, and the edge happens while 
  the interrupt is disabled, we will re-play the interrupt at enable time. 
  Exactly so that drivers can have an easier time and don't have to 
  normally worry about whether something is edge or level-triggered.

  However, if you're within an interrupt, that doesn't mean that you can 
  just disable the irq and hope that it acts as if it was level-triggered 
  when you enable it again. ]
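
The replay behaviour described in the bracketed note can be modelled in a few
lines of user-space C. This is only a sketch of the idea, assuming a single
pending flag and a nesting counter; the model_* names are made up for the
example and are not the kernel's disable_irq()/enable_irq() implementation.

```c
#include <stdbool.h>

/* Hypothetical model of one edge-triggered interrupt line. */
struct model_irq {
	int  disable_depth;  /* nested disable count */
	bool pending;        /* an edge arrived while disabled */
	int  delivered;      /* times the handler would have run */
};

static void model_disable_irq(struct model_irq *irq)
{
	irq->disable_depth++;
}

/* The hardware signals an edge. */
static void model_edge(struct model_irq *irq)
{
	if (irq->disable_depth > 0)
		irq->pending = true;   /* remember it; several edges collapse into one */
	else
		irq->delivered++;
}

static void model_enable_irq(struct model_irq *irq)
{
	if (irq->disable_depth == 0)
		return;                /* unbalanced enable: ignore in this model */

	if (--irq->disable_depth == 0 && irq->pending) {
		irq->pending = false;
		irq->delivered++;      /* replay the edge that was missed */
	}
}
```

Note that two edges arriving while the line is disabled are replayed as a
single interrupt in this model, so a handler still has to drain everything
that is pending.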

		Linus


* Re: better leve triggered IRQ management needed
  2006-04-24 19:08   ` Linus Torvalds
@ 2006-04-24 19:53     ` Arjan van de Ven
  2006-04-24 20:16       ` Alan Cox
  0 siblings, 1 reply; 33+ messages in thread
From: Arjan van de Ven @ 2006-04-24 19:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: alan, Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 2006-04-24 at 12:08 -0700, Linus Torvalds wrote:
> 
> On Mon, 24 Apr 2006, Linus Torvalds wrote:
> > 
> > You can get an edge by having your driver make sure that it clears the 
> > interrupt source at some point where it requires an edge.
> 
> Btw, this is why we do end up saying that having _two_ devices share 
> an edge-triggered setup really is something we cannot necessarily 
> fix. That said, it is better to limp along and work as well as you can 
> than to just throw up your hands.

we now have that neat polling thing Alan did for interrupts (but which
is optional). To limp along better the kernel could auto-enable that for
any such shared interrupt automatically as a "safe fallback"...
(or heck, if things are this broken, you probably want it for all
interrupts at that point just to be sure)



* Re: better leve triggered IRQ management needed
  2006-04-24 19:53     ` Arjan van de Ven
@ 2006-04-24 20:16       ` Alan Cox
  2006-04-24 20:43         ` Arjan van de Ven
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2006-04-24 20:16 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, alan, Stephen Hemminger, Andrew Morton,
	linux-kernel

On Mon, Apr 24, 2006 at 09:53:22PM +0200, Arjan van de Ven wrote:
> we now have that neat polling thing Alan did for interrupts (but which
> is optional). To limp along better the kernel could auto-enable that for
> any such shared interrupt automatically as a "safe fallback"...
> (or heck, if things are this broken, you probably want it for all
> interrupts at that point just to be sure)

That is really something drivers should handle themselves if they are doing
shared edge trigger. For one, the kernel core has no idea of the right polling
time, and for two, it's often possible to pull dirty tricks to avoid the race.

Alan



* Re: better leve triggered IRQ management needed
  2006-04-24 20:16       ` Alan Cox
@ 2006-04-24 20:43         ` Arjan van de Ven
  2006-04-24 21:07           ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Arjan van de Ven @ 2006-04-24 20:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 2006-04-24 at 16:16 -0400, Alan Cox wrote:
> On Mon, Apr 24, 2006 at 09:53:22PM +0200, Arjan van de Ven wrote:
> > we now have that neat polling thing Alan did for interrupts (but which
> > is optional). To limp along better the kernel could auto-enable that for
> > any such shared interrupt automatically as a "safe fallback"...
> > (or heck, if things are this broken, you probably want it for all
> > interrupts at that point just to be sure)
> 
> That is really something drivers should handle themselves if they are doing
> shared edge trigger.

but the issue is .. drivers don't know. They didn't *want* edge trigger
in the first place generally

>  For one the kernel core has no idea the right polling
> time

well... the corner case (as rmk described) is full starvation; even
polling once per second is better than not polling at all there...



* Re: better leve triggered IRQ management needed
  2006-04-24 20:43         ` Arjan van de Ven
@ 2006-04-24 21:07           ` Linus Torvalds
  2006-04-24 21:20             ` Alan Cox
                               ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 21:07 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Arjan van de Ven wrote:
> 
> well... the corner case (as rmk described) is full starvation; even
> polling once per second is better than not polling at all there...

I can confirm that even just polling once a second - or even less often - 
is a huge improvement.

A long time ago, I had a machine with a 3c509 card that would sometimes 
miss receive interrupts for some reason (it may actually have been a bug 
in the disable_irq/enable_irq thing on SMP, I forget - this is at least 
five years ago). 

The 3c509 driver had some five-second timeout thing, which meant that it 
would end up polling itself every five seconds regardless, and it made the 
difference between a machine that was totally undebuggable over a network, 
and one that actually worked surprisingly well (ie you could ssh into it, 
and things would work, but have these long pauses every once in a while, 
with bursts of data).

Of course, the machine would have been totally useless for real work, but 
it made it _much_ easier to see what was going on when things went south.

So "limping along" when things don't work can be a huge time-saver from a 
debugging standpoint. So even if it's just that every registered SA_SHIRQ 
would get a heartbeat at least once every five seconds (and we'd limit it 
to SA_SHIRQ exactly because a driver that doesn't have that set may get 
confused if it gets extra interrupts), that might sound totally useless, 
but it might actually help somebody who otherwise might just make a pretty
useless "the machine hung" bug-report.

The fake interrupt could even print out a warning if somebody returns 
IRQ_HANDLED (since normally there _shouldn't_ have been any work to handle 
for it), and if that means that for somebody, things go from "the machine 
hung" to "the machine got very slow, and printed out 'fake interrupt for 
ide0 returned IRQ_HANDLED!'", that would potentially be a big debug aid.

We've had our ass saved quite a few times now by the irq storm detector 
("irq X: nobody cared" and friends), which has helped debug irqs that 
haven't been set up properly, that I'm convinced things like this might 
well make a huge deal.

Of course, "things like this" does not necessarily cover the above 
scenario. Maybe that is totally useless ;)

		Linus


* Re: better leve triggered IRQ management needed
  2006-04-24 21:07           ` Linus Torvalds
@ 2006-04-24 21:20             ` Alan Cox
  2006-04-24 22:26               ` Linus Torvalds
  2006-04-24 21:22             ` [RFC 1/2] irq: record edge-level setting Stephen Hemminger
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2006-04-24 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Alan Cox, Stephen Hemminger, Andrew Morton,
	linux-kernel

On Mon, Apr 24, 2006 at 02:07:01PM -0700, Linus Torvalds wrote:
> debugging standpoint. So even if it's just that every registered SA_SHIRQ 
> would get a heartbeat at least once every five seconds (and we'd limit it 
> to SA_SHIRQ exactly because a driver that doesn't have that set may get 
> confused if it gets extra interrupts), that might sound totally useless, 
> but it might actually help somebody who otherwise might just make a pretty
> useless "the machine hung" bug-report.

Have to watch enable/disable_irq and the other races here.

> The fake interrupt could even print out a warning if somebody returns 
> IRQ_HANDLED (since normally there _shouldn't_ have been any work to handle 
> for it), and if that means that for somebody, things go from "the machine 
> hung" to "the machine got very slow, and printed out 'fake interrupt for 
> ide0 returned IRQ_HANDLED!'", that would potentially be a big debug aid.

There are high rate IRQ sources that would trigger that erratically due to
races but it could be useful in some kind of "linux irqdebug" mode

> We've had our ass saved quite a few times now by the irq storm detector 
> ("irq X: nobody cared" and friends), which has helped debug irqs that 
> haven't been set up properly, that I'm convinced things like this might 
> well make a huge deal.

Yep

Alan
--
  "... and for $64000 question, could you get yourself vaguely familiar with
		the notion of on-topic posting?"
				-- Al Viro



* Re: better leve triggered IRQ management needed
  2006-04-24 21:20             ` Alan Cox
@ 2006-04-24 22:26               ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 22:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Arjan van de Ven, Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Alan Cox wrote:
> 
> > The fake interrupt could even print out a warning if somebody returns 
> > IRQ_HANDLED (since normally there _shouldn't_ have been any work to handle 
> > for it), and if that means that for somebody, things go from "the machine 
> > hung" to "the machine got very slow, and printed out 'fake interrupt for 
> > ide0 returned IRQ_HANDLED!'", that would potentially be a big debug aid.
> 
> There are high rate IRQ sources that would trigger that erratically due to
> races but it could be useful in some kind of "linux irqdebug" mode

I was thinking that an interrupt actually happening on that irq would set 
the "already done" flag, so that if it's a high-rate irq, then we'd not 
inject any new fake ones.

So the algorithm would be something like "clear the 'already done' flag 
every five seconds, and sending the fake one if it was already cleared", 
with normal interrupts always setting the flag.
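
The flag-based heartbeat described above could be sketched roughly as follows.
This is a user-space model only: the irq_heartbeat structure and the hb_*
helpers are hypothetical names, the five-second timer is represented by
calling hb_tick() by hand, and no locking against real interrupts is shown.

```c
#include <stdbool.h>

/* Hypothetical per-IRQ heartbeat state (assumption, not kernel code). */
struct irq_heartbeat {
	bool already_done;  /* set by every real interrupt */
	int  fake_sent;     /* number of fake interrupts injected */
};

/* Called from the normal interrupt path. */
static void hb_real_irq(struct irq_heartbeat *hb)
{
	hb->already_done = true;
}

/*
 * Called once per period (five seconds in the proposal).
 * Returns true if a fake interrupt should be injected now.
 */
static bool hb_tick(struct irq_heartbeat *hb)
{
	if (hb->already_done) {
		/* The line was active during this period: just re-arm. */
		hb->already_done = false;
		return false;
	}
	/* Flag was still clear: nothing happened for a whole period. */
	hb->fake_sent++;
	return true;
}
```

With this shape a busy interrupt never receives fake events, and an idle one
receives at most one per period.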

And yes, you could still hit it in a blue moon (nothing happened for five 
seconds, and then it happens _just_ as we send a fake event), but if the 
only thing we do is do a printk() on it, no big deal if you get a false 
positive every five years.

(The bigger problem is that some drivers just return IRQ_HANDLED whether 
they had work to do or not, because people - including me - were lazy in 
some of the conversion of the irq_handler_t stuff).

I dunno. It _sounds_ simple enough..

			Linus


* [RFC 1/2] irq: record edge-level setting
  2006-04-24 21:07           ` Linus Torvalds
  2006-04-24 21:20             ` Alan Cox
@ 2006-04-24 21:22             ` Stephen Hemminger
  2006-04-24 21:49               ` Alan Cox
       [not found]             ` <20060424141926.3872f921@localhost.localdomain>
  2006-04-25 15:23             ` better leve triggered IRQ management needed Michael Buesch
  3 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 21:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Record the level vs edge-triggered status of IRQ to allow for error checks later.

Note: this is only done for i386/x86_64.
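
For readers unfamiliar with the ELCR, the lookup that i8259A_trigger() performs
in this patch can be written as a pure function over the two register values.
This is an illustrative user-space sketch: elcr_irq_is_level() is a
hypothetical helper, and the register values are passed in as parameters
instead of being read with inb() from ports 0x4d0 and 0x4d1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * elcr1: value of the register at port 0x4d0 (IRQ 0-7)
 * elcr2: value of the register at port 0x4d1 (IRQ 8-15)
 * A set bit means the corresponding IRQ is level-triggered.
 */
static bool elcr_irq_is_level(uint8_t elcr1, uint8_t elcr2, unsigned int irq)
{
	if (irq > 15)
		return false;                  /* not a legacy PIC interrupt */
	if (irq >= 8)
		return (elcr2 >> (irq - 8)) & 1;
	return (elcr1 >> irq) & 1;
}
```

For example, elcr2 == 0x0e would mark IRQ 9, 10 and 11 as level-triggered and
leave IRQ 8 as edge-triggered.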

--- irq.orig/arch/i386/kernel/i8259.c
+++ irq/arch/i386/kernel/i8259.c
@@ -128,11 +128,22 @@ int i8259A_irq_pending(unsigned int irq)
 	return ret;
 }
 
+static int i8259A_trigger(unsigned int irq)
+{
+	if (irq & 8)
+		return inb(0x4d1) & (1<< (irq-8));
+	else
+		return inb(0x4d0) & (1<<irq);
+}
+
+
 void make_8259A_irq(unsigned int irq)
 {
 	disable_irq_nosync(irq);
 	io_apic_irqs &= ~(1<<irq);
 	irq_desc[irq].handler = &i8259A_irq_type;
+	if (i8259A_trigger(irq))
+		irq_desc[irq].status |= IRQ_LEVEL;
 	enable_irq(irq);
 }
 
--- irq.orig/arch/i386/kernel/io_apic.c
+++ irq/arch/i386/kernel/io_apic.c
@@ -1186,17 +1186,23 @@ static inline void ioapic_register_intr(
 {
 	if (use_pci_vector() && !platform_legacy_irq(irq)) {
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
-				trigger == IOAPIC_LEVEL)
+		    trigger == IOAPIC_LEVEL) {
 			irq_desc[vector].handler = &ioapic_level_type;
-		else
+			irq_desc[vector].status |= IRQ_LEVEL;
+		} else {
 			irq_desc[vector].handler = &ioapic_edge_type;
+			irq_desc[vector].status &= ~IRQ_LEVEL;
+		}
 		set_intr_gate(vector, interrupt[vector]);
 	} else	{
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
-				trigger == IOAPIC_LEVEL)
+		    trigger == IOAPIC_LEVEL) {
 			irq_desc[irq].handler = &ioapic_level_type;
-		else
+			irq_desc[irq].status |= IRQ_LEVEL;
+		} else {
 			irq_desc[irq].handler = &ioapic_edge_type;
+			irq_desc[irq].status &= ~IRQ_LEVEL;
+		}
 		set_intr_gate(vector, interrupt[irq]);
 	}
 }
--- irq.orig/arch/x86_64/kernel/i8259.c
+++ irq/arch/x86_64/kernel/i8259.c
@@ -231,11 +231,22 @@ int i8259A_irq_pending(unsigned int irq)
 	return ret;
 }
 
+static int i8259A_trigger(unsigned int irq)
+{
+	if (irq & 8)
+		return inb(0x4d1) & (1<< (irq-8));
+	else
+		return inb(0x4d0) & (1<<irq);
+}
+
+
 void make_8259A_irq(unsigned int irq)
 {
 	disable_irq_nosync(irq);
 	io_apic_irqs &= ~(1<<irq);
 	irq_desc[irq].handler = &i8259A_irq_type;
+	if (i8259A_trigger(irq))
+		irq_desc[irq].status |= IRQ_LEVEL;
 	enable_irq(irq);
 }
 
--- irq.orig/arch/x86_64/kernel/io_apic.c
+++ irq/arch/x86_64/kernel/io_apic.c
@@ -848,17 +848,23 @@ static inline void ioapic_register_intr(
 {
 	if (use_pci_vector() && !platform_legacy_irq(irq)) {
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
-				trigger == IOAPIC_LEVEL)
+		    trigger == IOAPIC_LEVEL) {
 			irq_desc[vector].handler = &ioapic_level_type;
-		else
+			irq_desc[vector].status |= IRQ_LEVEL;
+		} else {
 			irq_desc[vector].handler = &ioapic_edge_type;
+			irq_desc[vector].status &= ~IRQ_LEVEL;
+		}
 		set_intr_gate(vector, interrupt[vector]);
 	} else	{
 		if ((trigger == IOAPIC_AUTO && IO_APIC_irq_trigger(irq)) ||
-				trigger == IOAPIC_LEVEL)
+		    trigger == IOAPIC_LEVEL) {
 			irq_desc[irq].handler = &ioapic_level_type;
-		else
+			irq_desc[irq].status |= IRQ_LEVEL;
+		} else {
 			irq_desc[irq].handler = &ioapic_edge_type;
+			irq_desc[irq].status &= ~IRQ_LEVEL;
+		}
 		set_intr_gate(vector, interrupt[irq]);
 	}
 }


* Re: [RFC 1/2] irq: record edge-level setting
  2006-04-24 21:22             ` [RFC 1/2] irq: record edge-level setting Stephen Hemminger
@ 2006-04-24 21:49               ` Alan Cox
  2006-04-24 21:41                 ` Stephen Hemminger
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2006-04-24 21:49 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Linus Torvalds, linux-kernel

On Mon, 2006-04-24 at 14:22 -0700, Stephen Hemminger wrote:
> Record the level vs edge-triggered status of IRQ to allow for error checks later.
> 
> Note: this is only done for i386/x86_64.

This doesn't work for IRQs routed via the EISA IRQ routing or for MCA
that I can see. It also seems to assume the chip state at boot is right.
For EISA you need to read the EISA irq register to see what is level and
what is edge (and work out what is EISA), for MCA it is board dependent.


* Re: [RFC 1/2] irq: record edge-level setting
  2006-04-24 21:49               ` Alan Cox
@ 2006-04-24 21:41                 ` Stephen Hemminger
  2006-04-24 22:34                   ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 21:41 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel

On Mon, 24 Apr 2006 22:49:54 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> On Mon, 2006-04-24 at 14:22 -0700, Stephen Hemminger wrote:
> > Record the level vs edge-triggered status of IRQ to allow for error checks later.
> > 
> > Note: this is only done for i386/x86_64.
> 
> This doesn't work for IRQs routed via the EISA IRQ routing or for MCA
> that I can see. It also seems to assume the chip state at boot is right.
> For EISA you need to read the EISA irq register to see what is level and
> what is edge (and work out what is EISA), for MCA it is board dependent.

Maybe that's why it never was done in the past, too much work and historical
baggage.


* Re: [RFC 1/2] irq: record edge-level setting
  2006-04-24 21:41                 ` Stephen Hemminger
@ 2006-04-24 22:34                   ` Linus Torvalds
  2006-04-24 22:58                     ` Stephen Hemminger
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 22:34 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Alan Cox, linux-kernel

On Mon, 24 Apr 2006, Stephen Hemminger wrote:
> 
> Maybe that's why it never was done in the past, too much work and historical
> baggage.

It's messy. That whole ELCR register was mis-designed: you can change the 
edge/level detection with it, but since it _also_ changes the polarity of 
the signal, you can't actually do so from a sw angle, and it has to match 
the hardware. So you can't say "I want to treat this interrupt as level 
triggered", and just set the bit ;^/

To make matters worse, I wouldn't be in the least surprised if the ELCR 
register is totally ignored by many south-bridges for the internally 
generated interrupts (ie devices that are embedded in the SB), since the 
register really doesn't matter for them.

And it doesn't help that Intel mis-designed the edge-detection logic on 
the IO-APIC. On the old i8259, if you masked an interrupt and unmasked it, 
an active interrupt would always be seen as an edge, because the 
edge-detection was done _after_ masking. On the IO-APIC crap, the masking 
is done after edge-detection, so if you mask the APIC hardware level, and 
an edge happens, you'll never ever learn of it ever again.
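
The difference in ordering described above can be shown with a toy model. This
is a user-space sketch that only encodes the behaviour as stated in this
message (edge detection after masking versus masking after edge detection);
the irq_pin structure and helper names are hypothetical, and nothing here
models real i8259 or IO-APIC registers.

```c
#include <stdbool.h>

enum pin_order {
	DETECT_AFTER_MASK,   /* i8259-style, as described above */
	MASK_AFTER_DETECT    /* IO-APIC-style, as described above */
};

struct irq_pin {
	enum pin_order order;
	bool line;       /* raw state of the interrupt input */
	bool masked;
	int  delivered;  /* interrupts that reached the CPU */
};

/* What the edge detector sees for the current pin state. */
static bool detector_input(const struct irq_pin *p)
{
	if (p->order == DETECT_AFTER_MASK)
		return p->line && !p->masked;
	return p->line;
}

/* Apply a new line/mask state and deliver an interrupt on a visible edge. */
static void pin_update(struct irq_pin *p, bool line, bool masked)
{
	bool before = detector_input(p);

	p->line = line;
	p->masked = masked;

	/* A rising edge at the detector is delivered only if unmasked. */
	if (!before && detector_input(p) && !p->masked)
		p->delivered++;
}
```

Running the sequence mask, raise line, unmask through both orderings gives one
delivered interrupt for DETECT_AFTER_MASK and none for MASK_AFTER_DETECT, which
is the lost edge being complained about.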

I'm sure other system architectures have similar problems, but it's 
irritating.

			Linus


* Re: [RFC 1/2] irq: record edge-level setting
  2006-04-24 22:34                   ` Linus Torvalds
@ 2006-04-24 22:58                     ` Stephen Hemminger
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 22:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-kernel

On Mon, 24 Apr 2006 15:34:20 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> 
> 
> On Mon, 24 Apr 2006, Stephen Hemminger wrote:
> > 
> > Maybe that's why it never was done in the past, too much work and historical
> > baggage.
> 
> It's messy. That whole ELCR register was mis-designed: you can change the 
> edge/level detection with it, but since it _also_ changes the polarity of 
> the signal, you can't actually do so from a sw angle, and it has to match 
> the hardware. So you can't say "I want to treat this interrupt as level 
> triggered", and just set the bit ;^/
> 
> To make matters worse, I wouldn't be in the least surprised if the ELCR 
> register is totally ignored by many south-bridges for the internally 
> generated interrupts (ie devices that are embedded in the SB), since the 
> register really doesn't matter for them.
> 
> And it doesn't help that Intel mis-designed the edge-detection logic on 
> the IO-APIC. On the old i8259, if you masked an interrupt and unmasked it, 
> an active interrupt would always be seen as an edge, because the 
> edge-detection was done _after_ masking. On the IO-APIC crap, the masking 
> is done after edge-detection, so if you mask the APIC hardware level, and 
> an edge happens, you'll never ever learn of it ever again.

That is the kind of crap that makes NAPI difficult.
See Documentation/networking/NAPI_HOWTO.txt for rotting packet..

> I'm sure other system architectures have similar problems, but it's 
> irritating.
> 
> 			Linus


[parent not found: <20060424141926.3872f921@localhost.localdomain>]

* [RFC 2/2] warn on shared edge-triggered irq
       [not found]             ` <20060424141926.3872f921@localhost.localdomain>
@ 2006-04-24 21:22               ` Stephen Hemminger
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 21:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel


Print a warning when setting up a shared IRQ that is edge-triggered.
If this happens, interrupts can be lost, but perhaps it is a laptop
with an unused device, so let it go until later.


--- irq.orig/kernel/irq/manage.c
+++ irq/kernel/irq/manage.c
@@ -234,6 +234,9 @@ int setup_irq(unsigned int irq, struct i
 			desc->handler->startup(irq);
 		else
 			desc->handler->enable(irq);
+	} else if (!(desc->status & IRQ_LEVEL)) {
+		printk(KERN_CRIT "Irq %d (%s) is shared but not level triggered\n",
+		       irq, desc->handler->typename);
 	}
 	spin_unlock_irqrestore(&desc->lock,flags);
 


* Re: better leve triggered IRQ management needed
  2006-04-24 21:07           ` Linus Torvalds
                               ` (2 preceding siblings ...)
       [not found]             ` <20060424141926.3872f921@localhost.localdomain>
@ 2006-04-25 15:23             ` Michael Buesch
  3 siblings, 0 replies; 33+ messages in thread
From: Michael Buesch @ 2006-04-25 15:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel,
	Arjan van de Ven


On Monday 24 April 2006 23:07, you wrote:
> A long time ago, I had a machine with a 3c509 card that would sometimes 

Heh, I still have this one in my server. :)

> The fake interrupt could even print out a warning if somebody returns 
> IRQ_HANDLED (since normally there _shouldn't_ have been any work to handle 
> for it),

Are you sure this can't race against the hardware?
Something like this:
Kernel                               Hardware
- generate fake IRQ
- enter the low level IRQ handling
                                     - hardware generates an IRQ and
                                       sets its IRQ reason registers
                                       to "I have something to do"
- enter the handler and service
  the IRQ
- return IRQ_HANDLED

-- 
Greetings Michael.



* Re: better leve triggered IRQ management needed
  2006-04-24 19:02 ` Linus Torvalds
  2006-04-24 19:08   ` Linus Torvalds
@ 2006-04-24 19:15   ` Russell King
  2006-04-24 20:18     ` Linus Torvalds
  2006-04-24 19:25   ` Stephen Hemminger
  2006-04-24 19:35   ` linux-os (Dick Johnson)
  3 siblings, 1 reply; 33+ messages in thread
From: Russell King @ 2006-04-24 19:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, Apr 24, 2006 at 12:02:47PM -0700, Linus Torvalds wrote:
> On Mon, 24 Apr 2006, Stephen Hemminger wrote:
> > We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.
> 
> That would be HORRIBLE.
> 
> Edge-triggered works perfectly fine for SA_SHIRQ, as long as there is just 
> one user and the driver is properly written. Making request_irq() fail 
> would break existing and working setups.

Sorry, untrue.  If you take a serial port and a network card on the same
edge triggered interrupt line, take the following sequence of events:

1. serial port receives characters, asserts interrupt.

2. interrupt handlers get called, serial starts reading characters from
   the port.  Interrupt does not change state because there's still
   characters in the FIFO to be read.

3. meanwhile, the network interface receives a packet and asserts its
   interrupt.  Interrupt does not change state since it's already asserted.

4. serial interrupt handler continues to service serial ports until it is
   damned sure all serial ports have released the interrupt line, and
   returns.  Interrupt does not change state because the network device
   is holding it asserted.

5. network interrupt handler gets invoked next (it's next in the chain)
   but hasn't acknowledged the interrupt.  Hence, the interrupt line has
   remained asserted since step 1.

6. serial port receives another character, asserting its interrupt output.

7. network interrupt handler services network device.  Network device is
   no longer holding the interrupt line in the asserted state.  But the
   serial device is, so still no change in interrupt line state since
   step 1.

8. all handlers complete, kernel returns to foreground task.

9. No further serial or network interrupts because there's _no_ edge to
   trigger the interrupt.  Your serial and network are dead.

If you allow shared interrupts, no matter how hard you try in a driver,
you can NOT get around this problem.  It has to be handled at a higher
level.
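
The sequence above can be replayed in a small user-space model that counts the
rising edges an edge-triggered controller would see on the shared line. This is
a sketch under the assumption that the line is a simple OR of the two device
outputs; shared_line, line_drive() and replay_sequence() are hypothetical names
for this example.

```c
#include <stdbool.h>

/* Two devices wired to one interrupt line (line is asserted if either is). */
struct shared_line {
	bool serial;  /* serial port holds the line asserted */
	bool net;     /* network card holds the line asserted */
	int  edges;   /* rising edges seen by the interrupt controller */
};

static bool line_level(const struct shared_line *l)
{
	return l->serial || l->net;
}

/* Change the device outputs and count an edge only on a 0 -> 1 transition. */
static void line_drive(struct shared_line *l, bool serial, bool net)
{
	bool was = line_level(l);

	l->serial = serial;
	l->net = net;
	if (!was && line_level(l))
		l->edges++;
}

/* Replays steps 1, 3, 4, 6 and 7 of the scenario. */
static void replay_sequence(struct shared_line *l)
{
	line_drive(l, true,  false);  /* 1: serial asserts (the only edge) */
	line_drive(l, true,  true);   /* 3: network asserts, line already high */
	line_drive(l, false, true);   /* 4: serial handler finishes */
	line_drive(l, true,  true);   /* 6: serial receives another character */
	line_drive(l, true,  false);  /* 7: network handler finishes */
}
```

After the replay the model has counted a single edge while the line is still
asserted by the serial port, so no further interrupt will ever be signalled.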

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 Serial core


* Re: better leve triggered IRQ management needed
  2006-04-24 19:15   ` Russell King
@ 2006-04-24 20:18     ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 20:18 UTC (permalink / raw)
  To: Russell King; +Cc: Stephen Hemminger, Andrew Morton, linux-kernel



On Mon, 24 Apr 2006, Russell King wrote:

> On Mon, Apr 24, 2006 at 12:02:47PM -0700, Linus Torvalds wrote:
> > On Mon, 24 Apr 2006, Stephen Hemminger wrote:
> > > We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.
> > 
> > That would be HORRIBLE.
> > 
> > Edge-triggered works perfectly fine for SA_SHIRQ, as long as there is just 
> > one user and the driver is properly written. Making request_irq() fail 
> > would break existing and working setups.
> 
> Sorry, untrue.  If you take a serial port and a network card on the same
> edge triggered interrupt line [ ... ]

Read what I wrote!

	"..as long as there is just one user .."

In other words, you MUST NOT disallow request_irq(), just because the one 
user happens to use SA_SHIRQ.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 19:02 ` Linus Torvalds
  2006-04-24 19:08   ` Linus Torvalds
  2006-04-24 19:15   ` Russell King
@ 2006-04-24 19:25   ` Stephen Hemminger
  2006-04-24 19:35   ` linux-os (Dick Johnson)
  3 siblings, 0 replies; 33+ messages in thread
From: Stephen Hemminger @ 2006-04-24 19:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andrew Morton, linux-kernel

On Mon, 24 Apr 2006 12:02:47 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> 
> 
> On Mon, 24 Apr 2006, Stephen Hemminger wrote:
> >
> > We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.
> 
> That would be HORRIBLE.
> 
> Edge-triggered works perfectly fine for SA_SHIRQ, as long as there is just 
> one user and the driver is properly written. Making request_irq() fail 
> would break existing and working setups.

Couldn't we at least warn? You will lose IRQs if two devices
are sharing an edge-triggered IRQ. If A and B share an edge-triggered
IRQ and both cause a transition, then when A clears its IRQ the
shared IRQ will disappear and B's IRQ will be lost.

> If you have a driver that requires level-triggered interrupts, then your 
> driver is arguably buggy. NAPI or no NAPI, doesn't matter. Edge-triggered 
> interrupts is a fact of life, and deciding that you don't like them is not 
> an excuse for saying "they should not work".

Drivers need to be able to depend on not losing interrupts.

> You can get an edge by having your driver make sure that it clears the 
> interrupt source at some point where it requires an edge.

The problem is that the IRQ system doesn't tell the driver the trigger status.
If the driver knew that the IRQ was edge triggered, it could do
the necessary workaround.

> For a driver writer, there is one rule above _all_ other rules:
> 
> 	"Reality sucks, deal with it"
> 
> That rule is inviolate, and no amount of "I wish", and "it _should_ work 
> this way" or "..but the documentation says" matters at all.

The kernel should make the driver writer's problems easier not harder.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 19:02 ` Linus Torvalds
                     ` (2 preceding siblings ...)
  2006-04-24 19:25   ` Stephen Hemminger
@ 2006-04-24 19:35   ` linux-os (Dick Johnson)
  2006-04-24 20:19     ` Linus Torvalds
  3 siblings, 1 reply; 33+ messages in thread
From: linux-os (Dick Johnson) @ 2006-04-24 19:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Linus Torvalds wrote:

>
>
> On Mon, 24 Apr 2006, Stephen Hemminger wrote:
>>
>> We should fail request_irq() if the SA_SHIRQ but the irq is edge-triggered.
>
> That would be HORRIBLE.
>
> Edge-triggered works perfectly fine for SA_SHIRQ, as long as there is just
> one user and the driver is properly written. Making request_irq() fail
   ^^^^^^^^_______ Must be a trick!
> would break existing and working setups.
>

If there is just one user then it isn't shared! Get real.

> If you have a driver that requires level-triggered interrupts, then your
> driver is arguably buggy. NAPI or no NAPI, doesn't matter. Edge-triggered
> interrupts is a fact of life, and deciding that you don't like them is not
> an excuse for saying "they should not work".
>

It's trivial to write a driver where the ISR completely handles the
interrupt so that another edge can happen. It is impossible to write
a driver that shares such an edge-driven interrupt with another.

> You can get an edge by having your driver make sure that it clears the
> interrupt source at some point where it requires an edge.
>
> And yes, that may mean that when you're ready to start taking interrupts
> again, you are required to first read all pending packets, instead of just
> assuming that a level-triggered interrupt will "just happen", but that's
> the harsh reality for writing a driver that actually WORKS.
>
> For a driver writer, there is one rule above _all_ other rules:
>
> 	"Reality sucks, deal with it"
>
> That rule is inviolate, and no amount of "I wish", and "it _should_ work
> this way" or "..but the documentation says" matters at all.
>
> If you can't take that rule, don't write drivers, and don't design
> infrastructure for them.
>
> 		Linus

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 19:35   ` linux-os (Dick Johnson)
@ 2006-04-24 20:19     ` Linus Torvalds
  2006-04-24 20:50       ` linux-os (Dick Johnson)
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 20:19 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Stephen Hemminger, Andrew Morton, linux-kernel



On Mon, 24 Apr 2006, linux-os (Dick Johnson) wrote:
> > one user and the driver is properly written. Making request_irq() fail
>    ^^^^^^^^_______ Must be a trick!
> > would break existing and working setups.
> >
> 
> If there is just one user then it isn't shared! Get real.

SA_SHIRQ does NOT mean that the irq is shared.

It means that it's not exclusive, and that the driver is _ok_ with it 
being shared if that makes sense.
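
The distinction can be sketched with a toy model of the registration
check. This is not the kernel's code: TOY_SHIRQ, TOY_EBUSY and
toy_request_irq() are hypothetical stand-ins, and the model only assumes
the rule as described here, namely that a request succeeds on a free line
regardless of the flag, and on an occupied line only when both the
existing users and the newcomer allow sharing.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SHIRQ	0x1u	/* hypothetical stand-in for SA_SHIRQ */
#define TOY_EBUSY	16	/* hypothetical stand-in for EBUSY */
#define TOY_MAX_USERS	4

/* One interrupt line and the flags of the handlers registered on it. */
struct toy_irq_line {
	unsigned int flags[TOY_MAX_USERS];
	size_t nr_users;
};

/* Returns 0 on success, -TOY_EBUSY if the line cannot be shared. */
static int toy_request_irq(struct toy_irq_line *line, unsigned int flags)
{
	assert(line != NULL);

	if (line->nr_users > 0) {
		/*
		 * Every admitted user passed this same test, so checking
		 * the first one is enough: all parties must allow sharing.
		 */
		if (!(line->flags[0] & flags & TOY_SHIRQ))
			return -TOY_EBUSY;
		if (line->nr_users == TOY_MAX_USERS)
			return -TOY_EBUSY;
	}
	line->flags[line->nr_users++] = flags;
	return 0;
}
```

A single handler that passes the flag is accepted and simply ends up as
the only user of the line; the flag starts to matter only when a second
request arrives.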

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 20:19     ` Linus Torvalds
@ 2006-04-24 20:50       ` linux-os (Dick Johnson)
  2006-04-24 21:09         ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: linux-os (Dick Johnson) @ 2006-04-24 20:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, Linus Torvalds wrote:

>
>
> On Mon, 24 Apr 2006, linux-os (Dick Johnson) wrote:
>>> one user and the driver is properly written. Making request_irq() fail
>>    ^^^^^^^^_______ Must be a trick!
>>> would break existing and working setups.
>>>
>>
>> If there is just one user then it isn't shared! Get real.
>
> SA_SHIRQ does NOT mean that the irq is shared.
>
> It means that it's not exclusive, and that the driver is _ok_ with it
> being shared if that makes sense.
>
> 		Linus
> -

Yeah. You have been talking to too many lawyers! You are getting a
forked tongue!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.89 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 20:50       ` linux-os (Dick Johnson)
@ 2006-04-24 21:09         ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-04-24 21:09 UTC (permalink / raw)
  To: linux-os (Dick Johnson); +Cc: Stephen Hemminger, Andrew Morton, linux-kernel

On Mon, 24 Apr 2006, linux-os (Dick Johnson) wrote:
> On Mon, 24 Apr 2006, Linus Torvalds wrote:
> >
> > SA_SHIRQ does NOT mean that the irq is shared.
> >
> > It means that it's not exclusive, and that the driver is _ok_ with it
> > being shared if that makes sense.
> 
> Yeah. You have been talking to too many lawyers! You are getting a
> forked tongue!

No, it's just legacy from some _really_ really old code. As in 1991.

The very original Linux irq system didn't share interrupts at all (hey, 
PCI was newfangled, and ISA interrupts ruled), so when interrupt sharing 
was added, the default was to not do it.

These days, that doesn't make any sense, and if somebody did the flags 
today, you'd do it the other way around (default to shared, and if 
somebody wants a really exclusive interrupt, they should say so with 
SA_EXCLUSIVE or something). 

But Linux grew from humble and stupid roots.

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-24 18:41 better leve triggered IRQ management needed Stephen Hemminger
  2006-04-24 18:59 ` linux-os (Dick Johnson)
  2006-04-24 19:02 ` Linus Torvalds
@ 2006-04-29 21:25 ` Alan Cox
  2006-04-29 21:58   ` Linus Torvalds
  2 siblings, 1 reply; 33+ messages in thread
From: Alan Cox @ 2006-04-29 21:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Andrew Morton, Linus Torvalds, linux-kernel

On Mon, 2006-04-24 at 11:41 -0700, Stephen Hemminger wrote:
> I am seeing repeated problems with misconfigured systems that have shared IRQ
> devices configured for edge-triggered.

I've been thinking about this a chunk more. The embedded folks have been
having a related argument about SA_EDGE and SA_LEVEL or similar. On some
embedded platforms the driver really has to pass this information
according to the board configuration.

Trying to guess the current IRQ level v edge on a PC is very hard.
Trying to set it correctly from the driver is rather easier.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-29 21:25 ` Alan Cox
@ 2006-04-29 21:58   ` Linus Torvalds
  2006-04-30  4:48     ` Neil Brown
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-29 21:58 UTC (permalink / raw)
  To: Alan Cox; +Cc: Stephen Hemminger, Andrew Morton, linux-kernel

On Sat, 29 Apr 2006, Alan Cox wrote:
> 
> Trying to guess the current IRQ level v edge on a PC is very hard.
> Trying to set it correctly from the driver is rather easier.

I disagree. It's not any easier at all.

On PC's (x86 and x86-64) we actually already set the ELCR as well as we 
can (look for "eisa_set_level_irq()"). And a driver _literally_ cannot 
change it from the system value, because of the polarity confusion.

In the other cases (IO-APIC) we usually have it level, but when we have it 
marked as an edge, there is almost always a real reason for that too (ie 
legacy interrupt, it really _is_ edge-high, not level-low).

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-29 21:58   ` Linus Torvalds
@ 2006-04-30  4:48     ` Neil Brown
  2006-04-30  5:19       ` Linus Torvalds
  2006-04-30  7:36       ` Arjan van de Ven
  0 siblings, 2 replies; 33+ messages in thread
From: Neil Brown @ 2006-04-30  4:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Saturday April 29, torvalds@osdl.org wrote:
> 
> 
> On Sat, 29 Apr 2006, Alan Cox wrote:
> > 
> > Trying to guess the current IRQ level v edge on a PC is very hard.
> > Trying to set it correctly from the driver is rather easier.
> 
> I disagree. It's not any easier at all.
> 
> On PC's (x86 and x86-64) we actually already set the ELCR as well as we 
> can (look for "eisa_set_level_irq()"). And a driver _literally_ cannot 
> change it from the system value, because of the polarity confusion.
> 
> In the other cases (IO-APIC) we usually have it level, but when we have it 
> marked as an edge, there is almost always a real reason for that too (ie 
> legacy interrupt, it really _is_ edge-high, not level-low).

So what do you propose should be done to better handle such poorly
built machines?

As a concrete example I have a notebook which definitely assigns
shared interrupts to IRQ-10 (See /proc/interrupts below) yet the ELCR
only flags IRQ-11 as being level triggered and the rest are edge
triggered.
And with this configuration I definitely lose interrupts to the
wireless ethernet (ra0).

How do I make this work reliably?
I could:

1/ modify handle_IRQ_event so that it is more resilient to the
  possibility that shared interrupts are edge triggered.  This can be
  done by iterating over all action->handlers until they all return
  IRQ_NONE.

2/ Arrange that the ELCR bit is set for any IRQ for which a shared
  interrupt is registered (on the basis that the code for handling
  shared interrupts is not resilient against them being edge triggered).

3/ Have a kernel parameter, or sysfs variable, or magic
  write-to-/proc/interrupts of something that allows the ELCR to be read
  and set, and leave it up to user-space to perform the risky task of
  fiddling with ELCR

4/ As userspace can do inb/outb itself simply leave it all to
  userspace to worry about.

5/ Something I haven't thought of.

I don't much care which (option 2 seems best based on my limited
understanding) but it would be good to know how you think this should
be handled so that progress can be made.
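
Option 1 can be sketched as a user-space model. This is not the kernel's
handle_IRQ_event(); the toy_* names are hypothetical, and the pass limit
is an added assumption so that a handler which always claims the
interrupt cannot make the loop spin forever.

```c
#include <assert.h>
#include <stddef.h>

enum toy_irqreturn { TOY_IRQ_NONE = 0, TOY_IRQ_HANDLED = 1 };

typedef enum toy_irqreturn (*toy_handler_t)(void *dev);

/* One registered handler on the shared line. */
struct toy_action {
	toy_handler_t handler;
	void *dev;
};

/*
 * Call every handler on the line repeatedly until one full pass sees
 * them all return TOY_IRQ_NONE. Returns the number of passes made, or
 * -1 if max_passes ran out while some handler still claimed work.
 */
static int toy_handle_shared_edge(const struct toy_action *actions,
				  size_t n, int max_passes)
{
	assert(max_passes > 0);

	for (int pass = 1; pass <= max_passes; pass++) {
		int handled = 0;

		for (size_t i = 0; i < n; i++)
			if (actions[i].handler(actions[i].dev) == TOY_IRQ_HANDLED)
				handled = 1;
		if (!handled)
			return pass;
	}
	return -1;
}

/* Example device: services one pending event per call. */
static enum toy_irqreturn toy_counter_handler(void *dev)
{
	int *pending = dev;

	if (*pending == 0)
		return TOY_IRQ_NONE;
	(*pending)--;
	return TOY_IRQ_HANDLED;
}

/* Example of a handler that claims every interrupt unconditionally. */
static enum toy_irqreturn toy_lazy_handler(void *dev)
{
	(void)dev;
	return TOY_IRQ_HANDLED;
}
```

With two counter devices the loop stops one pass after the last event is
drained; with the unconditional handler it only stops at the pass limit,
so a real change along these lines would need some such bound.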

Thanks,
NeilBrown

           CPU0       
  0:  180230371          XT-PIC  timer
  1:         91          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  4:         10          XT-PIC  serial
  8:          4          XT-PIC  rtc
 10:    3812362          XT-PIC  yenta, yenta, ohci_hcd:usb2, ohci_hcd:usb3, ehci_hcd:usb4, ra0
 11:          0          XT-PIC  uhci_hcd:usb1
 12:       3290          XT-PIC  i8042
 14:      63804          XT-PIC  ide0
 15:         37          XT-PIC  ide1
NMI:          0 
LOC:          0 
ERR:          0
MIS:          0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-30  4:48     ` Neil Brown
@ 2006-04-30  5:19       ` Linus Torvalds
  2006-04-30  6:13         ` Neil Brown
  2006-04-30  7:36       ` Arjan van de Ven
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-30  5:19 UTC (permalink / raw)
  To: Neil Brown; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Sun, 30 Apr 2006, Neil Brown wrote:
> 
> So what do you propose should be done to better handle such poorly
> built machines?

Well, the thing is, there's not a lot we _can_ do.

We can try to report it. We can also try to handle it as gracefully as we 
can.

> As a concrete example I have a notebook which definitely assigns
> shared interrupts to IRQ-10 (See /proc/interrupts below) yet the ELCR
> only flags IRQ-11 as being level triggered and the rest are edge
> triggered.

Also, do you have the option to enable the IO-APIC? Maybe it's already 
enabled, and your BIOS has just disabled it, but your /proc/interrupts 
implies that you may have compiled your kernel without UP_APIC support.

With the APIC, we might be able to do better. Worth trying out.

> And with this configuration I definitely lose interrupts to the
> wireless ethernet (ra0).
> 
> How do I make this work reliably?
> I could:
> 
> 1/ modify handle_IRQ_event so that it is more resilient to the
>   possibility that shared interrupts are edge triggered.  This can be
>   done by iterating over all action->handlers until they all return
>   IRQ_NONE.

Well, yes. It's worth trying, but as mentioned, we have some drivers that 
return IRQ_HANDLED just because the driver conversion has been lazy. So 
limit it to a few things.

And we really should have some flag that says whether the interrupt 
descriptor ends up being edge, so that we could do this for edge-triggered 
interrupts _only_.

Anyway, I also do wonder if your irq lossage is due to something else.

On the XT-PIC, disabling the irq will cause an edge when it's re-enabled, 
so you can get the "level" behaviour by disabling the irq over the irq 
handler.

And that's exactly what we do, if I recall correctly. It's been years 
since I worked with that code, but looking at it quickly, it seems to 
match my recollection.

> 2/ Arrange that the ELCR bit is set for any IRQ for which a shared
>   interrupt is registered (on the basis that the code for handling
>   shared interrupts is not resilient against them being edge triggered).

NO.

How many times do I have to say this?

Yes, ELCR sets edge vs level.

BUT IT ALSO SETS THE POLARITY.  If you switch the bit around, it will also 
switch the polarity, and IT WILL NOT WORK. Because you'll end up with a 
level-triggered interrupt that is level-triggered for the wrong polarity, 
and will trigger whenever there is _not_ an interrupt pending.

Now, I will almost guarantee you that there is an exception to this rule 
(hey, it's PC hardware, there's _always_ an exception to any rule ;), and 
on some situations, the ELCR thing will truly only affect edge vs level.

But the point is, we can't just switch to level triggered. There simply is 
no such hardware in general for the old PC interrupts.

(Now, _if_ you use the APIC, you can actually switch polarity and trigger 
mode independently. Which is one reason why I'd like to hear whether you 
perhaps have just disabled the APIC by mistake, rather than have a nasty 
BIOS that disables it for you).

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-30  5:19       ` Linus Torvalds
@ 2006-04-30  6:13         ` Neil Brown
  2006-04-30  6:59           ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2006-04-30  6:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Saturday April 29, torvalds@osdl.org wrote:
> 
> 
> On Sun, 30 Apr 2006, Neil Brown wrote:
> > 
> > So what do you propose should be done to better handle such poorly
> > built machines?
> 
> Well, the thing is, there's not a lot we _can_ do.
> 
> We can try to report it. We can also try to handle it as gracefully as we 
> can.
> 
> > As a concrete example I have a notebook which definitely assigns
> > shared interrupts to IRQ-10 (See /proc/interrupts below) yet the ELCR
> > only flags IRQ-11 as being level triggered and the rest are edge
> > triggered.
> 
> Also, do you have the option to enable the IO-APIC? Maybe it's already 
> enabled, and your BIOS has just disabled it, but your /proc/interrupts 
> implies that you may have compiled your kernel without UP_APIC support.
> 
> With the APIC, we might be able to do better. Worth trying out.
> 

I have tried compiling with APIC and ACPI support (in various
combinations) and neither makes a noticeable difference.  I haven't
looked at the BIOS setting yet, though I don't remember seeing
anything like that (it's been a while though).


> > And with this configuration I definitely lose interrupts to the
> > wireless ethernet (ra0).
> > 
> > How do I make this work reliably?
> > I could:
> > 
> > 1/ modify handle_IRQ_event so that it is more resilient to the
> >   possibility that shared interrupts are edge triggered.  This can be
> >   done by iterating over all action->handlers until they all return
> >   IRQ_NONE.
> 
> Well, yes. It's worth trying, but as mentioned, we have some drivers that 
> return IRQ_HANDLED just because the driver conversion has been lazy. So 
> limit it to a few things.

I tried it and it solved my problem.  However I appreciate that would
be a risky change for the reasons you mention.

> 
> Anyway, I also do wonder if your irq lossage is due to something else.
> 

Maybe.  But all the symptoms I have found are completely consistent
with them being edge triggered.  That's no proof of course....


> On the XT-PIC, disabling the irq will cause an edge when it's re-enabled, 
> so you can get the "level" behaviour by disabling the irq over the irq 
> handler.
> 
> And that's exactly what we do, if I recall correctly. It's been years 
> since I worked with that code, but looking at it quickly, it seems to 
> match my recollection.
> 
> > 2/ Arrange that the ELCR bit is set for any IRQ for which a shared
> >   interrupt is registered (on the basis that the code for handling
> >   shared interrupts is not resilient against them being edge triggered).
> 
> NO.
> 
> How many times do I have to say this?
> 
> Yes, ELCR sets edge vs level.
> 
> BUT IT ALSO SETS THE POLARITY.  If you switch the bit around, it will also 
> switch the polarity, and IT WILL NOT WORK. Because you'll end up with a 
> level-triggered interrupt that is level-triggered for the wrong polarity, 
> and will trigger whenever there is _not_ an interrupt pending.

The thing is: This is exactly what I am currently doing to solve the
problem.
I hacked my kernel to flip the '10' bit, and the problem went away.


> 
> Now, I will almost guarantee you that there is an exception to this rule 
> (hey, it's PC hardware, there's _always_ an exception to any rule ;), and 
> on some situations, the ELCR thing will truly only affect edge vs level.
> 
> But the point is, we can't just switch to level triggered. There simply is 
> no such hardware in general for the old PC interrupts.
> 
> (Now, _if_ you use the APIC, you can actually switch polarity and trigger 
> mode independently. Which is one reason why I'd like to hear whether you 
> perhaps have just disabled the APIC by mistake, rather than have a nasty 
> BIOS that disables it for you).
> 

I'll see what I can find, and report back if I find anything
interesting.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-30  6:13         ` Neil Brown
@ 2006-04-30  6:59           ` Linus Torvalds
  2006-05-02  5:10             ` Neil Brown
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2006-04-30  6:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Sun, 30 Apr 2006, Neil Brown wrote:
> 
> The thing is: This is exactly what I am currently doing to solve the
> problem.

I'm not entirely surprised. As mentioned, the ELCR register _originally_ 
selected between EISA (level) and ISA (edge) interrupts.

But EISA (and later PCI) interrupts were not just level, they were level 
active _low_. While old ISA interrupts are edge-triggered, active _high_.

Which explains why it not only changes the trigger, but also the polarity.

Now, fast-forward a decade or two, and imagine that the world is 99% PCI, 
and nobody really has any devices that are _electrically_ ISA any more, 
but there is some legacy stuff that _looks_ like ISA. What would you do 
to simplify your life from a hw perspective?

I suspect that the thing to do is to internally just say that all 
interrupts are active low. There's no reason to _really_ have active high, 
because there are no real devices left that drive the irq line that way. 

Now, the _sane_ thing to do would be to also make all interrupts be 
level-triggered, and make the whole ELCR register be a total dummy 
register. But you can't really do that without being worried about 
breaking compatibility (for example, the timer interrupt is a 50% 
duty-cycle on/off thing, so it really _does_ end up being edge-triggered). 

So you leave the ELCR register mattering for an edge/level thing, but the 
polarity issue is just gone.

But then on _other_ southbridges, you'll have the old behaviour, and there 
simply is no way for the OS to know. Yeah, we could look at the 
northbridge and southbridge combination, and perhaps know that some of them 
always have a "active low" polarity regardless of ELCR. But nobody even 
_documents_ these things, exactly because it's not supposed to matter.

So we're kind of screwed. We have to _act_ as if we still lived in the 
middle ages, and people still used edge-triggered active-high interrupts. 
Even when it's not necessarily the case any more..

Gaah.

That said, I'm surprised that the kernel doesn't set ELCR for you. If it 
sees a PCI device, it really should know that it's a PCI interrupt. I 
wonder if we should do the following.. (Does this automatically make it do 
the right thing on your machine?)

			Linus

---
diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
index 7323544..6e3eaef 100644
--- a/arch/i386/pci/irq.c
+++ b/arch/i386/pci/irq.c
@@ -881,6 +881,7 @@ static int pcibios_lookup_irq(struct pci
 	((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask)) ) {
 		DBG(" -> got IRQ %d\n", irq);
 		msg = "Found";
+		eisa_set_level_irq(newirq);
 	} else if (newirq && r->set && (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
 		DBG(" -> assigning IRQ %d", newirq);
 		if (r->set(pirq_router_dev, dev, pirq, newirq)) {

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-30  6:59           ` Linus Torvalds
@ 2006-05-02  5:10             ` Neil Brown
  2006-05-02 15:05               ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Neil Brown @ 2006-05-02  5:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Saturday April 29, torvalds@osdl.org wrote:
> 
> That said, I'm surprised that the kernel doesn't set ELCR for you. If it 
> sees a PCI device, it really should know that it's a PCI interrupt. I 
> wonder if we should do the following.. (Does this automatically make it do 
> the right thing on your machine?)
> 
> 			Linus
> 
> ---
> diff --git a/arch/i386/pci/irq.c b/arch/i386/pci/irq.c
> index 7323544..6e3eaef 100644
> --- a/arch/i386/pci/irq.c
> +++ b/arch/i386/pci/irq.c
> @@ -881,6 +881,7 @@ static int pcibios_lookup_irq(struct pci
>  	((!(pci_probe & PCI_USE_PIRQ_MASK)) || ((1 << irq) & mask)) ) {
>  		DBG(" -> got IRQ %d\n", irq);
>  		msg = "Found";
> +		eisa_set_level_irq(newirq);
>  	} else if (newirq && r->set && (dev->class >> 8) != PCI_CLASS_DISPLAY_VGA) {
>  		DBG(" -> assigning IRQ %d", newirq);
>  		if (r->set(pirq_router_dev, dev, pirq, newirq)) {
> -

Yes, this helps.  It sets the offending IRQ to be level triggered, so
the wireless card works nicely.

My only concern is that dmesg contains:

[354446.223241] PCI: Using IRQ router PIIX/ICH [8086/7110] at 0000:00:07.0
[354446.223302] PCI: IRQ 0 for device 0000:00:04.0 doesn't match PIRQ mask - try pci=usepirqmask
[354446.223363] PCI: setting IRQ 0 as level-triggered
[354446.223401] PCI: Found IRQ 10 for device 0000:00:04.0
[354446.223446] PCI: Sharing IRQ 10 with 0000:00:04.1


Setting IRQ 0 to level-triggered doesn't seem healthy as it is the
timer interrupt.

It definitely gets IRQ 10 (the problematic one - 0:04 is the PCMCIA
controller) and IRQ 11 (which was already level-triggered).
e.g.


[354446.228016] PCI: setting IRQ 10 as level-triggered
[354446.228060] PCI: Found IRQ 10 for device 0000:00:04.0
[354446.228140] PCI: Sharing IRQ 10 with 0000:00:04.1
[354446.228218] PCI: Found IRQ 10 for device 0000:00:04.1
[354446.228268] PCI: Sharing IRQ 10 with 0000:00:04.0

A subsequent printout of the ELCR shows the two bytes to be
 00 and 0c

representing IRQ10 and IRQ11 - so it seems the setting of IRQ-0 to
level triggered didn't have a lasting effect.
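
The decoding of those two bytes can be checked with a short helper. A
minimal sketch, assuming the usual ELCR layout (one byte at I/O port
0x4d0 covering IRQ 0-7, one at 0x4d1 covering IRQ 8-15, a set bit
meaning level-triggered); the function names are hypothetical and the
code works on values already read, it does no port I/O.

```c
#include <assert.h>
#include <stdint.h>

/*
 * lo is the byte read from port 0x4d0 (IRQ 0-7), hi the byte read from
 * port 0x4d1 (IRQ 8-15). Returns 1 if the given IRQ is marked level.
 */
static int elcr_is_level(uint8_t lo, uint8_t hi, unsigned int irq)
{
	assert(irq < 16);

	uint8_t reg = (irq < 8) ? lo : hi;

	return (reg >> (irq & 7)) & 1;
}

/* Combine both bytes into a 16-bit mask, bit N set = IRQ N is level. */
static uint16_t elcr_level_mask(uint8_t lo, uint8_t hi)
{
	return (uint16_t)(lo | ((uint16_t)hi << 8));
}
```

For the values above (00 and 0c) the mask has only bits 10 and 11 set,
and IRQ 0 reads back as edge.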

Maybe the eisa_set_level_irq should be passed 'irq' rather than
'newirq' ??

NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-05-02  5:10             ` Neil Brown
@ 2006-05-02 15:05               ` Linus Torvalds
  0 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2006-05-02 15:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: Alan Cox, Stephen Hemminger, Andrew Morton, linux-kernel

On Tue, 2 May 2006, Neil Brown wrote:
> 
> Maybe the eisa_set_level_irq should be passed 'irq' rather than 'newirq' 
> ??

Yeah, stupid cut-and-paste error (the eisa_set_level_irq() call _is_ 
already there in the PCI irq setting, for the case where we actually have 
to set up routing that didn't exist before).

That's also why I'm a bit nervous even about my stupid one-liner patch: if 
the irq routing is already set up, and we just use the irq we're told to 
use, I'm not sure we should touch ELCR even if it "looks wrong". It 
obviously works on your machine, but I wonder what could break on others..

		Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: better leve triggered IRQ management needed
  2006-04-30  4:48     ` Neil Brown
  2006-04-30  5:19       ` Linus Torvalds
@ 2006-04-30  7:36       ` Arjan van de Ven
  1 sibling, 0 replies; 33+ messages in thread
From: Arjan van de Ven @ 2006-04-30  7:36 UTC (permalink / raw)
  To: Neil Brown
  Cc: Linus Torvalds, Alan Cox, Stephen Hemminger, Andrew Morton,
	linux-kernel

> .
> 
> 5/ Something I haven't thought of.


do a background poll and if that gets a lot of "hits" maybe increase the
frequency of it
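
One way to read this suggestion is sketched below. Everything in it is an
assumption for illustration: the toy_poller names, the 8-poll window, the
10 ms and 1000 ms bounds, and the halve/double policy are not from any
existing kernel code.

```c
#include <assert.h>

#define TOY_MIN_INTERVAL_MS	10u	/* assumed fastest poll rate */
#define TOY_MAX_INTERVAL_MS	1000u	/* assumed slowest poll rate */
#define TOY_WINDOW		8u	/* polls per evaluation window */

struct toy_poller {
	unsigned int interval_ms;	/* current delay between polls */
	unsigned int polls;		/* polls made in this window */
	unsigned int hits;		/* polls that found missed work */
};

static void toy_poller_init(struct toy_poller *p)
{
	p->interval_ms = TOY_MAX_INTERVAL_MS;
	p->polls = 0;
	p->hits = 0;
}

/*
 * Record the outcome of one background poll. At the end of each window,
 * poll twice as often if more than half the polls found work that an
 * interrupt should have delivered, and half as often if none did.
 * Returns the interval to use for the next poll.
 */
static unsigned int toy_poller_record(struct toy_poller *p, int found_work)
{
	assert(p->interval_ms >= TOY_MIN_INTERVAL_MS);

	p->polls++;
	if (found_work)
		p->hits++;

	if (p->polls == TOY_WINDOW) {
		if (p->hits * 2 > TOY_WINDOW &&
		    p->interval_ms / 2 >= TOY_MIN_INTERVAL_MS)
			p->interval_ms /= 2;
		else if (p->hits == 0 &&
			 p->interval_ms * 2 <= TOY_MAX_INTERVAL_MS)
			p->interval_ms *= 2;
		p->polls = 0;
		p->hits = 0;
	}
	return p->interval_ms;
}
```

A line that keeps losing edges would drive the interval down toward the
lower bound, while a healthy line would stay at the slow rate and cost
almost nothing.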


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2006-05-02 15:05 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-04-24 18:41 better leve triggered IRQ management needed Stephen Hemminger
2006-04-24 18:59 ` linux-os (Dick Johnson)
2006-04-24 19:02 ` Linus Torvalds
2006-04-24 19:08   ` Linus Torvalds
2006-04-24 19:53     ` Arjan van de Ven
2006-04-24 20:16       ` Alan Cox
2006-04-24 20:43         ` Arjan van de Ven
2006-04-24 21:07           ` Linus Torvalds
2006-04-24 21:20             ` Alan Cox
2006-04-24 22:26               ` Linus Torvalds
2006-04-24 21:22             ` [RFC 1/2] irq: record edge-level setting Stephen Hemminger
2006-04-24 21:49               ` Alan Cox
2006-04-24 21:41                 ` Stephen Hemminger
2006-04-24 22:34                   ` Linus Torvalds
2006-04-24 22:58                     ` Stephen Hemminger
     [not found]             ` <20060424141926.3872f921@localhost.localdomain>
2006-04-24 21:22               ` [RFC 2/2] warn on shared edge-triggered irq Stephen Hemminger
2006-04-25 15:23             ` better leve triggered IRQ management needed Michael Buesch
2006-04-24 19:15   ` Russell King
2006-04-24 20:18     ` Linus Torvalds
2006-04-24 19:25   ` Stephen Hemminger
2006-04-24 19:35   ` linux-os (Dick Johnson)
2006-04-24 20:19     ` Linus Torvalds
2006-04-24 20:50       ` linux-os (Dick Johnson)
2006-04-24 21:09         ` Linus Torvalds
2006-04-29 21:25 ` Alan Cox
2006-04-29 21:58   ` Linus Torvalds
2006-04-30  4:48     ` Neil Brown
2006-04-30  5:19       ` Linus Torvalds
2006-04-30  6:13         ` Neil Brown
2006-04-30  6:59           ` Linus Torvalds
2006-05-02  5:10             ` Neil Brown
2006-05-02 15:05               ` Linus Torvalds
2006-04-30  7:36       ` Arjan van de Ven
