public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: peter swain <pswain@akamai.com>
To: "Udo A. Steinberg" <us15@os.inf.tu-dresden.de>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Len Brown <len.brown@intel.com>,
	andrews@gate.ort.spb.ru, scott.feldman@intel.com
Subject: [patch] e100 and shared interrupts [was: Spurious interrupts when SCI shared with e100]
Date: Mon, 08 Nov 2004 12:54:34 -0800	[thread overview]
Message-ID: <418FDD0A.5010308@akamai.com> (raw)
In-Reply-To: <20041108115955.1c8bf10f.us15@os.inf.tu-dresden.de>

[-- Attachment #1: Type: text/plain, Size: 3463 bytes --]

Udo A. Steinberg wrote:

>My laptop has IRQ9 configured as ACPI SCI. When IRQ9 is shared between ACPI
>and e100 an IRQ9 storm occurs when e100 is enabled, as can be seen in the
>dmesg output below. The kernel then disables IRQ9, thus preventing e100 from
>working properly. The problem does not occur, if I override the default PCI
>steering in the BIOS, e.g. by routing LNKA to IRQ11.
>
>Nonetheless it would be good if someone could figure out why sharing IRQ9 
>is problematic.
>  
>
Udo, please try the attached 2.6.9 patch.

I suspect intel e100 driver always had problems with shared interrupts, 
due to a classic case of
"executable comment syndrome" in the original intel driver.
I noticed races on some of our ancient boxes with e100s, sharing 
interrupts with other e100s, or other cards.

The intel e100 v2.0.40 e100intr() code says.....
        intr_status = readw(&bdp->scb->scb_status);
        /* If not my interrupt, just return */
        if (!(intr_status & SCB_STATUS_ACK_MASK) || (intr_status == 
0xffff)) {
                return IRQ_NONE;
        }

But the test above is *not* "not my interrupt" but "no interesting 
conditions".
Both tests are needed for reliable operation in a shared-irq scenario.

When the driver is meddling with e100 setup, causing intr_status bits to 
pop up, but has not yet enabled
e100 interrupts, a "stray" interrupt from a device sharing the IRQ will 
cause e100intr() to walk all over the
device even if the driver is in a critical section.  Chaos ensues -- in 
my case duplicate skb_free calls when
a parasitic interrupt "completed" processing of tx ring skbs which napi 
bh was still setting up.

Compare with becker's eepro100, where there are (IIRC) 3 locks -- a lock 
on tx resources, a lock on rx resources, and an implicit lock by way of 
the card's interrupt-enable bit.  If you enable interrupts, you're
allowing the irq-handler to play with tx,rx resources without any other 
locks.  So a courteous irq-handler will
check to see if it has been granted the privilege, by inspecting the 
interrupt-enable bit.
Other drivers follow this model, but e100 driver merely has a misleading 
comment instead of the check.

I'd guess that Udo's problem appears when...
- e100 driver is meddling with card, with e100 interrupt enable *off*
- ACPI interrupt causes e100intr to be invoked parasitcally, cleaning up 
(or breaking) some half-finished
   work intended to be guarded by interrupt-enable bit
- driver enables e100 interrupt
- e100-driven IRQ calls e100intr(), finds no work to do
- (conjecture) 2.6.10-rc1 actually checks the IRQ_NONE return, notes 
spurious interrupt & disables IRQ

Our e100 problems went away on hundreds of old linux-2.4 dual-e100 boxes 
when the attached e100-v2-irq-share.patch was applied. It applies 
cleanly to linux-2.4.27, but is untested there.
I'd seen no lkml mention of shared-irq e100 problems, so was letting it 
bake for a couple of weeks before
declaring success, porting it to recent 2.6 and publishing.

But Udo has a problem, so let's see if this fixes it...
My linux-2.6.9 patch is an untested transliteration to intel's e100-v3 
driver structure.
(v3 driver has one e100.c, v2 has e100/*.c tree, cutover was about 
2.6.4? 5?)

I'm not sure if this is implicated in the widespread mistrust of e100 
multiport cards under linux, the e100-eepro100 wars, or the recent 
e100-suspend problems. Could be the missing piece in all.


^..^ feedback very welcome
(oo)




[-- Attachment #2: e100-v2-irq-share.patch --]
[-- Type: text/plain, Size: 1709 bytes --]

## add a module param own_irq=1 to forbid interrupt sharing
## add a check in e100intr for device interrupt masking -- if the driver has masked irqs off,
##  don't execute the usual irq service, but simply report the clash

--- Linux/drivers/net/e100/e100_main.c	Thu Oct 21 17:25:24 2004
+++ linux/drivers/net/e100/e100_main.c	Fri Oct 22 11:45:27 2004
@@ -424,6 +424,9 @@ E100_PARAM(BundleMax, "Maximum number fo
 E100_PARAM(IFS, "Disable or enable the adaptive IFS algorithm");
 E100_PARAM(weight, "rx packets processed per poll");
 
+int own_irq = 0; /* every card gets *own* IRQ? */
+MODULE_PARM(own_irq, "i");
+
 /**
  * e100_exec_cmd - issue a comand
  * @bdp: atapter's private data struct
@@ -1153,7 +1156,7 @@ e100_open(struct net_device *dev)
 		netif_start_queue(dev);
 
 	e100_start_ru(bdp);
-	if ((rc = request_irq(dev->irq, &e100intr, SA_SHIRQ,
+	if ((rc = request_irq(dev->irq, &e100intr, own_irq ? 0 : SA_SHIRQ,
 			      dev->name, dev)) != 0) {
 		del_timer_sync(&bdp->watchdog_timer);
 		goto err_exit;
@@ -2062,11 +2065,20 @@ e100intr(int irq, void *dev_inst, struct
 	dev = dev_inst;
 	bdp = dev->priv;
 
-	intr_status = readw(&bdp->scb->scb_status);
 	/* If not my interrupt, just return */
-	if (!(intr_status & SCB_STATUS_ACK_MASK) || (intr_status == 0xffff)) {
+	if (readb(&bdp->scb->scb_cmd_hi) & SCB_INT_MASK) {
+		static int once = 0;
+
+		if (!once)
+			printk(KERN_ERR "e100intr ignoring disabled interrupt, suspect irq-sharing\n");
+		once = 1;
 		return IRQ_NONE;
 	}
+
+	/* If no pending action, just return */
+	intr_status = readw(&bdp->scb->scb_status);
+	if (!(intr_status & SCB_STATUS_ACK_MASK) || (intr_status == 0xffff))
+		return IRQ_NONE;
 
 
 #ifdef CONFIG_E100_NAPI

[-- Attachment #3: e100-linux-2.6.9-irq-sharing.patch --]
[-- Type: text/plain, Size: 807 bytes --]

--- drivers/net/e100.c-pre-swine	Mon Nov  8 12:19:08 2004
+++ drivers/net/e100.c	Mon Nov  8 12:25:36 2004
@@ -1580,11 +1580,18 @@ static irqreturn_t e100_intr(int irq, vo
 {
 	struct net_device *netdev = dev_id;
 	struct nic *nic = netdev_priv(netdev);
-	u8 stat_ack = readb(&nic->csr->scb.stat_ack);
+	u8 stat_ack, cmd_hi;
 
+	cmd_hi = readb(&nic->csr->scb.cmd_hi);
+	DPRINTK(INTR, DEBUG, "cmd_hi = 0x%02X\n", cmd_hi);
+
+	if(cmd_hi & irq_mask_all)	/* Not our interrupt */
+		return IRQ_NONE;
+
+	stat_ack = readb(&nic->csr->scb.stat_ack);
 	DPRINTK(INTR, DEBUG, "stat_ack = 0x%02X\n", stat_ack);
 
-	if(stat_ack == stat_ack_not_ours ||	/* Not our interrupt */
+	if(stat_ack == stat_ack_not_ours ||	/* nothing to do */
 	   stat_ack == stat_ack_not_present)	/* Hardware is ejected */
 		return IRQ_NONE;
 

  reply	other threads:[~2004-11-08 20:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2004-11-08 10:59 Spurious interrupts when SCI shared with e100 Udo A. Steinberg
2004-11-08 20:54 ` peter swain [this message]
2004-11-09 22:22   ` [patch] e100 and shared interrupts [was: Spurious interrupts when SCI shared with e100] J.A. Magallon
2004-11-09  5:42 ` Spurious interrupts when SCI shared with e100 Len Brown
2004-11-09 16:21   ` Phil Oester

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=418FDD0A.5010308@akamai.com \
    --to=pswain@akamai.com \
    --cc=andrews@gate.ort.spb.ru \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=scott.feldman@intel.com \
    --cc=us15@os.inf.tu-dresden.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox