netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
       [not found]       ` <4A06D8D2.4010505@googlemail.com>
@ 2009-05-11  0:29         ` David Dillow
  2009-05-11 20:48           ` Michael Buesch
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-05-11  0:29 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

cc'ing netdev, where networking discussions have a much higher
probability of getting a developer's attention.

On Sun, 2009-05-10 at 15:38 +0200, Michael Riepe wrote:
> Michael Buesch wrote:
> 
> > I'm currently testing 2.6.29.1 without any additional patches but
> > with the pci=nomsi boot option.
> > 
> > I didn't notice any hickups, yet. I'm running a stresstest on a GBit link for quite
> > some time now. Earlier tests with older kernels and MSI burped earlier.
> > 
> > I will do more testing. If it turns out this is stable I will test the same kernel
> > with Message Signaled Interrupts to see if that causes some breakage.
> 
> I've had this problem up to and including 2.6.29.2. Currently, I'm
> trying 2.6.29.2 with pci=nomsi, and it's stable so far. With MSI
> enabled, a single high-speed TCP transfer will stop after a few seconds,
> but without MSI, I can run four simultaneous transfers to two different
> hosts without a single hickup.
> 
> It seems to me that this particular chip really doesn't like MSI.
> 
> Kernel: 2.6.29.2 (x86_64)
> Board: Intel D945GCLF2
> BIOS version: LF94510J.86A.0099.2008.0731.0303

I'm not sure this is tied to the chip. I've got a similar problem on my
X58 based system; my device is detected as an RTL8168d/8111d by the
r8169 driver, and will go out to lunch under high TX loads under any
kernel after 2.6.28. It seems to be perfectly solid in 2.6.27, but is
detected as a generic RTL8169, as the MAC is unknown to that version of
the driver.

It uses MSI in both cases, so the chip seems happy with MSI in at least
some instances.

I've spent a good part of the weekend bisecting between 2.6.27 and
2.6.28, and it does seem to be working its way into the genirq changes.
It is too early to sure, as I've had a number of kernels that locked up
during boot, so the bisect is a mess, and may not be pointing me in the
right direction. For example, it is currently pointing me at 5fef06...
"Merge branch 'linus' into genirq", which I need to figure out how to
verify.

If the problem is related to changes in the IRQ handling, it could be
that the driver is doing something incorrect WRT interrupts, but I don't
really expect that to be the case.

I'll continue to look at getting a more clean bisection to point us at
the root cause, perhaps keeping the version of the driver constant to
eliminate one variable.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11  0:29         ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too David Dillow
@ 2009-05-11 20:48           ` Michael Buesch
  2009-05-11 21:10             ` Michael Buesch
  2009-05-12 11:10             ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
  0 siblings, 2 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-11 20:48 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Monday 11 May 2009 02:29:14 David Dillow wrote:
> > Kernel: 2.6.29.2 (x86_64)
> > Board: Intel D945GCLF2
> > BIOS version: LF94510J.86A.0099.2008.0731.0303
> 
> I'm not sure this is tied to the chip. 

Yeah maybe other chips are affected as well.
I just want to note that my failing chip/board is an Intel D945GCLF2 board
with onboard G-LAN, too.

The device XID is: XID 3c4000c0
(see kernel log)

Somebody else (I think it was Rui Santos) already reported a failing chip with that XID, too.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11 20:48           ` Michael Buesch
@ 2009-05-11 21:10             ` Michael Buesch
  2009-05-11 21:29               ` David Dillow
  2009-05-12 11:10             ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
  1 sibling, 1 reply; 77+ messages in thread
From: Michael Buesch @ 2009-05-11 21:10 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Monday 11 May 2009 22:48:44 Michael Buesch wrote:
> On Monday 11 May 2009 02:29:14 David Dillow wrote:
> > > Kernel: 2.6.29.2 (x86_64)
> > > Board: Intel D945GCLF2
> > > BIOS version: LF94510J.86A.0099.2008.0731.0303
> > 
> > I'm not sure this is tied to the chip. 
> 
> Yeah maybe other chips are affected as well.
> I just want to note that my failing chip/board is an Intel D945GCLF2 board
> with onboard G-LAN, too.
> 
> The device XID is: XID 3c4000c0
> (see kernel log)
> 
> Somebody else (I think it was Rui Santos) already reported a failing chip with that XID, too.
> 

Here's a patch which blacklists MSI support by XID.
It's currently only compile-tested, because I can't reboot the machine right now.

Signed-off-by: Michael Buesch <mb@bu3sch.de>

Index: linux-2.6.29/drivers/net/r8169.c
===================================================================
--- linux-2.6.29.orig/drivers/net/r8169.c	2009-05-11 23:00:12.000000000 +0200
+++ linux-2.6.29/drivers/net/r8169.c	2009-05-11 23:07:32.000000000 +0200
@@ -1917,14 +1917,26 @@
 
 /* Cfg9346_Unlock assumed. */
 static unsigned rtl_try_msi(struct pci_dev *pdev, void __iomem *ioaddr,
-			    const struct rtl_cfg_info *cfg)
+			    const struct rtl_cfg_info *cfg, u32 xid)
 {
-	unsigned msi = 0;
+	unsigned i, blacklisted = 0, msi = 0;
 	u8 cfg2;
 
+	static const u32 xid_blacklist[] = {
+		/* Blacklisted devices with (currently) broken MSI support.
+		 * Devices are identified by XID. */
+		0x3c4000c0,
+	};
+
 	cfg2 = RTL_R8(Config2) & ~MSIEnable;
+	for (i = 0; i < ARRAY_SIZE(xid_blacklist); i++) {
+		if (xid == xid_blacklist[i])
+			blacklisted = 1;
+	}
 	if (cfg->features & RTL_FEATURE_MSI) {
-		if (pci_enable_msi(pdev)) {
+		if (blacklisted) {
+			dev_info(&pdev->dev, "MSI blacklisted. Back to INTx.\n");
+		} else if (pci_enable_msi(pdev)) {
 			dev_info(&pdev->dev, "no MSI. Back to INTx.\n");
 		} else {
 			cfg2 |= MSIEnable;
@@ -1974,6 +1986,7 @@
 	void __iomem *ioaddr;
 	unsigned int i;
 	int rc;
+	u32 xid;
 
 	if (netif_msg_drv(&debug)) {
 		printk(KERN_INFO "%s Gigabit Ethernet driver %s loaded\n",
@@ -2109,6 +2122,7 @@
 	}
 	tp->chipset = i;
 
+	xid = RTL_R32(TxConfig) & 0x7cf0f8ff;
 	RTL_W8(Cfg9346, Cfg9346_Unlock);
 	RTL_W8(Config1, RTL_R8(Config1) | PMEnable);
 	RTL_W8(Config5, RTL_R8(Config5) & PMEStatus);
@@ -2116,7 +2130,7 @@
 		tp->features |= RTL_FEATURE_WOL;
 	if ((RTL_R8(Config5) & (UWF | BWF | MWF)) != 0)
 		tp->features |= RTL_FEATURE_WOL;
-	tp->features |= rtl_try_msi(pdev, ioaddr, cfg);
+	tp->features |= rtl_try_msi(pdev, ioaddr, cfg, xid);
 	RTL_W8(Cfg9346, Cfg9346_Lock);
 
 	if ((tp->mac_version <= RTL_GIGA_MAC_VER_06) &&
@@ -2175,8 +2189,6 @@
 	pci_set_drvdata(pdev, dev);
 
 	if (netif_msg_probe(tp)) {
-		u32 xid = RTL_R32(TxConfig) & 0x7cf0f8ff;
-
 		printk(KERN_INFO "%s: %s at 0x%lx, "
 		       "%2.2x:%2.2x:%2.2x:%2.2x:%2.2x:%2.2x, "
 		       "XID %08x IRQ %d\n",


-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11 21:10             ` Michael Buesch
@ 2009-05-11 21:29               ` David Dillow
  2009-05-11 21:59                 ` Michael Buesch
  2009-05-12 20:29                 ` Michael Riepe
  0 siblings, 2 replies; 77+ messages in thread
From: David Dillow @ 2009-05-11 21:29 UTC (permalink / raw)
  To: Michael Buesch
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Mon, 2009-05-11 at 23:10 +0200, Michael Buesch wrote:
> On Monday 11 May 2009 22:48:44 Michael Buesch wrote:
> > On Monday 11 May 2009 02:29:14 David Dillow wrote:
> > > > Kernel: 2.6.29.2 (x86_64)
> > > > Board: Intel D945GCLF2
> > > > BIOS version: LF94510J.86A.0099.2008.0731.0303
> > > 
> > > I'm not sure this is tied to the chip. 
> > 
> > Yeah maybe other chips are affected as well.
> > I just want to note that my failing chip/board is an Intel D945GCLF2 board
> > with onboard G-LAN, too.
> > 
> > The device XID is: XID 3c4000c0
> > (see kernel log)
> > 
> > Somebody else (I think it was Rui Santos) already reported a failing chip with that XID, too.
> > 
> 
> Here's a patch which blacklists MSI support by XID.
> It's currently only compile-tested, because I can't reboot the machine right now.

I was saying that I don't think the timeouts are necessarily the NIC
chipset -- or the bridge chip for that matter --  having issues with
MSI. There were some substantial IRQ handling changes in 2.6.28 and my
bisection of the problem seem to lead into that code. I'll try this
later tonight hopefully, but can you try to run 2.6.27 with the current
r8169 driver and see if it is solid for you? That way it is using the
same driver code, but avoids the IRQ changes.

Of course, my XID is 281000c0, so it looks like I have a different NIC
chip as well, so we may be chasing different problems. Still, it'd be
nice to rule out the IRQ changes before adding a blacklist.

Dave

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11 21:29               ` David Dillow
@ 2009-05-11 21:59                 ` Michael Buesch
  2009-05-12 20:29                 ` Michael Riepe
  1 sibling, 0 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-11 21:59 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Monday 11 May 2009 23:29:52 David Dillow wrote:
> On Mon, 2009-05-11 at 23:10 +0200, Michael Buesch wrote:
> > On Monday 11 May 2009 22:48:44 Michael Buesch wrote:
> > > On Monday 11 May 2009 02:29:14 David Dillow wrote:
> > > > > Kernel: 2.6.29.2 (x86_64)
> > > > > Board: Intel D945GCLF2
> > > > > BIOS version: LF94510J.86A.0099.2008.0731.0303
> > > > 
> > > > I'm not sure this is tied to the chip. 
> > > 
> > > Yeah maybe other chips are affected as well.
> > > I just want to note that my failing chip/board is an Intel D945GCLF2 board
> > > with onboard G-LAN, too.
> > > 
> > > The device XID is: XID 3c4000c0
> > > (see kernel log)
> > > 
> > > Somebody else (I think it was Rui Santos) already reported a failing chip with that XID, too.
> > > 
> > 
> > Here's a patch which blacklists MSI support by XID.
> > It's currently only compile-tested, because I can't reboot the machine right now.
> 
> I was saying that I don't think the timeouts are necessarily the NIC
> chipset -- or the bridge chip for that matter --  having issues with
> MSI. There were some substantial IRQ handling changes in 2.6.28 and my
> bisection of the problem seem to lead into that code. I'll try this
> later tonight hopefully, but can you try to run 2.6.27 with the current
> r8169 driver and see if it is solid for you? That way it is using the
> same driver code, but avoids the IRQ changes.
> 
> Of course, my XID is 281000c0, so it looks like I have a different NIC
> chip as well, so we may be chasing different problems. Still, it'd be
> nice to rule out the IRQ changes before adding a blacklist.

Ok, that perfectly makes sense.
I can try 2.6.29's driver on 2.6.27, but I can't tell when, because I currently
only have infrequent physical access to the machine.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11 20:48           ` Michael Buesch
  2009-05-11 21:10             ` Michael Buesch
@ 2009-05-12 11:10             ` Krzysztof Halasa
  2009-05-12 21:45               ` Michael Riepe
  1 sibling, 1 reply; 77+ messages in thread
From: Krzysztof Halasa @ 2009-05-12 11:10 UTC (permalink / raw)
  To: Michael Buesch
  Cc: David Dillow, Michael Riepe, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

Michael Buesch <mb@bu3sch.de> writes:

> Yeah maybe other chips are affected as well.
> I just want to note that my failing chip/board is an Intel D945GCLF2 board
> with onboard G-LAN, too.
>
> The device XID is: XID 3c4000c0

I have an on-board RTL8111C (at least the docs say so) with this same
XID and it's working correctly.

$ uname -rm
2.6.29.1 x86_64

$ grep eth0 /proc/interrupts
 30:      49224      49283   PCI-MSI-edge      eth0

The board is MSI (nomen omen) P45 neo2.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-11 21:29               ` David Dillow
  2009-05-11 21:59                 ` Michael Buesch
@ 2009-05-12 20:29                 ` Michael Riepe
  2009-05-14  2:38                   ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-12 20:29 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

Hi!

David Dillow wrote:

> I was saying that I don't think the timeouts are necessarily the NIC
> chipset -- or the bridge chip for that matter --  having issues with
> MSI. There were some substantial IRQ handling changes in 2.6.28 and my
> bisection of the problem seem to lead into that code. I'll try this
> later tonight hopefully, but can you try to run 2.6.27 with the current
> r8169 driver and see if it is solid for you? That way it is using the
> same driver code, but avoids the IRQ changes.

Unfortunately, 2.6.27 won't build with r8169.c copied from 2.6.29.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-12 11:10             ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
@ 2009-05-12 21:45               ` Michael Riepe
  2009-05-13  6:11                 ` Francois Romieu
  2009-05-13 19:34                 ` Krzysztof Halasa
  0 siblings, 2 replies; 77+ messages in thread
From: Michael Riepe @ 2009-05-12 21:45 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Michael Buesch, David Dillow, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev



Krzysztof Halasa wrote:
> Michael Buesch <mb@bu3sch.de> writes:
> 
> 
>>Yeah maybe other chips are affected as well.
>>I just want to note that my failing chip/board is an Intel D945GCLF2 board
>>with onboard G-LAN, too.
>>
>>The device XID is: XID 3c4000c0
> 
> 
> I have an on-board RTL8111C (at least the docs say so) with this same
> XID and it's working correctly.

How do lspci and dmesg report the chip? For mine, lspci says it's a
"RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)" while
the kernel reports "RTL8168c/8111c at 0xffffc200000f6000,
00:1c:c0:b6:1a:4a, XID 3c4000c0".

This reminds me of the early ATA/USB bridges from Genesys Logic... many
revisions, and each one broken in a different way. :-(

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-12 21:45               ` Michael Riepe
@ 2009-05-13  6:11                 ` Francois Romieu
  2009-05-13  6:27                   ` Michael Riepe
  2009-05-13 19:34                 ` Krzysztof Halasa
  1 sibling, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-05-13  6:11 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Krzysztof Halasa, Michael Buesch, David Dillow, Rui Santos,
	Michael =?unknown-8bit?B?QsO8a2Vy?=, linux-kernel, netdev

Michael Riepe <michael.riepe@googlemail.com> :
[...]
> How do lspci and dmesg report the chip? For mine, lspci says it's a
> "RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)" while
> the kernel reports "RTL8168c/8111c at 0xffffc200000f6000,
> 00:1c:c0:b6:1a:4a, XID 3c4000c0".

The XID is more specific than the first kernel report. lspci is less
accurate.

> This reminds me of the early ATA/USB bridges from Genesys Logic... many
> revisions, and each one broken in a different way. :-(

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-13  6:11                 ` Francois Romieu
@ 2009-05-13  6:27                   ` Michael Riepe
  0 siblings, 0 replies; 77+ messages in thread
From: Michael Riepe @ 2009-05-13  6:27 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Krzysztof Halasa, Michael Buesch, David Dillow, Rui Santos,
	Michael Büker, linux-kernel, netdev

Hi!

Francois Romieu wrote:
> Michael Riepe <michael.riepe@googlemail.com> :
> [...]
> 
>>How do lspci and dmesg report the chip? For mine, lspci says it's a
>>"RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)" while
>>the kernel reports "RTL8168c/8111c at 0xffffc200000f6000,
>>00:1c:c0:b6:1a:4a, XID 3c4000c0".
> 
> 
> The XID is more specific than the first kernel report. lspci is less
> accurate.

Then the question is: Are there two different revisions with the same
XID, or is it always the same chip that sometimes works and sometimes
doesn't?

I can't remember anybody reporting that the Intel D945GCLF2 works for
them, so it might also be an interoperability issue.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-12 21:45               ` Michael Riepe
  2009-05-13  6:11                 ` Francois Romieu
@ 2009-05-13 19:34                 ` Krzysztof Halasa
  1 sibling, 0 replies; 77+ messages in thread
From: Krzysztof Halasa @ 2009-05-13 19:34 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, David Dillow, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

Michael Riepe <michael.riepe@googlemail.com> writes:

> How do lspci and dmesg report the chip? For mine, lspci says it's a
> "RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)" while
> the kernel reports "RTL8168c/8111c at 0xffffc200000f6000,
> 00:1c:c0:b6:1a:4a, XID 3c4000c0".

Sure, that's the same chip I think.

> This reminds me of the early ATA/USB bridges from Genesys Logic... many
> revisions, and each one broken in a different way. :-(

This seems like the same revision. I'm not very convinced it's the RTL
chip which is buggy.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-12 20:29                 ` Michael Riepe
@ 2009-05-14  2:38                   ` David Dillow
  2009-05-14 18:37                     ` Michael Riepe
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-05-14  2:38 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> Hi!
> 
> David Dillow wrote:
> 
> > I was saying that I don't think the timeouts are necessarily the NIC
> > chipset -- or the bridge chip for that matter --  having issues with
> > MSI. There were some substantial IRQ handling changes in 2.6.28 and my
> > bisection of the problem seem to lead into that code. I'll try this
> > later tonight hopefully, but can you try to run 2.6.27 with the current
> > r8169 driver and see if it is solid for you? That way it is using the
> > same driver code, but avoids the IRQ changes.
> 
> Unfortunately, 2.6.27 won't build with r8169.c copied from 2.6.29.

You are correct, and I should have thought about that. The following
patch reverts the following commits:

288379 net: Remove redundant NAPI functions
908a7a net: Remove unused netdev arg from some NAPI interfaces.
008298 netdev: add more functions to netdevice ops
8b4ab2 r8169: convert to net_device_ops
babcda drivers/net: Kill now superfluous ->last_rx stores.

The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
than 90 seconds on 2.6.28+.

To use the patch, just copy r8169.c from HEAD and use patch -p1.


diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 0b6e8c8..2d751bf 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -490,7 +490,6 @@ struct rtl8169_private {
 	void (*hw_start)(struct net_device *);
 	unsigned int (*phy_reset_pending)(void __iomem *);
 	unsigned int (*link_ok)(void __iomem *);
-	int (*do_ioctl)(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd);
 	int pcie_cap;
 	struct delayed_work task;
 	unsigned features;
@@ -1850,11 +1849,9 @@ static int rtl8169_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	struct rtl8169_private *tp = netdev_priv(dev);
 	struct mii_ioctl_data *data = if_mii(ifr);
 
-	return netif_running(dev) ? tp->do_ioctl(tp, data, cmd) : -ENODEV;
-}
+	if (!netif_running(dev))
+		return -ENODEV;
 
-static int rtl_xmii_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd)
-{
 	switch (cmd) {
 	case SIOCGMIIPHY:
 		data->phy_id = 32; /* Internal PHY */
@@ -1873,11 +1870,6 @@ static int rtl_xmii_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *dat
 	return -EOPNOTSUPP;
 }
 
-static int rtl_tbi_ioctl(struct rtl8169_private *tp, struct mii_ioctl_data *data, int cmd)
-{
-	return -EOPNOTSUPP;
-}
-
 static const struct rtl_cfg_info {
 	void (*hw_start)(struct net_device *);
 	unsigned int region;
@@ -1943,26 +1935,6 @@ static void rtl_disable_msi(struct pci_dev *pdev, struct rtl8169_private *tp)
 	}
 }
 
-static const struct net_device_ops rtl8169_netdev_ops = {
-	.ndo_open		= rtl8169_open,
-	.ndo_stop		= rtl8169_close,
-	.ndo_get_stats		= rtl8169_get_stats,
-	.ndo_start_xmit		= rtl8169_start_xmit,
-	.ndo_tx_timeout		= rtl8169_tx_timeout,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_change_mtu		= rtl8169_change_mtu,
-	.ndo_set_mac_address	= rtl_set_mac_address,
-	.ndo_do_ioctl		= rtl8169_ioctl,
-	.ndo_set_multicast_list	= rtl_set_rx_mode,
-#ifdef CONFIG_R8169_VLAN
-	.ndo_vlan_rx_register	= rtl8169_vlan_rx_register,
-#endif
-#ifdef CONFIG_NET_POLL_CONTROLLER
-	.ndo_poll_controller	= rtl8169_netpoll,
-#endif
-
-};
-
 static int __devinit
 rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
@@ -1989,7 +1961,6 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 	}
 
 	SET_NETDEV_DEV(dev, &pdev->dev);
-	dev->netdev_ops = &rtl8169_netdev_ops;
 	tp = netdev_priv(dev);
 	tp->dev = dev;
 	tp->pci_dev = pdev;
@@ -2126,7 +2097,6 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		tp->phy_reset_enable = rtl8169_tbi_reset_enable;
 		tp->phy_reset_pending = rtl8169_tbi_reset_pending;
 		tp->link_ok = rtl8169_tbi_link_ok;
-		tp->do_ioctl = rtl_tbi_ioctl;
 
 		tp->phy_1000_ctrl_reg = ADVERTISE_1000FULL; /* Implied by TBI */
 	} else {
@@ -2135,7 +2105,8 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		tp->phy_reset_enable = rtl8169_xmii_reset_enable;
 		tp->phy_reset_pending = rtl8169_xmii_reset_pending;
 		tp->link_ok = rtl8169_xmii_link_ok;
-		tp->do_ioctl = rtl_xmii_ioctl;
+
+		dev->do_ioctl = rtl8169_ioctl;
 	}
 
 	spin_lock_init(&tp->lock);
@@ -2147,15 +2118,28 @@ rtl8169_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 		dev->dev_addr[i] = RTL_R8(MAC0 + i);
 	memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
 
+	dev->open = rtl8169_open;
+	dev->hard_start_xmit = rtl8169_start_xmit;
+	dev->get_stats = rtl8169_get_stats;
 	SET_ETHTOOL_OPS(dev, &rtl8169_ethtool_ops);
+	dev->stop = rtl8169_close;
+	dev->tx_timeout = rtl8169_tx_timeout;
+	dev->set_multicast_list = rtl_set_rx_mode;
 	dev->watchdog_timeo = RTL8169_TX_TIMEOUT;
 	dev->irq = pdev->irq;
 	dev->base_addr = (unsigned long) ioaddr;
+	dev->change_mtu = rtl8169_change_mtu;
+	dev->set_mac_address = rtl_set_mac_address;
 
 	netif_napi_add(dev, &tp->napi, rtl8169_poll, R8169_NAPI_WEIGHT);
 
 #ifdef CONFIG_R8169_VLAN
 	dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
+	dev->vlan_rx_register = rtl8169_vlan_rx_register;
+#endif
+
+#ifdef CONFIG_NET_POLL_CONTROLLER
+	dev->poll_controller = rtl8169_netpoll;
 #endif
 
 	tp->intr_mask = 0xffff;
@@ -3513,6 +3497,7 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 			if (rtl8169_rx_vlan_skb(tp, desc, skb) < 0)
 				netif_receive_skb(skb);
 
+			dev->last_rx = jiffies;
 			dev->stats.rx_bytes += pkt_size;
 			dev->stats.rx_packets++;
 		}
@@ -3594,8 +3579,8 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
 		tp->intr_mask = ~tp->napi_event;
 
-		if (likely(napi_schedule_prep(&tp->napi)))
-			__napi_schedule(&tp->napi);
+		if (likely(netif_rx_schedule_prep(dev, &tp->napi)))
+			__netif_rx_schedule(dev, &tp->napi);
 		else if (netif_msg_intr(tp)) {
 			printk(KERN_INFO "%s: interrupt %04x in poll\n",
 			       dev->name, status);
@@ -3616,7 +3601,7 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 	rtl8169_tx_interrupt(dev, tp, ioaddr);
 
 	if (work_done < budget) {
-		napi_complete(napi);
+		netif_rx_complete(dev, napi);
 		tp->intr_mask = 0xffff;
 		/*
 		 * 20040426: the barrier is not strictly required but the



^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-14  2:38                   ` David Dillow
@ 2009-05-14 18:37                     ` Michael Riepe
  2009-05-14 19:14                       ` David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-14 18:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev



David Dillow wrote:
> On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> 
>>Hi!
>>
>>David Dillow wrote:
>>
>>
>>>I was saying that I don't think the timeouts are necessarily the NIC
>>>chipset -- or the bridge chip for that matter --  having issues with
>>>MSI. There were some substantial IRQ handling changes in 2.6.28 and my
>>>bisection of the problem seem to lead into that code. I'll try this
>>>later tonight hopefully, but can you try to run 2.6.27 with the current
>>>r8169 driver and see if it is solid for you? That way it is using the
>>>same driver code, but avoids the IRQ changes.
>>
>>Unfortunately, 2.6.27 won't build with r8169.c copied from 2.6.29.
> 
> 
> You are correct, and I should have thought about that. The following
> patch reverts the following commits:
> 
> 288379 net: Remove redundant NAPI functions
> 908a7a net: Remove unused netdev arg from some NAPI interfaces.
> 008298 netdev: add more functions to netdevice ops
> 8b4ab2 r8169: convert to net_device_ops
> babcda drivers/net: Kill now superfluous ->last_rx stores.
> 
> The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
> if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
> than 90 seconds on 2.6.28+.

Not on my system:

WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x258/0x270()
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: nfsd lockd nfs_acl sunrpc exportfs autofs4 deflate
zlib_deflate ctr twofish twofish_common camellia serpent blowfish
des_generic cbc aes_x86_64 aes_generic xcbc rmd160 sha256_generic
sha1_generic crypto_null crypto_blkcipher af_key ipt_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables xt_tcpudp xt_conntrack
iptable_filter ip_tables x_tables sg nf_conntrack_irc nf_conntrack_ftp
nf_conntrack_ipv4 nf_conntrack rfcomm l2cap bluetooth dm_mod tun eeprom
smsc47m192 hwmon_vid smsc47m1 hwmon cpufreq_stats cpufreq_powersave
video backlight output fan container battery ac usbhid usb_storage
i2c_dev hid evdev intelfb fb i2c_algo_bit ff_memless parport_pc
cfbcopyarea r8169 snd_hda_intel i2c_i801 thermal cfbimgblt serio_raw
ehci_hcd mii button iTCO_wdt snd_pcm processor parport i2c_core
cfbfillrect intel_agp snd_timer snd_page_alloc snd_hwdep snd uhci_hcd
soundcore
Pid: 0, comm: swapper Not tainted 2.6.27-ai-x64-r8169 #1

Call Trace:
 <IRQ>  [<ffffffff802498f7>] warn_slowpath+0xb7/0xf0
 [<ffffffff804b98e0>] ? ip_output+0x90/0xf0
 [<ffffffff804b858f>] ? __ip_local_out+0x9f/0xb0
 [<ffffffff804b85c0>] ? ip_local_out+0x20/0x30
 [<ffffffff804b8e9c>] ? ip_queue_xmit+0x21c/0x3f0
 [<ffffffff80488ddc>] ? pskb_copy+0x1c/0x1a0
 [<ffffffff8048884e>] ? __alloc_skb+0x6e/0x150
 [<ffffffff80512972>] ? fib6_clean_node+0x42/0xc0
 [<ffffffff8054ad04>] ? _write_unlock_bh+0x24/0x30
 [<ffffffff8054aa0f>] ? _spin_lock_irqsave+0x3f/0x50
 [<ffffffff8037baca>] ? strlcpy+0x4a/0x60
 [<ffffffff8049fe78>] dev_watchdog+0x258/0x270
 [<ffffffff80512360>] ? fib6_gc_timer_cb+0x0/0x10
 [<ffffffff8054ad63>] ? _spin_unlock_bh+0x23/0x30
 [<ffffffff8049fc20>] ? dev_watchdog+0x0/0x270
 [<ffffffff80254650>] run_timer_softirq+0x170/0x250
 [<ffffffff8026adff>] ? clockevents_program_event+0x4f/0x90
 [<ffffffff8024fcc4>] __do_softirq+0x84/0x100
 [<ffffffff80213fdc>] call_softirq+0x1c/0x30
 [<ffffffff802161ad>] do_softirq+0x5d/0xa0
 [<ffffffff8024f92d>] irq_exit+0x9d/0xb0
 [<ffffffff80224b84>] smp_apic_timer_interrupt+0x84/0xc0
 [<ffffffff802138b3>] apic_timer_interrupt+0x83/0x90
 <EOI>  [<ffffffff8021b250>] ? mwait_idle+0x40/0x60
 [<ffffffff80211662>] ? enter_idle+0x22/0x30
 [<ffffffff802116dd>] ? cpu_idle+0x6d/0x120
 [<ffffffff80538038>] ? rest_init+0x88/0x90

This happened less than half a minute after the transfer had started.
And it's going to happen earlier if I increase the load. With four
connections to two other hosts, the transmission usually pauses after
less than ten seconds. Sometimes it lasts for only two or three seconds.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-14 18:37                     ` Michael Riepe
@ 2009-05-14 19:14                       ` David Dillow
  2009-05-14 19:42                         ` Michael Riepe
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-05-14 19:14 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Thu, 2009-05-14 at 20:37 +0200, Michael Riepe wrote:
> 
> David Dillow wrote:
> > On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
> > The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
> > if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
> > than 90 seconds on 2.6.28+.
> 
> Not on my system:

> This happened less than half a minute after the transfer had started.
> And it's going to happen earlier if I increase the load. With four
> connections to two other hosts, the transmission usually pauses after
> less than ten seconds. Sometimes it lasts for only two or three seconds.

Bummer, but a good data point; thanks for testing.

I added some code to print the irq status when it hangs, and it shows
0x0085, which is RxOK | TxOK | TxDescUnavail, which makes me think we've
lost an MSI-edge interrupt somehow. You being able to reproduce it on
2.6.27 where I cannot leads me to think that the bisection down into the
genirq tree just changed the timing and made it easier to hit after it
was merged.

So, I suppose a good review of the IRQ handling of r8169.c is in order,
though my SATA disks (AHCI w/ MSI irqs) also seem to have similar issues
with delays, though that is entirely unqualified and unmeasured.

Dave

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too
  2009-05-14 19:14                       ` David Dillow
@ 2009-05-14 19:42                         ` Michael Riepe
  2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-14 19:42 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev



David Dillow wrote:
> On Thu, 2009-05-14 at 20:37 +0200, Michael Riepe wrote:
> 
>>David Dillow wrote:
>>
>>>On Tue, 2009-05-12 at 22:29 +0200, Michael Riepe wrote:
>>>The patched driver runs on 2.6.27 and survives my 5 minutes 'dd
>>>if=/dev/zero bs=1024k | nc target 9000' test which usually dies in less
>>>than 90 seconds on 2.6.28+.
>>
>>Not on my system:
> 
> 
>>This happened less than half a minute after the transfer had started.
>>And it's going to happen earlier if I increase the load. With four
>>connections to two other hosts, the transmission usually pauses after
>>less than ten seconds. Sometimes it lasts for only two or three seconds.
> 
> 
> Bummer, but a good data point; thanks for testing.
> 
> I added some code to print the irq status when it hangs, and it shows
> 0x0085, which is RxOK | TxOK | TxDescUnavail, which makes me think we've
> lost an MSI-edge interrupt somehow. You being able to reproduce it on
> 2.6.27 where I cannot leads me to think that the bisection down into the
> genirq tree just changed the timing and made it easier to hit after it
> was merged.

Maybe. With a single connection, 2.6.27 with the 2.6.29 driver seemed to
be a little more stable (i.e. the transfers lasted a little longer under
low and medium loads) than 2.6.29, but that's nothing I could actually
quantify.

> So, I suppose a good review of the IRQ handling of r8169.c is in order,
> though my SATA disks (AHCI w/ MSI irqs) also seem to have similar issues
> with delays, though that is entirely unqualified and unmeasured.

Hey, MSI isn't bad in general. The e1000e driver on my Lenovo T60 uses
it as well, and it's as reliable as a rock.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-14 19:42                         ` Michael Riepe
@ 2009-05-23  1:29                           ` David Dillow
  2009-05-23  9:24                             ` Michael Buesch
                                               ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: David Dillow @ 2009-05-23  1:29 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

The 8169 chip only generates MSI interrupts when all enabled event
sources are quiescent and one or more sources transition to active. If
not all of the active events are acknowledged, or a new event becomes
active while the existing ones are cleared in the handler, we will not
see a new interrupt.

The current interrupt handler masks off the Rx and Tx events once the
NAPI handler has been scheduled, which opens a race window in which we
can get another Rx or Tx event and never ACK'ing it, stopping all
activity until the link is reset (ifconfig down/up). Fix this by always
ACK'ing all event sources, and loop in the handler until we have all
sources quiescent.

Signed-off-by: David Dillow <dave@thedillows.org>
---
This fixes the lockups I've seen. Both MSI and level-triggered interrupt
configurations survive over an hour of testing when it would lockup in
under 90 seconds before. I am certain of the analysis of the root cause,
but there may be better ways to fix it. There may also be a theoretical
race window between the ending of a NAPI poll cycle and a link change
interrupt coming in, but I'm not sure it would matter. 

Some variant of this should also be applied to the currently running
stable trees, as the problem is long-standing.

 r8169.c |  102 +++++++++++++++++++++++++++++++-------------------------
 1 file changed, 57 insertions(+), 45 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 0b6e8c8..bdc8d5a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3554,54 +3554,64 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	int handled = 0;
 	int status;
 
+	/* loop handling interrupts until we have no new ones or
+	 * we hit a invalid/hotplug case.
+	 */
 	status = RTL_R16(IntrStatus);
+	while (status && status != 0xffff) {
+		handled = 1;
 
-	/* hotplug/major error/no more work/shared irq */
-	if ((status == 0xffff) || !status)
-		goto out;
-
-	handled = 1;
+		/* Handle all of the error cases first. These will reset
+		 * the chip, so just exit the loop.
+		 */
+		if (unlikely(!netif_running(dev))) {
+			rtl8169_asic_down(ioaddr);
+			break;
+		}
 
-	if (unlikely(!netif_running(dev))) {
-		rtl8169_asic_down(ioaddr);
-		goto out;
-	}
+		/* Work around for rx fifo overflow */
+		if (unlikely(status & RxFIFOOver) &&
+	    	(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+			netif_stop_queue(dev);
+			rtl8169_tx_timeout(dev);
+			break;
+		}
 
-	status &= tp->intr_mask;
-	RTL_W16(IntrStatus,
-		(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		if (unlikely(status & SYSErr)) {
+			rtl8169_pcierr_interrupt(dev);
+			break;
+		}
 
-	if (!(status & tp->intr_event))
-		goto out;
+		if (status & LinkChg)
+			rtl8169_check_link_status(dev, tp, ioaddr);
 
-	/* Work around for rx fifo overflow */
-	if (unlikely(status & RxFIFOOver) &&
-	    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
-		netif_stop_queue(dev);
-		rtl8169_tx_timeout(dev);
-		goto out;
-	}
+		/* We need to see the lastest version of tp->intr_mask to
+		 * avoid ignoring an MSI interrupt and having to wait for
+		 * another event which may never come.
+		 */
+		smp_rmb();
+		if (status & tp->intr_mask & tp->napi_event) {
+			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+			tp->intr_mask = ~tp->napi_event;
+
+			if (likely(napi_schedule_prep(&tp->napi)))
+				__napi_schedule(&tp->napi);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+			       	dev->name, status);
+			}
+		}
 
-	if (unlikely(status & SYSErr)) {
-		rtl8169_pcierr_interrupt(dev);
-		goto out;
+		/* We only get a new MSI interrupt when all active irq
+		 * sources on the chip have been acknowledged. So, ack
+		 * everything we've seen and check if new sources have become
+		 * active to avoid blocking all interrupts from the chip.
+		 */
+		RTL_W16(IntrStatus,
+			(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		status = RTL_R16(IntrStatus);
 	}
 
-	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp, ioaddr);
-
-	if (status & tp->napi_event) {
-		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-		tp->intr_mask = ~tp->napi_event;
-
-		if (likely(napi_schedule_prep(&tp->napi)))
-			__napi_schedule(&tp->napi);
-		else if (netif_msg_intr(tp)) {
-			printk(KERN_INFO "%s: interrupt %04x in poll\n",
-			       dev->name, status);
-		}
-	}
-out:
 	return IRQ_RETVAL(handled);
 }
 
@@ -3617,13 +3627,15 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 
 	if (work_done < budget) {
 		napi_complete(napi);
-		tp->intr_mask = 0xffff;
-		/*
-		 * 20040426: the barrier is not strictly required but the
-		 * behavior of the irq handler could be less predictable
-		 * without it. Btw, the lack of flush for the posted pci
-		 * write is safe - FR
+
+		/* We need for force the visibility of tp->intr_mask
+		 * for other CPUs, as we can loose an MSI interrupt
+		 * and potentially wait for a retransmit timeout if we don't.
+		 * The posted write to IntrMask is safe, as it will
+		 * eventually make it to the chip and we won't loose anything
+		 * until it does.
 		 */
+		tp->intr_mask = 0xffff;
 		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
 	}




^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
@ 2009-05-23  9:24                             ` Michael Buesch
  2009-05-23 14:35                               ` Michael Riepe
  2009-05-24 21:15                             ` Francois Romieu
                                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 77+ messages in thread
From: Michael Buesch @ 2009-05-23  9:24 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Saturday 23 May 2009 03:29:34 David Dillow wrote:
> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
> 
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
> 
> Signed-off-by: David Dillow <dave@thedillows.org>

Thanks a lot, Dave! This fixes the issue on my chip.
You can add my:

Tested-by: Michael Buesch <mb@bu3sch.de>



Here's a patch that cleanly applies to 2.6.29.4:

Index: linux-2.6.29/drivers/net/r8169.c
===================================================================
--- linux-2.6.29.orig/drivers/net/r8169.c	2009-05-23 11:06:22.000000000 +0200
+++ linux-2.6.29/drivers/net/r8169.c	2009-05-23 11:08:17.000000000 +0200
@@ -3554,54 +3554,64 @@
 	int handled = 0;
 	int status;
 
+	/* loop handling interrupts until we have no new ones or
+	 * we hit a invalid/hotplug case.
+	 */
 	status = RTL_R16(IntrStatus);
+	while (status && status != 0xffff) {
+		handled = 1;
 
-	/* hotplug/major error/no more work/shared irq */
-	if ((status == 0xffff) || !status)
-		goto out;
-
-	handled = 1;
+		/* Handle all of the error cases first. These will reset
+		 * the chip, so just exit the loop.
+		 */
+		if (unlikely(!netif_running(dev))) {
+			rtl8169_asic_down(ioaddr);
+			break;
+		}
 
-	if (unlikely(!netif_running(dev))) {
-		rtl8169_asic_down(ioaddr);
-		goto out;
-	}
+		/* Work around for rx fifo overflow */
+		if (unlikely(status & RxFIFOOver) &&
+	    	(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+			netif_stop_queue(dev);
+			rtl8169_tx_timeout(dev);
+			break;
+		}
 
-	status &= tp->intr_mask;
-	RTL_W16(IntrStatus,
-		(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		if (unlikely(status & SYSErr)) {
+			rtl8169_pcierr_interrupt(dev);
+			break;
+		}
 
-	if (!(status & tp->intr_event))
-		goto out;
+		if (status & LinkChg)
+			rtl8169_check_link_status(dev, tp, ioaddr);
 
-	/* Work around for rx fifo overflow */
-	if (unlikely(status & RxFIFOOver) &&
-	    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
-		netif_stop_queue(dev);
-		rtl8169_tx_timeout(dev);
-		goto out;
-	}
+		/* We need to see the lastest version of tp->intr_mask to
+		 * avoid ignoring an MSI interrupt and having to wait for
+		 * another event which may never come.
+		 */
+		smp_rmb();
+		if (status & tp->intr_mask & tp->napi_event) {
+			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+			tp->intr_mask = ~tp->napi_event;
+
+			if (likely(netif_rx_schedule_prep(&tp->napi)))
+				__netif_rx_schedule(&tp->napi);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+			       	dev->name, status);
+			}
+		}
 
-	if (unlikely(status & SYSErr)) {
-		rtl8169_pcierr_interrupt(dev);
-		goto out;
+		/* We only get a new MSI interrupt when all active irq
+		 * sources on the chip have been acknowledged. So, ack
+		 * everything we've seen and check if new sources have become
+		 * active to avoid blocking all interrupts from the chip.
+		 */
+		RTL_W16(IntrStatus,
+			(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		status = RTL_R16(IntrStatus);
 	}
 
-	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp, ioaddr);
-
-	if (status & tp->napi_event) {
-		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-		tp->intr_mask = ~tp->napi_event;
-
-		if (likely(netif_rx_schedule_prep(&tp->napi)))
-			__netif_rx_schedule(&tp->napi);
-		else if (netif_msg_intr(tp)) {
-			printk(KERN_INFO "%s: interrupt %04x in poll\n",
-			       dev->name, status);
-		}
-	}
-out:
 	return IRQ_RETVAL(handled);
 }
 
@@ -3617,13 +3627,15 @@
 
 	if (work_done < budget) {
 		netif_rx_complete(napi);
-		tp->intr_mask = 0xffff;
-		/*
-		 * 20040426: the barrier is not strictly required but the
-		 * behavior of the irq handler could be less predictable
-		 * without it. Btw, the lack of flush for the posted pci
-		 * write is safe - FR
+
+		/* We need for force the visibility of tp->intr_mask
+		 * for other CPUs, as we can loose an MSI interrupt
+		 * and potentially wait for a retransmit timeout if we don't.
+		 * The posted write to IntrMask is safe, as it will
+		 * eventually make it to the chip and we won't loose anything
+		 * until it does.
 		 */
+		tp->intr_mask = 0xffff;
 		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
 	}


-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23  9:24                             ` Michael Buesch
@ 2009-05-23 14:35                               ` Michael Riepe
  2009-05-23 14:44                                 ` Michael Buesch
  2009-05-23 14:51                                 ` David Dillow
  0 siblings, 2 replies; 77+ messages in thread
From: Michael Riepe @ 2009-05-23 14:35 UTC (permalink / raw)
  To: Michael Buesch
  Cc: David Dillow, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

Hi!

Michael Buesch wrote:

> Thanks a lot, Dave! This fixes the issue on my chip.

Yep, it's stable here as well. And even a little faster than pci=nomsi.
The only strangeness I observed is that the throughput (measured with
iperf and a single TCP connection) varies:

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    667 MBytes    559 Mbits/sec
[  3] 10.0-20.0 sec    803 MBytes    673 Mbits/sec
[  3] 20.0-30.0 sec    802 MBytes    673 Mbits/sec
[  3] 30.0-40.0 sec    714 MBytes    599 Mbits/sec
[  3] 40.0-50.0 sec    669 MBytes    561 Mbits/sec
[  3] 50.0-60.0 sec    791 MBytes    663 Mbits/sec
[  3]  0.0-60.0 sec  4.34 GBytes    622 Mbits/sec

In gkrellm, you can see that it actually alternates between two values.

With pci=nomsi (and without the patch) I get lower but more consistent
results:

[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    511 MBytes    429 Mbits/sec
[  3] 10.0-20.0 sec    465 MBytes    390 Mbits/sec
[  3] 20.0-30.0 sec    481 MBytes    404 Mbits/sec
[  3] 30.0-40.0 sec    466 MBytes    391 Mbits/sec
[  3] 40.0-50.0 sec    465 MBytes    390 Mbits/sec
[  3] 50.0-60.0 sec    463 MBytes    389 Mbits/sec
[  3]  0.0-60.0 sec  2.78 GBytes    399 Mbits/sec

I suppose it's a side effect of the MSI acknowledgement loop. But who am
I to complain about higher average throughput? ;-)

> You can add my:
> 
> Tested-by: Michael Buesch <mb@bu3sch.de>

Tested-by: Michael Riepe <michael.riepe@googlemail.com>

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 14:35                               ` Michael Riepe
@ 2009-05-23 14:44                                 ` Michael Buesch
  2009-05-23 15:01                                   ` Michael Riepe
  2009-05-23 14:51                                 ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: Michael Buesch @ 2009-05-23 14:44 UTC (permalink / raw)
  To: Michael Riepe
  Cc: David Dillow, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Saturday 23 May 2009 16:35:28 Michael Riepe wrote:
> Hi!
> 
> Michael Buesch wrote:
> 
> > Thanks a lot, Dave! This fixes the issue on my chip.
> 
> Yep, it's stable here as well. And even a little faster than pci=nomsi.
> The only strangeness I observed is that the throughput (measured with
> iperf and a single TCP connection) varies:
> 
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec    667 MBytes    559 Mbits/sec
> [  3] 10.0-20.0 sec    803 MBytes    673 Mbits/sec
> [  3] 20.0-30.0 sec    802 MBytes    673 Mbits/sec
> [  3] 30.0-40.0 sec    714 MBytes    599 Mbits/sec
> [  3] 40.0-50.0 sec    669 MBytes    561 Mbits/sec
> [  3] 50.0-60.0 sec    791 MBytes    663 Mbits/sec
> [  3]  0.0-60.0 sec  4.34 GBytes    622 Mbits/sec

Are you running the iperf server or client on the r8169?
I'm running the iperf server on the r8169 and get the following results:

mb@homer:~$ iperf -c 192.168.2.50 -t120 -i10
------------------------------------------------------------
Client connecting to 192.168.2.50, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.205 port 34739 connected with 192.168.2.50 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    789 MBytes    662 Mbits/sec
[  3] 10.0-20.0 sec    791 MBytes    664 Mbits/sec
[  3] 20.0-30.0 sec    799 MBytes    670 Mbits/sec
[  3] 30.0-40.0 sec    794 MBytes    666 Mbits/sec
[  3] 40.0-50.0 sec    795 MBytes    667 Mbits/sec
[  3] 50.0-60.0 sec    793 MBytes    665 Mbits/sec
[  3] 60.0-70.0 sec    800 MBytes    671 Mbits/sec
[  3] 70.0-80.0 sec    795 MBytes    667 Mbits/sec
[  3] 80.0-90.0 sec    800 MBytes    672 Mbits/sec
[  3] 90.0-100.0 sec    797 MBytes    669 Mbits/sec
[  3] 100.0-110.0 sec    789 MBytes    662 Mbits/sec
[  3] 110.0-120.0 sec    791 MBytes    663 Mbits/sec
[  3]  0.0-120.0 sec  9.31 GBytes    666 Mbits/sec

Looks acceptable to me.
The iperf client device is a BCM5780.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 14:35                               ` Michael Riepe
  2009-05-23 14:44                                 ` Michael Buesch
@ 2009-05-23 14:51                                 ` David Dillow
  2009-05-23 16:12                                   ` Michael Riepe
  1 sibling, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-05-23 14:51 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Sat, 2009-05-23 at 16:35 +0200, Michael Riepe wrote:
> Hi!
> 
> Michael Buesch wrote:
> 
> > Thanks a lot, Dave! This fixes the issue on my chip.
> 
> Yep, it's stable here as well. And even a little faster than pci=nomsi.
> The only strangeness I observed is that the throughput (measured with
> iperf and a single TCP connection) varies:
> 
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec    667 MBytes    559 Mbits/sec
> [  3] 10.0-20.0 sec    803 MBytes    673 Mbits/sec
> [  3] 20.0-30.0 sec    802 MBytes    673 Mbits/sec
> [  3] 30.0-40.0 sec    714 MBytes    599 Mbits/sec
> [  3] 40.0-50.0 sec    669 MBytes    561 Mbits/sec
> [  3] 50.0-60.0 sec    791 MBytes    663 Mbits/sec
> [  3]  0.0-60.0 sec  4.34 GBytes    622 Mbits/sec
> 
[snip]
> I suppose it's a side effect of the MSI acknowledgement loop. But who am
> I to complain about higher average throughput? ;-)

I wonder if that is the TCP sawtooth pattern -- run up until we drop
packets, drop off, repeat. I thought newer congestion algorithms would
help with that, but I've not kept up, this may be another red-herring --
like the bisection into genirq.

A tcpdump may answer the question -- wireshark can do an analysis and
see if it is running up until it starts dropping or something.

Or it may be the loop, but I wouldn't expect it to make such a big
difference, or be as variable if it does.

Also, what does it look like with multiple streams?

Thanks for testing guys -- I'm glad it works for a sample size > 1!

Dave


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 14:44                                 ` Michael Buesch
@ 2009-05-23 15:01                                   ` Michael Riepe
  2009-05-23 16:40                                     ` Michael Buesch
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-23 15:01 UTC (permalink / raw)
  To: Michael Buesch
  Cc: David Dillow, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

Hi!

Michael Buesch wrote:
> On Saturday 23 May 2009 16:35:28 Michael Riepe wrote:
> 
>>Hi!
>>
>>Michael Buesch wrote:
>>
>>
>>>Thanks a lot, Dave! This fixes the issue on my chip.
>>
>>Yep, it's stable here as well. And even a little faster than pci=nomsi.
>>The only strangeness I observed is that the throughput (measured with
>>iperf and a single TCP connection) varies:
>>
>>[ ID] Interval       Transfer     Bandwidth
>>[  3]  0.0-10.0 sec    667 MBytes    559 Mbits/sec
>>[  3] 10.0-20.0 sec    803 MBytes    673 Mbits/sec
>>[  3] 20.0-30.0 sec    802 MBytes    673 Mbits/sec
>>[  3] 30.0-40.0 sec    714 MBytes    599 Mbits/sec
>>[  3] 40.0-50.0 sec    669 MBytes    561 Mbits/sec
>>[  3] 50.0-60.0 sec    791 MBytes    663 Mbits/sec
>>[  3]  0.0-60.0 sec  4.34 GBytes    622 Mbits/sec
> 
> 
> Are you running the iperf server or client on the r8169?

The client, since I wanted to measure write throughput. The server is a
Lenovo Thinkpad T60 (e1000e driver).

> I'm running the iperf server on the r8169 and get the following results:
> 
> mb@homer:~$ iperf -c 192.168.2.50 -t120 -i10

In that case, the r8169 is receiving data (which never was a problem, at
least for me - it only lost interrupts in TX mode).

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 14:51                                 ` David Dillow
@ 2009-05-23 16:12                                   ` Michael Riepe
  2009-05-23 16:45                                     ` Michael Buesch
  2009-05-23 16:46                                     ` David Dillow
  0 siblings, 2 replies; 77+ messages in thread
From: Michael Riepe @ 2009-05-23 16:12 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

Hi!

David Dillow wrote:

> I wonder if that is the TCP sawtooth pattern -- run up until we drop
> packets, drop off, repeat. I thought newer congestion algorithms would
> help with that, but I've not kept up, this may be another red-herring --
> like the bisection into genirq.

Actually, I just found out that things are much stranger. A freshly
booted system (I'm using 2.6.29.2 + the r8169 patch sent by Michael
Buesch, by the way) behaves like this:

[  3] local 192.168.178.206 port 44090 connected with 192.168.178.204
port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    483 MBytes    405 Mbits/sec
[  3] 10.0-20.0 sec    472 MBytes    396 Mbits/sec
[  3] 20.0-30.0 sec    482 MBytes    404 Mbits/sec
[  3] 30.0-40.0 sec    483 MBytes    405 Mbits/sec
[  3] 40.0-50.0 sec    480 MBytes    402 Mbits/sec
[  3] 50.0-60.0 sec    479 MBytes    402 Mbits/sec
[  3]  0.0-60.0 sec  2.81 GBytes    402 Mbits/sec

Then I've been running another test, something along the lines of

	for dest in host1 host1 host2 host2
	do ssh $dest dd of=/dev/null bs=8k count=10240000 </dev/zero &
	done

After a while, I killed the ssh processes and ran iperf again. And this
time, I got:

[  3] local 192.168.178.206 port 58029 connected with 192.168.178.204
port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    634 MBytes    531 Mbits/sec
[  3] 10.0-20.0 sec    740 MBytes    621 Mbits/sec
[  3] 20.0-30.0 sec    641 MBytes    538 Mbits/sec
[  3] 30.0-40.0 sec    738 MBytes    619 Mbits/sec
[  3] 40.0-50.0 sec    742 MBytes    622 Mbits/sec
[  3] 50.0-60.0 sec    743 MBytes    623 Mbits/sec
[  3]  0.0-60.0 sec  4.14 GBytes    592 Mbits/sec

Obviously, the high-load ssh test (which would kill the device within a
few seconds without the patch) triggers something here.

A few observations later, however, I was convinced that it's not a TCP
congestion or driver issue. Actually, the throughput depends on the CPU
the benchmark is running on. You can see that in gkrellm - whenever the
process jumps to another CPU, the throughput changes. On the four
(virtual) CPUs of the Atom 330, I get these results:

CPU 0:  0.0-60.0 sec  2.65 GBytes    380 Mbits/sec
CPU 1:  0.0-60.0 sec  4.12 GBytes    590 Mbits/sec
CPU 2:  0.0-60.0 sec  3.79 GBytes    543 Mbits/sec
CPU 3:  0.0-60.0 sec  4.13 GBytes    592 Mbits/sec

CPU 0+2 are on the first core, 1+3 on the second.

If I use two connections (iperf -P2) and nail iperf to both threads of a
single core with taskset (the program is multi-threaded, just in case
you wonder), I get this:

CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec

That's quite a difference, isn't it?

Now I wonder what CPU 0 is doing...

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 15:01                                   ` Michael Riepe
@ 2009-05-23 16:40                                     ` Michael Buesch
  0 siblings, 0 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-23 16:40 UTC (permalink / raw)
  To: Michael Riepe
  Cc: David Dillow, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Saturday 23 May 2009 17:01:36 Michael Riepe wrote:
> The client, since I wanted to measure write throughput. The server is a
> Lenovo Thinkpad T60 (e1000e driver).
> 
> > I'm running the iperf server on the r8169 and get the following results:
> > 
> > mb@homer:~$ iperf -c 192.168.2.50 -t120 -i10
> 
> In that case, the r8169 is receiving data (which never was a problem, at
> least for me - it only lost interrupts in TX mode).

Ok, I tested the other way around and it shows the problem:

mb@quimby:~/kernel$ iperf -c 192.168.2.205 -t120 -i10
------------------------------------------------------------
Client connecting to 192.168.2.205, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.2.50 port 34788 connected with 192.168.2.205 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec    501 MBytes    420 Mbits/sec
[  3] 10.0-20.0 sec    533 MBytes    447 Mbits/sec
[  3] 20.0-30.0 sec    699 MBytes    587 Mbits/sec
[  3] 30.0-40.0 sec    715 MBytes    600 Mbits/sec
[  3] 40.0-50.0 sec    469 MBytes    394 Mbits/sec
[  3] 50.0-60.0 sec    466 MBytes    391 Mbits/sec
[  3] 60.0-70.0 sec    464 MBytes    389 Mbits/sec
[  3] 70.0-80.0 sec    498 MBytes    418 Mbits/sec
[  3] 80.0-90.0 sec    477 MBytes    400 Mbits/sec
[  3] 90.0-100.0 sec    601 MBytes    504 Mbits/sec
[  3] 100.0-110.0 sec    696 MBytes    584 Mbits/sec
[  3] 110.0-120.0 sec    510 MBytes    428 Mbits/sec
[  3]  0.0-120.0 sec  6.47 GBytes    463 Mbits/sec


-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 16:12                                   ` Michael Riepe
@ 2009-05-23 16:45                                     ` Michael Buesch
  2009-05-23 16:46                                     ` David Dillow
  1 sibling, 0 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-23 16:45 UTC (permalink / raw)
  To: Michael Riepe
  Cc: David Dillow, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Saturday 23 May 2009 18:12:22 Michael Riepe wrote:

> CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
> CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec
> 
> That's quite a difference, isn't it?
> 
> Now I wonder what CPU 0 is doing...

Is CPU0 the only one servicing interrupts?

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 16:12                                   ` Michael Riepe
  2009-05-23 16:45                                     ` Michael Buesch
@ 2009-05-23 16:46                                     ` David Dillow
  2009-05-23 16:50                                       ` Michael Buesch
  2009-05-23 16:53                                       ` Michael Riepe
  1 sibling, 2 replies; 77+ messages in thread
From: David Dillow @ 2009-05-23 16:46 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Sat, 2009-05-23 at 18:12 +0200, Michael Riepe wrote:
> If I use two connections (iperf -P2) and nail iperf to both threads of a
> single core with taskset (the program is multi-threaded, just in case
> you wonder), I get this:
> 
> CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
> CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec
> 
> That's quite a difference, isn't it?
> 
> Now I wonder what CPU 0 is doing...

Where does /proc/interrupts say the irqs are going?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 16:46                                     ` David Dillow
@ 2009-05-23 16:50                                       ` Michael Buesch
  2009-05-23 16:53                                       ` Michael Riepe
  1 sibling, 0 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-23 16:50 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Saturday 23 May 2009 18:46:13 David Dillow wrote:
> On Sat, 2009-05-23 at 18:12 +0200, Michael Riepe wrote:
> > If I use two connections (iperf -P2) and nail iperf to both threads of a
> > single core with taskset (the program is multi-threaded, just in case
> > you wonder), I get this:
> > 
> > CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
> > CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec
> > 
> > That's quite a difference, isn't it?
> > 
> > Now I wonder what CPU 0 is doing...
> 
> Where does /proc/interrupts say the irqs are going?
> 
> 
> 

For me it looks like this:
 24:   52606581          0          0          0   PCI-MSI-edge      eth0

So as I have the same CPU as Michael Riepe, I think CPU0 (core0) servicing
interrupts is related to the issue.

I'm wondering however, if this is expected behavior. iperf and interrupts will
pretty much saturate a core of the atom330. So it looks like a plain CPU bottleneck.

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 16:46                                     ` David Dillow
  2009-05-23 16:50                                       ` Michael Buesch
@ 2009-05-23 16:53                                       ` Michael Riepe
  2009-05-23 17:03                                         ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-23 16:53 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev



David Dillow wrote:
> On Sat, 2009-05-23 at 18:12 +0200, Michael Riepe wrote:
> 
>>If I use two connections (iperf -P2) and nail iperf to both threads of a
>>single core with taskset (the program is multi-threaded, just in case
>>you wonder), I get this:
>>
>>CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
>>CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec
>>
>>That's quite a difference, isn't it?
>>
>>Now I wonder what CPU 0 is doing...
> 
> 
> Where does /proc/interrupts say the irqs are going?

Oh well...

           CPU0       CPU1       CPU2       CPU3
  0:         66          0          0          0   IO-APIC-edge      timer
  1:          2          0          0          0   IO-APIC-edge      i8042
  7:          0          0          0          0   IO-APIC-edge
parport0
  8:          1          0          0          0   IO-APIC-edge      rtc0
  9:          0          0          0          0   IO-APIC-fasteoi   acpi
 12:          4          0          0          0   IO-APIC-edge      i8042
 14:          0          0          0          0   IO-APIC-edge
ata_piix
 15:          0          0          0          0   IO-APIC-edge
ata_piix
 16:          0          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb4
 18:          0          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb3
 19:       9696          0          0          0   IO-APIC-fasteoi
ata_piix, uhci_hcd:usb2
 22:        471          0          0          0   IO-APIC-fasteoi   HDA
Intel
 23:        100          0          0          0   IO-APIC-fasteoi
uhci_hcd:usb1, ehci_hcd:usb5
 27:   48463995          0          0          0   PCI-MSI-edge      eth0
NMI:          0          0          0          0   Non-maskable interrupts
LOC:     460965     294620     324902     358771   Local timer interrupts
RES:      30111     286150     113295     268387   Rescheduling interrupts
CAL:       1140       1143         52         34   Function call interrupts
TLB:       2223        952       1184       2573   TLB shootdowns
SPU:          0          0          0          0   Spurious interrupts
ERR:          0
MIS:          0


-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23 16:53                                       ` Michael Riepe
@ 2009-05-23 17:03                                         ` David Dillow
  0 siblings, 0 replies; 77+ messages in thread
From: David Dillow @ 2009-05-23 17:03 UTC (permalink / raw)
  To: Michael Riepe
  Cc: Michael Buesch, Francois Romieu, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Sat, 2009-05-23 at 18:53 +0200, Michael Riepe wrote:
> 
> David Dillow wrote:
> > On Sat, 2009-05-23 at 18:12 +0200, Michael Riepe wrote:
> > 
> >>If I use two connections (iperf -P2) and nail iperf to both threads of a
> >>single core with taskset (the program is multi-threaded, just in case
> >>you wonder), I get this:
> >>
> >>CPU 0+2:  0.0-60.0 sec  4.65 GBytes    665 Mbits/sec
> >>CPU 1+3:  0.0-60.0 sec  6.43 GBytes    920 Mbits/sec
> >>
> >>That's quite a difference, isn't it?
> >>
> >>Now I wonder what CPU 0 is doing...
> > 
> > 
> > Where does /proc/interrupts say the irqs are going?
> 
> Oh well...

>  27:   48463995          0          0          0   PCI-MSI-edge      eth0

What does it look like if you move the iperf around the CPUs while using
pci=nomsi? 

I'm looking to make sure I didn't cause a terrible regression in the
cost of IRQ handling...


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
  2009-05-23  9:24                             ` Michael Buesch
@ 2009-05-24 21:15                             ` Francois Romieu
  2009-05-24 22:55                               ` David Dillow
  2009-05-26  5:55                             ` David Miller
  2009-08-21 20:57                             ` Eric W. Biederman
  3 siblings, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-05-24 21:15 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Rui Santos,
	Michael =?unknown-8bit?B?QsO8a2Vy?=, linux-kernel, netdev

David Dillow <dave@thedillows.org> :
[...]
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing when it would lockup in
> under 90 seconds before. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the ending of a NAPI poll cycle and a link change
> interrupt coming in, but I'm not sure it would matter. 

It makes sense.

If I understand correctly, one should expect to find some pending Tx
event in the ISR of a failed card when reading the registers with
ethtool.

Has someone noticed it ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-24 21:15                             ` Francois Romieu
@ 2009-05-24 22:55                               ` David Dillow
  0 siblings, 0 replies; 77+ messages in thread
From: David Dillow @ 2009-05-24 22:55 UTC (permalink / raw)
  To: Francois Romieu
  Cc: Michael Riepe, Michael Buesch, Rui Santos, Michael Büker,
	linux-kernel, netdev

On Sun, 2009-05-24 at 23:15 +0200, Francois Romieu wrote:
> David Dillow <dave@thedillows.org> :
> [...]
> > This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> > configurations survive over an hour of testing when it would lockup in
> > under 90 seconds before. I am certain of the analysis of the root cause,
> > but there may be better ways to fix it. There may also be a theoretical
> > race window between the ending of a NAPI poll cycle and a link change
> > interrupt coming in, but I'm not sure it would matter. 
> 
> It makes sense.
> 
> If I understand correctly, one should expect to find some pending Tx
> event in the ISR of a failed card when reading the registers with
> ethtool.
> 
> Has someone noticed it ?

Yes, that's part of how I came to this conclusion, I put a debug patch
together that looked at the IRQ status 2 seconds after the last IRQ came
in. Then I waited for the chip to lock and the timer to fire. It showed
0x0085 in the IntrStatus register.

I didn't know I could do that with ethtool, but that would've been a
nice way to go, too. :)


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
  2009-05-23  9:24                             ` Michael Buesch
  2009-05-24 21:15                             ` Francois Romieu
@ 2009-05-26  5:55                             ` David Miller
  2009-05-26 18:22                               ` Michael Buesch
  2009-08-21 20:57                             ` Eric W. Biederman
  3 siblings, 1 reply; 77+ messages in thread
From: David Miller @ 2009-05-26  5:55 UTC (permalink / raw)
  To: dave; +Cc: michael.riepe, mb, romieu, rsantos, m.bueker, linux-kernel,
	netdev

From: David Dillow <dave@thedillows.org>
Date: Fri, 22 May 2009 21:29:34 -0400

> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
> 
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
> 
> Signed-off-by: David Dillow <dave@thedillows.org>

I've applied this, thanks David.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26  5:55                             ` David Miller
@ 2009-05-26 18:22                               ` Michael Buesch
  2009-05-26 21:52                                 ` David Miller
  0 siblings, 1 reply; 77+ messages in thread
From: Michael Buesch @ 2009-05-26 18:22 UTC (permalink / raw)
  To: David Miller
  Cc: dave, michael.riepe, romieu, rsantos, m.bueker, linux-kernel,
	netdev

On Tuesday 26 May 2009 07:55:03 David Miller wrote:
> From: David Dillow <dave@thedillows.org>
> Date: Fri, 22 May 2009 21:29:34 -0400
> 
> > The 8169 chip only generates MSI interrupts when all enabled event
> > sources are quiescent and one or more sources transition to active. If
> > not all of the active events are acknowledged, or a new event becomes
> > active while the existing ones are cleared in the handler, we will not
> > see a new interrupt.
> > 
> > The current interrupt handler masks off the Rx and Tx events once the
> > NAPI handler has been scheduled, which opens a race window in which we
> > can get another Rx or Tx event and never ACK'ing it, stopping all
> > activity until the link is reset (ifconfig down/up). Fix this by always
> > ACK'ing all event sources, and loop in the handler until we have all
> > sources quiescent.
> > 
> > Signed-off-by: David Dillow <dave@thedillows.org>
> 
> I've applied this, thanks David.
> 
> 

I didn't notice a CC:stable.
I think this should also go to stable.
Does somebody take care?

(wiggle is able to apply the patch to stable without any problems, so it's easy
to do a patch)

-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 18:22                               ` Michael Buesch
@ 2009-05-26 21:52                                 ` David Miller
  2009-05-26 22:14                                   ` David Miller
  0 siblings, 1 reply; 77+ messages in thread
From: David Miller @ 2009-05-26 21:52 UTC (permalink / raw)
  To: mb; +Cc: dave, michael.riepe, romieu, rsantos, m.bueker, linux-kernel,
	netdev

From: Michael Buesch <mb@bu3sch.de>
Date: Tue, 26 May 2009 20:22:23 +0200

> I didn't notice a CC:stable.
> I think this should also go to stable.
> Does somebody take care?

I'll queue it up.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 21:52                                 ` David Miller
@ 2009-05-26 22:14                                   ` David Miller
  2009-05-26 22:40                                     ` Michael Riepe
                                                       ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: David Miller @ 2009-05-26 22:14 UTC (permalink / raw)
  To: mb; +Cc: dave, michael.riepe, romieu, rsantos, m.bueker, linux-kernel,
	netdev

From: David Miller <davem@davemloft.net>
Date: Tue, 26 May 2009 14:52:11 -0700 (PDT)

> From: Michael Buesch <mb@bu3sch.de>
> Date: Tue, 26 May 2009 20:22:23 +0200
> 
>> I didn't notice a CC:stable.
>> I think this should also go to stable.
>> Does somebody take care?
> 
> I'll queue it up.

Actually, this patch doesn't even remotely come close to applying
to 2.6.29.4

So if someone wants this in -stable, they need to respin this against
that tree and (if wanted) 2.6.27.x -stable as well.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 22:14                                   ` David Miller
@ 2009-05-26 22:40                                     ` Michael Riepe
  2009-05-26 22:43                                       ` David Miller
  2009-05-27 16:19                                     ` Michael Buesch
  2009-06-16 19:32                                     ` Rui Santos
  2 siblings, 1 reply; 77+ messages in thread
From: Michael Riepe @ 2009-05-26 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: mb, dave, romieu, rsantos, m.bueker, linux-kernel, netdev



David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 26 May 2009 14:52:11 -0700 (PDT)
> 
> 
>>From: Michael Buesch <mb@bu3sch.de>
>>Date: Tue, 26 May 2009 20:22:23 +0200
>>
>>
>>>I didn't notice a CC:stable.
>>>I think this should also go to stable.
>>>Does somebody take care?
>>
>>I'll queue it up.
> 
> 
> Actually, this patch doesn't even remotely come close to applying
> to 2.6.29.4

Michael Buesch already provided a patch against 2.6.29.4 that we both
tested. I doubt that it will apply to 2.6.27 cleanly, though.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 22:40                                     ` Michael Riepe
@ 2009-05-26 22:43                                       ` David Miller
  2009-05-26 23:10                                         ` David Miller
  0 siblings, 1 reply; 77+ messages in thread
From: David Miller @ 2009-05-26 22:43 UTC (permalink / raw)
  To: michael.riepe; +Cc: mb, dave, romieu, rsantos, m.bueker, linux-kernel, netdev

From: Michael Riepe <michael.riepe@googlemail.com>
Date: Wed, 27 May 2009 00:40:41 +0200

> Michael Buesch already provided a patch against 2.6.29.4 that we both
> tested. I doubt that it will apply to 2.6.27 cleanly, though.

Please forward a copy to me.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 22:43                                       ` David Miller
@ 2009-05-26 23:10                                         ` David Miller
  0 siblings, 0 replies; 77+ messages in thread
From: David Miller @ 2009-05-26 23:10 UTC (permalink / raw)
  To: michael.riepe; +Cc: mb, dave, romieu, rsantos, m.bueker, linux-kernel, netdev

From: David Miller <davem@davemloft.net>
Date: Tue, 26 May 2009 15:43:40 -0700 (PDT)

> From: Michael Riepe <michael.riepe@googlemail.com>
> Date: Wed, 27 May 2009 00:40:41 +0200
> 
>> Michael Buesch already provided a patch against 2.6.29.4 that we both
>> tested. I doubt that it will apply to 2.6.27 cleanly, though.
> 
> Please forward a copy to me.

Nevermind, I was tired of waiting and fished it out of
patchwork.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 22:14                                   ` David Miller
  2009-05-26 22:40                                     ` Michael Riepe
@ 2009-05-27 16:19                                     ` Michael Buesch
  2009-06-16 19:32                                     ` Rui Santos
  2 siblings, 0 replies; 77+ messages in thread
From: Michael Buesch @ 2009-05-27 16:19 UTC (permalink / raw)
  To: David Miller
  Cc: dave, michael.riepe, romieu, rsantos, m.bueker, linux-kernel,
	netdev

On Wednesday 27 May 2009 00:14:10 David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 26 May 2009 14:52:11 -0700 (PDT)
> 
> > From: Michael Buesch <mb@bu3sch.de>
> > Date: Tue, 26 May 2009 20:22:23 +0200
> > 
> >> I didn't notice a CC:stable.
> >> I think this should also go to stable.
> >> Does somebody take care?
> > 
> > I'll queue it up.
> 
> Actually, this patch doesn't even remotely come close to applying
> to 2.6.29.4

Wiggle is able to apply it cleanly.
I already posted a wiggle'd patch.
It's also attached to this mail.


Index: linux-2.6.29/drivers/net/r8169.c
===================================================================
--- linux-2.6.29.orig/drivers/net/r8169.c	2009-05-23 11:06:22.000000000 +0200
+++ linux-2.6.29/drivers/net/r8169.c	2009-05-23 11:08:17.000000000 +0200
@@ -3554,54 +3554,64 @@
 	int handled = 0;
 	int status;
 
+	/* loop handling interrupts until we have no new ones or
+	 * we hit a invalid/hotplug case.
+	 */
 	status = RTL_R16(IntrStatus);
+	while (status && status != 0xffff) {
+		handled = 1;
 
-	/* hotplug/major error/no more work/shared irq */
-	if ((status == 0xffff) || !status)
-		goto out;
-
-	handled = 1;
+		/* Handle all of the error cases first. These will reset
+		 * the chip, so just exit the loop.
+		 */
+		if (unlikely(!netif_running(dev))) {
+			rtl8169_asic_down(ioaddr);
+			break;
+		}
 
-	if (unlikely(!netif_running(dev))) {
-		rtl8169_asic_down(ioaddr);
-		goto out;
-	}
+		/* Work around for rx fifo overflow */
+		if (unlikely(status & RxFIFOOver) &&
+	    	(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+			netif_stop_queue(dev);
+			rtl8169_tx_timeout(dev);
+			break;
+		}
 
-	status &= tp->intr_mask;
-	RTL_W16(IntrStatus,
-		(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		if (unlikely(status & SYSErr)) {
+			rtl8169_pcierr_interrupt(dev);
+			break;
+		}
 
-	if (!(status & tp->intr_event))
-		goto out;
+		if (status & LinkChg)
+			rtl8169_check_link_status(dev, tp, ioaddr);
 
-	/* Work around for rx fifo overflow */
-	if (unlikely(status & RxFIFOOver) &&
-	    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
-		netif_stop_queue(dev);
-		rtl8169_tx_timeout(dev);
-		goto out;
-	}
+		/* We need to see the lastest version of tp->intr_mask to
+		 * avoid ignoring an MSI interrupt and having to wait for
+		 * another event which may never come.
+		 */
+		smp_rmb();
+		if (status & tp->intr_mask & tp->napi_event) {
+			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+			tp->intr_mask = ~tp->napi_event;
+
+			if (likely(netif_rx_schedule_prep(&tp->napi)))
+				__netif_rx_schedule(&tp->napi);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+			       	dev->name, status);
+			}
+		}
 
-	if (unlikely(status & SYSErr)) {
-		rtl8169_pcierr_interrupt(dev);
-		goto out;
+		/* We only get a new MSI interrupt when all active irq
+		 * sources on the chip have been acknowledged. So, ack
+		 * everything we've seen and check if new sources have become
+		 * active to avoid blocking all interrupts from the chip.
+		 */
+		RTL_W16(IntrStatus,
+			(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		status = RTL_R16(IntrStatus);
 	}
 
-	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp, ioaddr);
-
-	if (status & tp->napi_event) {
-		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-		tp->intr_mask = ~tp->napi_event;
-
-		if (likely(netif_rx_schedule_prep(&tp->napi)))
-			__netif_rx_schedule(&tp->napi);
-		else if (netif_msg_intr(tp)) {
-			printk(KERN_INFO "%s: interrupt %04x in poll\n",
-			       dev->name, status);
-		}
-	}
-out:
 	return IRQ_RETVAL(handled);
 }
 
@@ -3617,13 +3627,15 @@
 
 	if (work_done < budget) {
 		netif_rx_complete(napi);
-		tp->intr_mask = 0xffff;
-		/*
-		 * 20040426: the barrier is not strictly required but the
-		 * behavior of the irq handler could be less predictable
-		 * without it. Btw, the lack of flush for the posted pci
-		 * write is safe - FR
+
+		/* We need for force the visibility of tp->intr_mask
+		 * for other CPUs, as we can loose an MSI interrupt
+		 * and potentially wait for a retransmit timeout if we don't.
+		 * The posted write to IntrMask is safe, as it will
+		 * eventually make it to the chip and we won't loose anything
+		 * until it does.
 		 */
+		tp->intr_mask = 0xffff;
 		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
 	}


-- 
Greetings, Michael.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-26 22:14                                   ` David Miller
  2009-05-26 22:40                                     ` Michael Riepe
  2009-05-27 16:19                                     ` Michael Buesch
@ 2009-06-16 19:32                                     ` Rui Santos
  2 siblings, 0 replies; 77+ messages in thread
From: Rui Santos @ 2009-06-16 19:32 UTC (permalink / raw)
  To: David Miller
  Cc: mb, dave, michael.riepe, romieu, m.bueker, linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 26 May 2009 14:52:11 -0700 (PDT)
>
>   
>> From: Michael Buesch <mb@bu3sch.de>
>> Date: Tue, 26 May 2009 20:22:23 +0200
>>
>>     
>>> I didn't notice a CC:stable.
>>> I think this should also go to stable.
>>> Does somebody take care?
>>>       
>> I'll queue it up.
>>     
>
> Actually, this patch doesn't even remotely come close to applying
> to 2.6.29.4
>
> So if someone wants this in -stable, they need to respin this against
> that tree and (if wanted) 2.6.27.x -stable as well.
>   
Hi David,

Here is a patch for the 2.6.27.25 kernel. Could you please queue it up
to the -stable tree ?

With this patch applied I got no more problems on this XID 24a00000 chip.
Also, for this chip to work I also had to apply this patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=9389523a77be0a7e01d957c836733b5c9d5530a1
( with a few changes )
This patch was the first to be committed after the release of 2.6.27 and
it only seems to add support for a few extra NICs.

Thanks a lot for you initial patch.
>
>
>   

-- 

Regards,
Rui Santos



[-- Attachment #2: msi-r8169-2.6.27.25.patch --]
[-- Type: text/x-patch, Size: 3902 bytes --]

diff -upr linux-2.6.27.25.ori/drivers/net/r8169.c linux-2.6.27.25.new/drivers/net/r8169.c
--- linux-2.6.27.25.ori/drivers/net/r8169.c	2009-06-16 17:14:05.000000000 +0100
+++ linux-2.6.27.25.new/drivers/net/r8169.c	2009-06-16 17:41:03.000000000 +0100
@@ -2851,54 +2851,64 @@ static irqreturn_t rtl8169_interrupt(int
 	int handled = 0;
 	int status;
 
+	/* loop handling interrupts until we have no new ones or
+	 * we hit a invalid/hotplug case.
+	 */
 	status = RTL_R16(IntrStatus);
+	while (status && status != 0xffff) {
+		handled = 1;
 
-	/* hotplug/major error/no more work/shared irq */
-	if ((status == 0xffff) || !status)
-		goto out;
-
-	handled = 1;
-
-	if (unlikely(!netif_running(dev))) {
-		rtl8169_asic_down(ioaddr);
-		goto out;
-	}
+		/* Handle all of the error cases first. These will reset
+		 * the chip, so just exit the loop.
+		 */
+		if (unlikely(!netif_running(dev))) {
+			rtl8169_asic_down(ioaddr);
+			break;
+		}
 
-	status &= tp->intr_mask;
-	RTL_W16(IntrStatus,
-		(status & RxFIFOOver) ? (status | RxOverflow) : status);
-
-	if (!(status & tp->intr_event))
-		goto out;
-
-	/* Work around for rx fifo overflow */
-	if (unlikely(status & RxFIFOOver) &&
-	    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
-		netif_stop_queue(dev);
-		rtl8169_tx_timeout(dev);
-		goto out;
-	}
+		/* Work around for rx fifo overflow */
+		if (unlikely(status & RxFIFOOver) &&
+		(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+			netif_stop_queue(dev);
+			rtl8169_tx_timeout(dev);
+			break;
+		}
 
-	if (unlikely(status & SYSErr)) {
-		rtl8169_pcierr_interrupt(dev);
-		goto out;
-	}
+		if (unlikely(status & SYSErr)) {
+			rtl8169_pcierr_interrupt(dev);
+			break;
+		}
 
-	if (status & LinkChg)
-		rtl8169_check_link_status(dev, tp, ioaddr);
+		if (status & LinkChg)
+			rtl8169_check_link_status(dev, tp, ioaddr);
 
-	if (status & tp->napi_event) {
-		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-		tp->intr_mask = ~tp->napi_event;
-
-		if (likely(netif_rx_schedule_prep(dev, &tp->napi)))
-			__netif_rx_schedule(dev, &tp->napi);
-		else if (netif_msg_intr(tp)) {
-			printk(KERN_INFO "%s: interrupt %04x in poll\n",
-			       dev->name, status);
+		/* We need to see the lastest version of tp->intr_mask to
+		 * avoid ignoring an MSI interrupt and having to wait for
+		 * another event which may never come.
+		 */
+		smp_rmb();
+		if (status & tp->intr_mask & tp->napi_event) {
+			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+			tp->intr_mask = ~tp->napi_event;
+
+			if (likely(napi_schedule_prep(&tp->napi)))
+				__napi_schedule(&tp->napi);
+			else if (netif_msg_intr(tp)) {
+				printk(KERN_INFO "%s: interrupt %04x in poll\n",
+			       	dev->name, status);
+			}
 		}
+
+		/* We only get a new MSI interrupt when all active irq
+		 * sources on the chip have been acknowledged. So, ack
+		 * everything we've seen and check if new sources have become
+		 * active to avoid blocking all interrupts from the chip.
+		 */
+		RTL_W16(IntrStatus,
+			(status & RxFIFOOver) ? (status | RxOverflow) : status);
+		status = RTL_R16(IntrStatus);
 	}
-out:
+
 	return IRQ_RETVAL(handled);
 }
 
@@ -2914,13 +2924,15 @@ static int rtl8169_poll(struct napi_stru
 
 	if (work_done < budget) {
 		netif_rx_complete(dev, napi);
-		tp->intr_mask = 0xffff;
-		/*
-		 * 20040426: the barrier is not strictly required but the
-		 * behavior of the irq handler could be less predictable
-		 * without it. Btw, the lack of flush for the posted pci
-		 * write is safe - FR
+
+		/* We need for force the visibility of tp->intr_mask
+		 * for other CPUs, as we can loose an MSI interrupt
+		 * and potentially wait for a retransmit timeout if we don't.
+		 * The posted write to IntrMask is safe, as it will
+		 * eventually make it to the chip and we won't loose anything
+		 * until it does.
 		 */
+		tp->intr_mask = 0xffff;
 		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
 	}

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
                                               ` (2 preceding siblings ...)
  2009-05-26  5:55                             ` David Miller
@ 2009-08-21 20:57                             ` Eric W. Biederman
  2009-08-21 21:22                               ` Michael Riepe
  2009-08-21 22:59                               ` David Dillow
  3 siblings, 2 replies; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-21 20:57 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> The 8169 chip only generates MSI interrupts when all enabled event
> sources are quiescent and one or more sources transition to active. If
> not all of the active events are acknowledged, or a new event becomes
> active while the existing ones are cleared in the handler, we will not
> see a new interrupt.
>
> The current interrupt handler masks off the Rx and Tx events once the
> NAPI handler has been scheduled, which opens a race window in which we
> can get another Rx or Tx event and never ACK'ing it, stopping all
> activity until the link is reset (ifconfig down/up). Fix this by always
> ACK'ing all event sources, and loop in the handler until we have all
> sources quiescent.
>
> Signed-off-by: David Dillow <dave@thedillows.org>
> ---
> This fixes the lockups I've seen. Both MSI and level-triggered interrupt
> configurations survive over an hour of testing when it would lockup in
> under 90 seconds before. I am certain of the analysis of the root cause,
> but there may be better ways to fix it. There may also be a theoretical
> race window between the ending of a NAPI poll cycle and a link change
> interrupt coming in, but I'm not sure it would matter. 
>
> Some variant of this should also be applied to the currently running
> stable trees, as the problem is long-standing.

I have what at first glance looks like a problem caused by this
patch.  For the last month since upgrading one of my machines from
2.6.28 to 2.6.30 it has been becomming inaccessible from the
network and I have a few:

NETDEV WATCHDOG: eth0 (r8169): transmit timed out

in my logs and a lot soft lockups that always have rtl8169_interrupt
as the thing that is running.   I suspect your patch has introduced
a near infinite loop in the interrupt handler and is causing these
soft lockups.

Any ideas?

Eric

BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
CPU 3:
Pid: 0, comm: swapper Tainted: G        W  2.6.30-170263.2006.Arora.fc11.x86_64 #1 G33M-S2
RIP: 0010:[<ffffffffa01deacd>]  [<ffffffffa01deacd>] rtl8169_interrupt+0x26f/0x2b7 [r8169]
RSP: 0018:ffff880028070cb0  EFLAGS: 00000206
RAX: 0000000000000050 RBX: ffff880028070d10 RCX: ffff88002807b9e0
RDX: ffffc2000065c03e RSI: ffff88012d79a000 RDI: 0000000000000246
RBP: ffffffff8100c9d3 R08: ffff88012fae0000 R09: ffff880028070ec0
R10: 077321422cb06619 R11: 000000003c5efb73 R12: ffff880028070c30
R13: ffff88012d79a000 R14: ffff88012d79a600 R15: 077321422cb06619
FS:  0000000000000000(0000) GS:ffff88002806d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fc10010c000 CR3: 0000000000201000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
 <IRQ>  [<ffffffff81093f0b>] ? handle_IRQ_event+0x6a/0x13f
 [<ffffffff810219fa>] ? apic_write+0x24/0x3a
 [<ffffffff8109607a>] ? handle_edge_irq+0xdb/0x138
 [<ffffffff81012fbd>] ? native_sched_clock+0x2d/0x54
 [<ffffffff8100e996>] ? handle_irq+0x95/0xb7
 [<ffffffff8100df42>] ? do_IRQ+0x6a/0xe9
 [<ffffffff8100c853>] ? ret_from_intr+0x0/0x11
 [<ffffffff8104ba16>] ? __do_softirq+0x5e/0x1b0
 [<ffffffff8100cfcc>] ? call_softirq+0x1c/0x28
 [<ffffffff8100e721>] ? do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] ? irq_exit+0x52/0xa3
 [<ffffffff81020f11>] ? smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] ? apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff8136f30c>] ? start_secondary+0x19c/0x1b7

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-21 20:57                             ` Eric W. Biederman
@ 2009-08-21 21:22                               ` Michael Riepe
  2009-08-21 22:59                               ` David Dillow
  1 sibling, 0 replies; 77+ messages in thread
From: Michael Riepe @ 2009-08-21 21:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Dillow, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

Hi!

Eric W. Biederman wrote:

> I have what at first glance looks like a problem caused by this
> patch.  For the last month since upgrading one of my machines from
> 2.6.28 to 2.6.30 it has been becomming inaccessible from the
> network and I have a few:
> 
> NETDEV WATCHDOG: eth0 (r8169): transmit timed out
>
> in my logs and a lot soft lockups that always have rtl8169_interrupt
> as the thing that is running.   I suspect your patch has introduced
> a near infinite loop in the interrupt handler and is causing these
> soft lockups.

I've been running 2.6.30 for more than two months now and all is fine.
But this might be a different chip.

-- 
Michael "Tired" Riepe <michael.riepe@googlemail.com>
X-Tired: Each morning I get up I die a little

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-21 20:57                             ` Eric W. Biederman
  2009-08-21 21:22                               ` Michael Riepe
@ 2009-08-21 22:59                               ` David Dillow
  2009-08-21 23:34                                 ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-21 22:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Fri, 2009-08-21 at 13:57 -0700, Eric W. Biederman wrote:
> David Dillow <dave@thedillows.org> writes:
> I have what at first glance looks like a problem caused by this
> patch.  For the last month since upgrading one of my machines from
> 2.6.28 to 2.6.30 it has been becomming inaccessible from the
> network and I have a few:
> 
> NETDEV WATCHDOG: eth0 (r8169): transmit timed out
> 
> in my logs and a lot soft lockups that always have rtl8169_interrupt
> as the thing that is running.   I suspect your patch has introduced
> a near infinite loop in the interrupt handler and is causing these
> soft lockups.
> 
> Any ideas?

I would be surprised, but I suppose it is not out of the realm of
possibility. Can you send me a full dmesg, please?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-21 22:59                               ` David Dillow
@ 2009-08-21 23:34                                 ` David Dillow
  2009-08-22  0:24                                   ` Eric W. Biederman
  2009-08-22 11:48                                   ` Eric W. Biederman
  0 siblings, 2 replies; 77+ messages in thread
From: David Dillow @ 2009-08-21 23:34 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Fri, 2009-08-21 at 18:59 -0400, David Dillow wrote:
> On Fri, 2009-08-21 at 13:57 -0700, Eric W. Biederman wrote:
> > David Dillow <dave@thedillows.org> writes:
> > I have what at first glance looks like a problem caused by this
> > patch.  For the last month since upgrading one of my machines from
> > 2.6.28 to 2.6.30 it has been becomming inaccessible from the
> > network and I have a few:
> > 
> > NETDEV WATCHDOG: eth0 (r8169): transmit timed out
> > 
> > in my logs and a lot soft lockups that always have rtl8169_interrupt
> > as the thing that is running.   I suspect your patch has introduced
> > a near infinite loop in the interrupt handler and is causing these
> > soft lockups.
> > 
> > Any ideas?
> 
> I would be surprised, but I suppose it is not out of the realm of
> possibility. Can you send me a full dmesg, please?

Re-looking at the code, I'd guess that some IRQ status line is getting
stuck high, but I don't see why -- we should acknowledge all outstanding
interrupts each time through the loop, whether we care about them or
not.

Could reproduce a problem with the following patch applied, and send the
full dmesg, please?

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b82780d..46cb05a 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
 	int status;
+	int count = 0;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
@@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		if (count++ > 100) {
+			printk_once("r8169 screaming irq status %08x "
+				"mask %08x event %08x napi %08x\n",
+				status, tp->intr_mask, tp->intr_event,
+				tp->napi_event);
+			break;
+		}
+
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-21 23:34                                 ` David Dillow
@ 2009-08-22  0:24                                   ` Eric W. Biederman
  2009-08-22 11:48                                   ` Eric W. Biederman
  1 sibling, 0 replies; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-22  0:24 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Fri, 2009-08-21 at 18:59 -0400, David Dillow wrote:
>> On Fri, 2009-08-21 at 13:57 -0700, Eric W. Biederman wrote:
>> > David Dillow <dave@thedillows.org> writes:
>> > I have what at first glance looks like a problem caused by this
>> > patch.  For the last month since upgrading one of my machines from
>> > 2.6.28 to 2.6.30 it has been becomming inaccessible from the
>> > network and I have a few:
>> > 
>> > NETDEV WATCHDOG: eth0 (r8169): transmit timed out
>> > 
>> > in my logs and a lot soft lockups that always have rtl8169_interrupt
>> > as the thing that is running.   I suspect your patch has introduced
>> > a near infinite loop in the interrupt handler and is causing these
>> > soft lockups.
>> > 
>> > Any ideas?
>> 
>> I would be surprised, but I suppose it is not out of the realm of
>> possibility. Can you send me a full dmesg, please?
>
> Re-looking at the code, I'd guess that some IRQ status line is getting
> stuck high, but I don't see why -- we should acknowledge all outstanding
> interrupts each time through the loop, whether we care about them or
> not.
>
> Could reproduce a problem with the following patch applied, and send the
> full dmesg, please?

Will do.  This looks like a good way to test my hypothesis thanks.
I can't quite reproduce this problem so it may be a few days before
I know.

Eric


> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index b82780d..46cb05a 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	void __iomem *ioaddr = tp->mmio_addr;
>  	int handled = 0;
>  	int status;
> +	int count = 0;
>  
>  	/* loop handling interrupts until we have no new ones or
>  	 * we hit a invalid/hotplug case.
> @@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	while (status && status != 0xffff) {
>  		handled = 1;
>  
> +		if (count++ > 100) {
> +			printk_once("r8169 screaming irq status %08x "
> +				"mask %08x event %08x napi %08x\n",
> +				status, tp->intr_mask, tp->intr_event,
> +				tp->napi_event);
> +			break;
> +		}
> +
> +
>  		/* Handle all of the error cases first. These will reset
>  		 * the chip, so just exit the loop.
>  		 */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-21 23:34                                 ` David Dillow
  2009-08-22  0:24                                   ` Eric W. Biederman
@ 2009-08-22 11:48                                   ` Eric W. Biederman
  2009-08-22 12:07                                     ` Eric W. Biederman
  1 sibling, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-22 11:48 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

>
> Re-looking at the code, I'd guess that some IRQ status line is getting
> stuck high, but I don't see why -- we should acknowledge all outstanding
> interrupts each time through the loop, whether we care about them or
> not.
>
> Could reproduce a problem with the following patch applied, and send the
> full dmesg, please?

Here is what I get.

r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d

Eric

> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index b82780d..46cb05a 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	void __iomem *ioaddr = tp->mmio_addr;
>  	int handled = 0;
>  	int status;
> +	int count = 0;
>  
>  	/* loop handling interrupts until we have no new ones or
>  	 * we hit a invalid/hotplug case.
> @@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	while (status && status != 0xffff) {
>  		handled = 1;
>  
> +		if (count++ > 100) {
> +			printk_once("r8169 screaming irq status %08x "
> +				"mask %08x event %08x napi %08x\n",
> +				status, tp->intr_mask, tp->intr_event,
> +				tp->napi_event);
> +			break;
> +		}
> +
> +
>  		/* Handle all of the error cases first. These will reset
>  		 * the chip, so just exit the loop.
>  		 */

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-22 11:48                                   ` Eric W. Biederman
@ 2009-08-22 12:07                                     ` Eric W. Biederman
  2009-08-22 20:43                                       ` David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-22 12:07 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

ebiederm@xmission.com (Eric W. Biederman) writes:

> David Dillow <dave@thedillows.org> writes:
>
>>
>> Re-looking at the code, I'd guess that some IRQ status line is getting
>> stuck high, but I don't see why -- we should acknowledge all outstanding
>> interrupts each time through the loop, whether we care about them or
>> not.
>>
>> Could reproduce a problem with the following patch applied, and send the
>> full dmesg, please?
>
> Here is what I get.
>
> r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d

And now that the machine has come out of it, that was followed by:
Looks like the soft lockup did not manage to trigger in this case.

------------[ cut here ]------------
WARNING: at /home/ebiederm/projects/linux/linux-2.6.30-Arora/net/sched/sch_generic.c:226 dev_watchdog+0xd9/0x14a()
Hardware name: G33M-S2
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: tulip xt_tcpudp iptable_filter nfsd lockd nfs_acl auth_rpcgss bridge stp exportfs bnep sco l2cap bluetooth sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log dm_multipath dm_mod uinput kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ppdev ata_generic snd_timer firewire_ohci pata_acpi parport_pc snd firewire_core parport r8169 i2c_i801 pata_jmicron mii pcspkr soundcore sg iTCO_wdt floppy iTCO_vendor_support snd_page_alloc crc_itu_t ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd i915 drm i2c_algo_bit video output i2c_core [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.30eric-rtl8169-debug #36
Call Trace:
 <IRQ>  [<ffffffff8104583b>] warn_slowpath_common+0x88/0xb6
 [<ffffffff812ee53c>] ? dev_watchdog+0x0/0x14a
 [<ffffffff810458ec>] warn_slowpath_fmt+0x4b/0x61
 [<ffffffff812d607e>] ? netdev_drivername+0x52/0x70
 [<ffffffff812ee615>] dev_watchdog+0xd9/0x14a
 [<ffffffff810596de>] ? __queue_work+0x44/0x61
 [<ffffffff81050922>] run_timer_softirq+0x169/0x1f5
 [<ffffffff81020306>] ? apic_write+0x24/0x3a
 [<ffffffff81069ce2>] ? clockevents_program_event+0x88/0xa5
 [<ffffffff8104ba7b>] __do_softirq+0xc3/0x1b0
 [<ffffffff8100cfcc>] call_softirq+0x1c/0x28
 [<ffffffff8100e721>] do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] irq_exit+0x52/0xa3
 [<ffffffff81020f11>] smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff8136f25c>] ? start_secondary+0x19c/0x1b7
---[ end trace 595b23907a063360 ]---

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-22 12:07                                     ` Eric W. Biederman
@ 2009-08-22 20:43                                       ` David Dillow
  2009-08-23 17:17                                         ` Jarek Poplawski
                                                           ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: David Dillow @ 2009-08-22 20:43 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> > David Dillow <dave@thedillows.org> writes:
> >
> >>
> >> Re-looking at the code, I'd guess that some IRQ status line is getting
> >> stuck high, but I don't see why -- we should acknowledge all outstanding
> >> interrupts each time through the loop, whether we care about them or
> >> not.
> >>
> >> Could reproduce a problem with the following patch applied, and send the
> >> full dmesg, please?
> >
> > Here is what I get.
> >
> > r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
> 
> And now that the machine has come out of it, that was followed by:
> Looks like the soft lockup did not manage to trigger in this case.

I need some more context, please. What is the network load through this
NIC when you have the issues? Light, heavy? Can you give me more details
about the machine? A full dmesg from boot until this happens would help
quite a bit. At a minimum it would help answer which version of the chip
we're dealing with and what the machine it is in looks like.

Can you reproduce this with pci=nomsi? I'm assuming it the chip running
in MSI mode.

Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
about a race between rtl8169_interrupt() and rtl8169_poll(), but it
isn't jumping out at me.

Also, I'm having connectivity troubles this weekend, so my response may
be spotty. :(

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-22 20:43                                       ` David Dillow
@ 2009-08-23 17:17                                         ` Jarek Poplawski
  2009-08-23 17:43                                           ` Michal Soltys
  2009-08-24  2:37                                         ` Eric W. Biederman
  2009-08-25  0:51                                         ` Eric W. Biederman
  2 siblings, 1 reply; 77+ messages in thread
From: Jarek Poplawski @ 2009-08-23 17:17 UTC (permalink / raw)
  To: David Dillow
  Cc: Eric W. Biederman, Michael Riepe, Michael Buesch, Francois Romieu,
	Rui Santos, Michael Büker, linux-kernel, netdev,
	Michal Soltys

David Dillow wrote, On 08/22/2009 10:43 PM:

> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>
>>> David Dillow <dave@thedillows.org> writes:
>>>
>>>> Re-looking at the code, I'd guess that some IRQ status line is getting
>>>> stuck high, but I don't see why -- we should acknowledge all outstanding
>>>> interrupts each time through the loop, whether we care about them or
>>>> not.
>>>>
>>>> Could reproduce a problem with the following patch applied, and send the
>>>> full dmesg, please?
>>> Here is what I get.
>>>
>>> r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>> And now that the machine has come out of it, that was followed by:
>> Looks like the soft lockup did not manage to trigger in this case.
> 
> I need some more context, please. What is the network load through this
> NIC when you have the issues? Light, heavy? Can you give me more details
> about the machine? A full dmesg from boot until this happens would help
> quite a bit. At a minimum it would help answer which version of the chip
> we're dealing with and what the machine it is in looks like.
> 
> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
> in MSI mode.
> 
> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
> isn't jumping out at me.
> 
> Also, I'm having connectivity troubles this weekend, so my response may
> be spotty. :(
> 


BTW, FYI, it seems Michal stopped tracking this problem, but he
found this commit problematic as well.

From: Michal Soltys <soltys@ziu.info>
Subject: Re: r8169 (+others ?) and note_interrupt performance hit on 2.6.30.x
Date: Wed, 05 Aug 2009 20:54:47 +0200
http://marc.info/?l=linux-netdev&m=124949848110710&w=2

Jarek P.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-23 17:17                                         ` Jarek Poplawski
@ 2009-08-23 17:43                                           ` Michal Soltys
  2009-08-23 17:54                                             ` Jarek Poplawski
  0 siblings, 1 reply; 77+ messages in thread
From: Michal Soltys @ 2009-08-23 17:43 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: David Dillow, Eric W. Biederman, Michael Riepe, Michael Buesch,
	Francois Romieu, Rui Santos, Michael Büker, linux-kernel,
	netdev

Jarek Poplawski wrote:
> David Dillow wrote, On 08/22/2009 10:43 PM:
> 
>> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>>
>>>> David Dillow <dave@thedillows.org> writes:
>>>>
>>>>> Re-looking at the code, I'd guess that some IRQ status line is getting
>>>>> stuck high, but I don't see why -- we should acknowledge all outstanding
>>>>> interrupts each time through the loop, whether we care about them or
>>>>> not.
>>>>>
>>>>> Could reproduce a problem with the following patch applied, and send the
>>>>> full dmesg, please?
>>>> Here is what I get.
>>>>
>>>> r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>>> And now that the machine has come out of it, that was followed by:
>>> Looks like the soft lockup did not manage to trigger in this case.
>> 
>> I need some more context, please. What is the network load through this
>> NIC when you have the issues? Light, heavy? Can you give me more details
>> about the machine? A full dmesg from boot until this happens would help
>> quite a bit. At a minimum it would help answer which version of the chip
>> we're dealing with and what the machine it is in looks like.
>> 
>> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
>> in MSI mode.
>> 
>> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
>> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
>> isn't jumping out at me.
>> 
>> Also, I'm having connectivity troubles this weekend, so my response may
>> be spotty. :(
>> 
> 
> 
> BTW, FYI, it seems Michal stopped tracking this problem, but he
> found this commit problematic as well.
> 
> From: Michal Soltys <soltys@ziu.info>
> Subject: Re: r8169 (+others ?) and note_interrupt performance hit on 2.6.30.x
> Date: Wed, 05 Aug 2009 20:54:47 +0200
> http://marc.info/?l=linux-netdev&m=124949848110710&w=2
> 

Well - not really stopped, but not sure what to look at before that 
particular commit (as cpu load for the tests I've done increased rather 
significantly as well before that, and after 2.6.29 - but it doesn't 
seem to be related to the driver). And I was away for over a week...

As fot the changes that commit introduced, here's is link to the mail 
with the oprofile I did back then:

http://www.spinics.net/lists/netdev/msg102709.html

I'm happy to assist any way I can.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-23 17:43                                           ` Michal Soltys
@ 2009-08-23 17:54                                             ` Jarek Poplawski
  0 siblings, 0 replies; 77+ messages in thread
From: Jarek Poplawski @ 2009-08-23 17:54 UTC (permalink / raw)
  To: Michal Soltys
  Cc: David Dillow, Eric W. Biederman, Michael Riepe, Michael Buesch,
	Francois Romieu, Rui Santos, Michael Büker, linux-kernel,
	netdev

On Sun, Aug 23, 2009 at 07:43:52PM +0200, Michal Soltys wrote:
> Jarek Poplawski wrote:
>> David Dillow wrote, On 08/22/2009 10:43 PM:
>>
>>> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>>>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>>>
>>>>> David Dillow <dave@thedillows.org> writes:
>>>>>
>>>>>> Re-looking at the code, I'd guess that some IRQ status line is getting
>>>>>> stuck high, but I don't see why -- we should acknowledge all outstanding
>>>>>> interrupts each time through the loop, whether we care about them or
>>>>>> not.
>>>>>>
>>>>>> Could reproduce a problem with the following patch applied, and send the
>>>>>> full dmesg, please?
>>>>> Here is what I get.
>>>>>
>>>>> r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>>>> And now that the machine has come out of it, that was followed by:
>>>> Looks like the soft lockup did not manage to trigger in this case.
>>>
>>> I need some more context, please. What is the network load through this
>>> NIC when you have the issues? Light, heavy? Can you give me more details
>>> about the machine? A full dmesg from boot until this happens would help
>>> quite a bit. At a minimum it would help answer which version of the chip
>>> we're dealing with and what the machine it is in looks like.
>>>
>>> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
>>> in MSI mode.
>>>
>>> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
>>> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
>>> isn't jumping out at me.
>>>
>>> Also, I'm having connectivity troubles this weekend, so my response may
>>> be spotty. :(
>>>
>>
>>
>> BTW, FYI, it seems Michal stopped tracking this problem, but he
>> found this commit problematic as well.
>>
>> From: Michal Soltys <soltys@ziu.info>
>> Subject: Re: r8169 (+others ?) and note_interrupt performance hit on 2.6.30.x
>> Date: Wed, 05 Aug 2009 20:54:47 +0200
>> http://marc.info/?l=linux-netdev&m=124949848110710&w=2
>>
>
> Well - not really stopped, but not sure what to look at before that  
> particular commit (as cpu load for the tests I've done increased rather  
> significantly as well before that, and after 2.6.29 - but it doesn't  
> seem to be related to the driver). And I was away for over a week...
>
> As fot the changes that commit introduced, here's is link to the mail  
> with the oprofile I did back then:
>
> http://www.spinics.net/lists/netdev/msg102709.html
>
> I'm happy to assist any way I can.

Very nice :-) I guess David might be interested in your dmesg etc.
(as above) too.

Jarek P.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-22 20:43                                       ` David Dillow
  2009-08-23 17:17                                         ` Jarek Poplawski
@ 2009-08-24  2:37                                         ` Eric W. Biederman
  2009-08-25  0:51                                         ` Eric W. Biederman
  2 siblings, 0 replies; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-24  2:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

[-- Attachment #1: Type: text/plain, Size: 2134 bytes --]

David Dillow <dave@thedillows.org> writes:

> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> > David Dillow <dave@thedillows.org> writes:
>> >
>> >>
>> >> Re-looking at the code, I'd guess that some IRQ status line is getting
>> >> stuck high, but I don't see why -- we should acknowledge all outstanding
>> >> interrupts each time through the loop, whether we care about them or
>> >> not.
>> >>
>> >> Could reproduce a problem with the following patch applied, and send the
>> >> full dmesg, please?
>> >
>> > Here is what I get.
>> >
>> > r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>> 
>> And now that the machine has come out of it, that was followed by:
>> Looks like the soft lockup did not manage to trigger in this case.
>
> I need some more context, please. What is the network load through this
> NIC when you have the issues? Light, heavy? Can you give me more details
> about the machine? A full dmesg from boot until this happens would help
> quite a bit. At a minimum it would help answer which version of the chip
> we're dealing with and what the machine it is in looks like.

dmesg attached.

What seems to reproduce the problem is a download of about a gigabyte.
Which this machine does every hour or two.  The switch and the upstream
server are all 10Gig. So at least in bursts I expect I am saturating
the network adapter with traffic coming in as fast as it can come.

Last night I reverted your patch and the machine seems to be happy
and not having problems since then.

> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
> in MSI mode.

It is.

> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
> isn't jumping out at me.
>
> Also, I'm having connectivity troubles this weekend, so my response may
> be spotty. :(

No problem.

I haven't wrapped my head around the device specific bits but I suspect
we are simply receiving more packets while the interrupt handler is
running.

Eric


[-- Attachment #2: bs5-dmesg1.txt --]
[-- Type: text/plain, Size: 49418 bytes --]

Initializing cgroup subsys cpuset
Linux version 2.6.30eric-rtl8169-debug (ebiederm@maxwell.aristanetworks.com.) (gcc version 4.4.1 20090725 (Red Hat 4.4.1-2) (GCC) ) #36 SMP Fri Aug 21 18:30:36 PDT 2009
Command line: ro root=LABEL=/ rhgb quiet 8250.nr_uarts=16 crashkernel=128M
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009e800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000cf5e0000 (usable)
 BIOS-e820: 00000000cf5e0000 - 00000000cf5e3000 (ACPI NVS)
 BIOS-e820: 00000000cf5e3000 - 00000000cf5f0000 (ACPI data)
 BIOS-e820: 00000000cf5f0000 - 00000000cf600000 (reserved)
 BIOS-e820: 00000000d0000000 - 00000000e0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
DMI 2.4 present.
last_pfn = 0x130000 max_arch_pfn = 0x100000000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-CAFFF write-protect
  CB000-EFFFF uncachable
  F0000-FFFFF write-through
MTRR variable ranges enabled:
  0 base 100000000 mask FC0000000 write-back
  1 base 130000000 mask FF0000000 uncachable
  2 base 0C0000000 mask FF0000000 write-back
  3 base 0CF800000 mask FFF800000 uncachable
  4 base 0CF700000 mask FFFF00000 uncachable
  5 base 080000000 mask FC0000000 write-back
  6 base 000000000 mask F80000000 write-back
  7 base 0CF600000 mask FFFF00000 write-through
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
last_pfn = 0xcf5e0 max_arch_pfn = 0x100000000
init_memory_mapping: 0000000000000000-00000000cf5e0000
 0000000000 - 00cf400000 page 2M
 00cf400000 - 00cf5e0000 page 4k
kernel direct mapping tables up to cf5e0000 @ 8000-e000
init_memory_mapping: 0000000100000000-0000000130000000
 0100000000 - 0130000000 page 2M
kernel direct mapping tables up to 130000000 @ c000-12000
RAMDISK: 37cd1000 - 37fef789
ACPI: RSDP 00000000000f6f40 00014 (v00 GBT   )
ACPI: RSDT 00000000cf5e3040 0003C (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
ACPI: FACP 00000000cf5e30c0 00074 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
ACPI: DSDT 00000000cf5e3180 054DD (v01 GBT    GBTUACPI 00001000 MSFT 0100000C)
ACPI: FACS 00000000cf5e0000 00040
ACPI: HPET 00000000cf5e87c0 00038 (v01 GBT    GBTUACPI 42302E31 GBTU 00000098)
ACPI: MCFG 00000000cf5e8840 0003C (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
ACPI: APIC 00000000cf5e86c0 00084 (v01 GBT    GBTUACPI 42302E31 GBTU 01010101)
ACPI: SSDT 00000000cf5e88c0 0015C (v01  PmRef  Cpu0Ist 00003000 INTL 20040311)
ACPI: SSDT 00000000cf5e8e70 00275 (v01  PmRef    CpuPm 00003000 INTL 20040311)
ACPI: Local APIC address 0xfee00000
No NUMA configuration found
Faking a node at 0000000000000000-0000000130000000
Bootmem setup node 0 0000000000000000-0000000130000000
  NODE_DATA [000000000000d000 - 0000000000021fff]
  bootmap [0000000000022000 -  0000000000047fff] pages 26
(8 early reservations) ==> bootmem [0000000000 - 0130000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000006000 - 0000008000]       TRAMPOLINE ==> [0000006000 - 0000008000]
  #2 [0000200000 - 0000961968]    TEXT DATA BSS ==> [0000200000 - 0000961968]
  #3 [0037cd1000 - 0037fef789]          RAMDISK ==> [0037cd1000 - 0037fef789]
  #4 [000009e800 - 0000100000]    BIOS reserved ==> [000009e800 - 0000100000]
  #5 [0000962000 - 00009620ee]              BRK ==> [0000962000 - 00009620ee]
  #6 [0000008000 - 000000c000]          PGTABLE ==> [0000008000 - 000000c000]
  #7 [000000c000 - 000000d000]          PGTABLE ==> [000000c000 - 000000d000]
found SMP MP-table at [ffff8800000f55a0] f55a0
Reserving 128MB of memory at 16MB for crashkernel (System RAM: 4864MB)
 [ffffe20000000000-ffffe200043fffff] PMD -> [ffff880028200000-ffff88002bbfffff] on node 0
Zone PFN ranges:
  DMA      0x00000000 -> 0x00001000
  DMA32    0x00001000 -> 0x00100000
  Normal   0x00100000 -> 0x00130000
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
    0: 0x00000000 -> 0x0000009e
    0: 0x00000100 -> 0x000cf5e0
    0: 0x00100000 -> 0x00130000
On node 0 totalpages: 1045886
  DMA zone: 56 pages used for memmap
  DMA zone: 1994 pages reserved
  DMA zone: 1948 pages, LIFO batch:0
  DMA32 zone: 14280 pages used for memmap
  DMA32 zone: 831000 pages, LIFO batch:31
  Normal zone: 2688 pages used for memmap
  Normal zone: 193920 pages, LIFO batch:31
ACPI: PM-Timer IO Port: 0x408
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x03] enabled)
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 0, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
Using ACPI (MADT) for SMP configuration information
ACPI: HPET id: 0x8086a201 base: 0xfed00000
SMP: Allowing 4 CPUs, 0 hotplug CPUs
nr_irqs_gsi: 24
PM: Registered nosave memory: 000000000009e000 - 00000000000a0000
PM: Registered nosave memory: 00000000000a0000 - 00000000000f0000
PM: Registered nosave memory: 00000000000f0000 - 0000000000100000
PM: Registered nosave memory: 00000000cf5e0000 - 00000000cf5e3000
PM: Registered nosave memory: 00000000cf5e3000 - 00000000cf5f0000
PM: Registered nosave memory: 00000000cf5f0000 - 00000000cf600000
PM: Registered nosave memory: 00000000cf600000 - 00000000d0000000
PM: Registered nosave memory: 00000000d0000000 - 00000000e0000000
PM: Registered nosave memory: 00000000e0000000 - 00000000fec00000
PM: Registered nosave memory: 00000000fec00000 - 0000000100000000
Allocating PCI resources starting at e2000000 (gap: e0000000:1ec00000)
NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:4 nr_node_ids:1
PERCPU: Embedded 25 pages at ffff880028022000, static data 72160 bytes
Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1026868
Policy zone: Normal
Kernel command line: ro root=LABEL=/ rhgb quiet 8250.nr_uarts=16 crashkernel=128M
Initializing CPU#0
NR_IRQS:4352 nr_irqs:440
PID hash table entries: 4096 (order: 12, 32768 bytes)
Fast TSC calibration using PIT
Detected 2399.928 MHz processor.
Console: colour VGA+ 80x25
console [tty0] enabled
Checking aperture...
No AGP bridge found
Calgary: detecting Calgary via BIOS EBDA area
Calgary: Unable to locate Rio Grande table in EBDA - bailing!
PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
Placing 64MB software IO TLB between ffff880020000000 - ffff880024000000
software IO TLB at phys 0x20000000 - 0x24000000
Memory: 3915580k/4980736k available (3571k kernel code, 797192k absent, 267964k reserved, 1887k data, 1280k init)
SLUB: Genslabs=14, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
hpet clockevent registered
HPET: 4 timers in total, 0 timers will be used for per-cpu timer
Calibrating delay loop (skipped), value calculated using timer frequency.. 4799.85 BogoMIPS (lpj=9599712)
Security Framework initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
Mount-cache hash table entries: 256
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 0/0x0 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 0
CPU0: Thermal monitoring enabled (TM2)
using mwait in idle threads.
ACPI: Core revision 20090320
Setting APIC routing to flat
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz stepping 0b
Booting processor 1 APIC 0x3 ip 0x6000
Initializing CPU#1
Calibrating delay using timer specific routine.. 4800.02 BogoMIPS (lpj=9600054)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 1/0x3 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 3
CPU1: Thermal monitoring enabled (TM2)
x86 PAT enabled: cpu 1, old 0x7040600070406, new 0x7010600070106
CPU1: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#1]: passed.
Booting processor 2 APIC 0x2 ip 0x6000
Initializing CPU#2
Calibrating delay using timer specific routine.. 4799.98 BogoMIPS (lpj=9599971)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 2/0x2 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 2
CPU2: Thermal monitoring enabled (TM2)
x86 PAT enabled: cpu 2, old 0x7040600070406, new 0x7010600070106
CPU2: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#2]: passed.
Booting processor 3 APIC 0x1 ip 0x6000
Initializing CPU#3
Calibrating delay using timer specific routine.. 4799.85 BogoMIPS (lpj=9599715)
CPU: L1 I cache: 32K, L1 D cache: 32K
CPU: L2 cache: 4096K
CPU 3/0x1 -> Node 0
CPU: Physical Processor ID: 0
CPU: Processor Core ID: 1
CPU3: Thermal monitoring enabled (TM2)
x86 PAT enabled: cpu 3, old 0x7040600070406, new 0x7010600070106
CPU3: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz stepping 0b
checking TSC synchronization [CPU#0 -> CPU#3]: passed.
Brought up 4 CPUs
Total of 4 processors activated (19199.72 BogoMIPS).
net_namespace: 1888 bytes
Booting paravirtualized kernel on bare hardware
NET: Registered protocol family 16
ACPI: bus type pci registered
PCI: MCFG configuration 0: base d0000000 segment 0 buses 0 - 255
PCI: MCFG area at d0000000 reserved in E820
PCI: Using MMCONFIG at d0000000 - dfffffff
PCI: Using configuration type 1 for base access
mtrr: your CPUs had inconsistent variable MTRR settings
mtrr: probably your BIOS does not setup all CPUs.
mtrr: corrected configuration.
bio: create slab <bio-0> at 0
ACPI: EC: Look up EC in DSDT
ACPI: Interpreter enabled
ACPI: (supports S0 S3 S4 S5)
ACPI: Using IOAPIC for interrupt routing
ACPI: No dock devices found.
ACPI: PCI Root Bridge [PCI0] (0000:00)
pci 0000:00:02.0: reg 10 32bit mmio: [0xf3200000-0xf327ffff]
pci 0000:00:02.0: reg 14 io port: [0xe200-0xe207]
pci 0000:00:02.0: reg 18 32bit mmio: [0xe0000000-0xefffffff]
pci 0000:00:02.0: reg 1c 32bit mmio: [0xf3000000-0xf30fffff]
pci 0000:00:1a.0: reg 20 io port: [0xe300-0xe31f]
pci 0000:00:1a.1: reg 20 io port: [0xe000-0xe01f]
pci 0000:00:1a.2: reg 20 io port: [0xe100-0xe11f]
pci 0000:00:1a.7: reg 10 32bit mmio: [0xf3285000-0xf32853ff]
pci 0000:00:1b.0: reg 10 64bit mmio: [0xf3280000-0xf3283fff]
pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1b.0: PME# disabled
pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.0: PME# disabled
pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.4: PME# disabled
pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
pci 0000:00:1c.5: PME# disabled
pci 0000:00:1d.0: reg 20 io port: [0xe400-0xe41f]
pci 0000:00:1d.1: reg 20 io port: [0xe500-0xe51f]
pci 0000:00:1d.2: reg 20 io port: [0xe600-0xe61f]
pci 0000:00:1d.7: reg 10 32bit mmio: [0xf3284000-0xf32843ff]
pci 0000:00:1f.0: quirk: region 0400-047f claimed by ICH6 ACPI/GPIO/TCO
pci 0000:00:1f.0: quirk: region 0480-04bf claimed by ICH6 GPIO
pci 0000:00:1f.0: ICH7 LPC Generic IO decode 1 PIO at 0800 (mask 000f)
pci 0000:00:1f.0: ICH7 LPC Generic IO decode 2 PIO at 0290 (mask 000f)
pci 0000:00:1f.2: reg 10 io port: [0xe700-0xe707]
pci 0000:00:1f.2: reg 14 io port: [0xe800-0xe803]
pci 0000:00:1f.2: reg 18 io port: [0xe900-0xe907]
pci 0000:00:1f.2: reg 1c io port: [0xea00-0xea03]
pci 0000:00:1f.2: reg 20 io port: [0xeb00-0xeb1f]
pci 0000:00:1f.2: reg 24 32bit mmio: [0xf3286000-0xf32867ff]
pci 0000:00:1f.2: PME# supported from D3hot
pci 0000:00:1f.2: PME# disabled
pci 0000:00:1f.3: reg 10 64bit mmio: [0xf3287000-0xf32870ff]
pci 0000:00:1f.3: reg 20 io port: [0x500-0x51f]
pci 0000:00:1c.0: bridge io port: [0xb000-0xbfff]
pci 0000:02:00.0: reg 10 io port: [0xc000-0xc007]
pci 0000:02:00.0: reg 14 io port: [0xc100-0xc103]
pci 0000:02:00.0: reg 18 io port: [0xc200-0xc207]
pci 0000:02:00.0: reg 1c io port: [0xc300-0xc303]
pci 0000:02:00.0: reg 20 io port: [0xc400-0xc40f]
pci 0000:02:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'
pci 0000:00:1c.4: bridge io port: [0xc000-0xcfff]
pci 0000:00:1c.4: bridge 32bit mmio: [0xf0000000-0xf0ffffff]
pci 0000:03:00.0: reg 10 io port: [0xd000-0xd0ff]
pci 0000:03:00.0: reg 18 64bit mmio: [0xf2000000-0xf2000fff]
pci 0000:03:00.0: reg 30 32bit mmio: [0x000000-0x00ffff]
pci 0000:03:00.0: supports D1 D2
pci 0000:03:00.0: PME# supported from D1 D2 D3hot D3cold
pci 0000:03:00.0: PME# disabled
pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device.  You can enable it with 'pcie_aspm=force'
pci 0000:00:1c.5: bridge io port: [0xd000-0xdfff]
pci 0000:00:1c.5: bridge 32bit mmio: [0xf1000000-0xf2ffffff]
pci 0000:04:07.0: reg 10 32bit mmio: [0xf3104000-0xf31047ff]
pci 0000:04:07.0: reg 14 32bit mmio: [0xf3100000-0xf3103fff]
pci 0000:04:07.0: supports D1 D2
pci 0000:04:07.0: PME# supported from D0 D1 D2 D3hot
pci 0000:04:07.0: PME# disabled
pci 0000:00:1e.0: transparent bridge
pci 0000:00:1e.0: bridge 32bit mmio: [0xf3100000-0xf31fffff]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX4._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEX5._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HUB0._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 9 10 11 12 14 15) *0, disabled.
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 9 10 11 12 14 *15)
ACPI: PCI Interrupt Link [LNK0] (IRQs 3 4 5 6 7 9 10 11 12 *14 15)
ACPI: PCI Interrupt Link [LNK1] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
PCI: Using ACPI for IRQ routing
NetLabel: Initializing
NetLabel:  domain hash size = 128
NetLabel:  protocols = UNLABELED CIPSOv4
NetLabel:  unlabeled traffic allowed by default
hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0
hpet0: 4 comparators, 64-bit 14.318180 MHz counter
pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp: PnP ACPI: found 15 devices
ACPI: ACPI bus type pnp unregistered
system 00:01: ioport range 0x4d0-0x4d1 has been reserved
system 00:01: ioport range 0x290-0x29f has been reserved
system 00:01: ioport range 0x800-0x87f has been reserved
system 00:01: ioport range 0x290-0x294 has been reserved
system 00:01: ioport range 0x880-0x88f has been reserved
system 00:0b: ioport range 0x400-0x4bf could not be reserved
system 00:0c: iomem range 0xd0000000-0xdfffffff has been reserved
system 00:0d: iomem range 0xcee00-0xcffff has been reserved
system 00:0d: iomem range 0xf0000-0xf7fff could not be reserved
system 00:0d: iomem range 0xf8000-0xfbfff could not be reserved
system 00:0d: iomem range 0xfc000-0xfffff could not be reserved
system 00:0d: iomem range 0xcf5e0000-0xcf5fffff could not be reserved
system 00:0d: iomem range 0x0-0x9ffff could not be reserved
system 00:0d: iomem range 0x100000-0xcf5dffff could not be reserved
system 00:0d: iomem range 0xfec00000-0xfec00fff has been reserved
system 00:0d: iomem range 0xfed10000-0xfed1dfff has been reserved
system 00:0d: iomem range 0xfed20000-0xfed8ffff has been reserved
system 00:0d: iomem range 0xfee00000-0xfee00fff has been reserved
system 00:0d: iomem range 0xffb00000-0xffb7ffff has been reserved
system 00:0d: iomem range 0xfff00000-0xffffffff has been reserved
system 00:0d: iomem range 0xe0000-0xeffff has been reserved
pci 0000:00:1c.0: PCI bridge, secondary bus 0000:01
pci 0000:00:1c.0:   IO window: 0xb000-0xbfff
pci 0000:00:1c.0:   MEM window: disabled
pci 0000:00:1c.0:   PREFETCH window: disabled
pci 0000:00:1c.4: PCI bridge, secondary bus 0000:02
pci 0000:00:1c.4:   IO window: 0xc000-0xcfff
pci 0000:00:1c.4:   MEM window: 0xf0000000-0xf0ffffff
pci 0000:00:1c.4:   PREFETCH window: disabled
pci 0000:00:1c.5: PCI bridge, secondary bus 0000:03
pci 0000:00:1c.5:   IO window: 0xd000-0xdfff
pci 0000:00:1c.5:   MEM window: 0xf1000000-0xf2ffffff
pci 0000:00:1c.5:   PREFETCH window: 0x000000f3300000-0x000000f33fffff
pci 0000:00:1e.0: PCI bridge, secondary bus 0000:04
pci 0000:00:1e.0:   IO window: disabled
pci 0000:00:1e.0:   MEM window: 0xf3100000-0xf31fffff
pci 0000:00:1e.0:   PREFETCH window: disabled
  alloc irq_desc for 16 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
pci 0000:00:1c.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:1c.0: setting latency timer to 64
pci 0000:00:1c.4: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pci 0000:00:1c.4: setting latency timer to 64
  alloc irq_desc for 17 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
pci 0000:00:1c.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17
pci 0000:00:1c.5: setting latency timer to 64
pci 0000:00:1e.0: setting latency timer to 64
pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
pci_bus 0000:01: resource 0 io:  [0xb000-0xbfff]
pci_bus 0000:02: resource 0 io:  [0xc000-0xcfff]
pci_bus 0000:02: resource 1 mem: [0xf0000000-0xf0ffffff]
pci_bus 0000:03: resource 0 io:  [0xd000-0xdfff]
pci_bus 0000:03: resource 1 mem: [0xf1000000-0xf2ffffff]
pci_bus 0000:03: resource 2 pref mem [0xf3300000-0xf33fffff]
pci_bus 0000:04: resource 1 mem: [0xf3100000-0xf31fffff]
pci_bus 0000:04: resource 3 io:  [0x00-0xffff]
pci_bus 0000:04: resource 4 mem: [0x000000-0xffffffffffffffff]
NET: Registered protocol family 2
IP route cache hash table entries: 131072 (order: 8, 1048576 bytes)
TCP established hash table entries: 524288 (order: 11, 8388608 bytes)
TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
TCP: Hash tables configured (established 524288 bind 65536)
TCP reno registered
NET: Registered protocol family 1
Trying to unpack rootfs image as initramfs...
Switched to high resolution mode on CPU 3
Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 2
Switched to high resolution mode on CPU 0
Freeing initrd memory: 3193k freed
audit: initializing netlink socket (disabled)
type=2000 audit(1251019087.537:1): initialized
HugeTLB registered 2 MB page size, pre-allocated 0 pages
VFS: Disk quotas dquot_6.5.2
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
msgmni has been set to 7653
SELinux:  Registering netfilter hooks
cryptomgr_test used greatest stack depth: 6152 bytes left
alg: No test for stdrng (krng)
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
pci 0000:00:02.0: Boot video device
pci 0000:00:1a.0: uhci_check_and_reset_hc: legsup = 0x0f10
pci 0000:00:1a.0: Performing full reset
pci 0000:00:1a.1: uhci_check_and_reset_hc: legsup = 0x0010
pci 0000:00:1a.1: Performing full reset
pci 0000:00:1a.2: uhci_check_and_reset_hc: legsup = 0x0010
pci 0000:00:1a.2: Performing full reset
pci 0000:00:1d.0: uhci_check_and_reset_hc: legsup = 0x003a
pci 0000:00:1d.0: Performing full reset
pci 0000:00:1d.1: uhci_check_and_reset_hc: legsup = 0x0010
pci 0000:00:1d.1: Performing full reset
pci 0000:00:1d.2: uhci_check_and_reset_hc: legsup = 0x0010
pci 0000:00:1d.2: Performing full reset
  alloc irq_desc for 24 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
pcieport-driver 0000:00:1c.0: irq 24 for MSI/MSI-X
pcieport-driver 0000:00:1c.0: setting latency timer to 64
  alloc irq_desc for 25 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
pcieport-driver 0000:00:1c.4: irq 25 for MSI/MSI-X
pcieport-driver 0000:00:1c.4: setting latency timer to 64
  alloc irq_desc for 26 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
pcieport-driver 0000:00:1c.5: irq 26 for MSI/MSI-X
pcieport-driver 0000:00:1c.5: setting latency timer to 64
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
ACPI: Power Button [PWRF]
input: Power Button as /devices/LNXSYSTM:00/device:00/PNP0C0C:00/input/input1
ACPI: Power Button [PWRB]
processor ACPI_CPU:00: registered as cooling_device0
ACPI: Processor [CPU0] (supports 8 throttling states)
ACPI: SSDT 00000000cf5e8cc0 00087 (v01  PmRef  Cpu1Ist 00003000 INTL 20040311)
processor ACPI_CPU:01: registered as cooling_device1
ACPI: Processor [CPU1] (supports 8 throttling states)
ACPI: SSDT 00000000cf5e8d50 00087 (v01  PmRef  Cpu2Ist 00003000 INTL 20040311)
processor ACPI_CPU:02: registered as cooling_device2
ACPI: Processor [CPU2] (supports 8 throttling states)
ACPI: SSDT 00000000cf5e8de0 00087 (v01  PmRef  Cpu3Ist 00003000 INTL 20040311)
processor ACPI_CPU:03: registered as cooling_device3
ACPI: Processor [CPU3] (supports 8 throttling states)
Non-volatile memory driver v1.3
Linux agpgart interface v0.103
agpgart-intel 0000:00:00.0: Intel G33 Chipset
agpgart-intel 0000:00:00.0: detected 7164K stolen memory
agpgart-intel 0000:00:00.0: AGP aperture is 256M @ 0xe0000000
work_for_cpu used greatest stack depth: 5928 bytes left
Serial: 8250/16550 driver, 16 ports, IRQ sharing enabled
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
00:07: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
00:08: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
brd: module loaded
input: Macintosh mouse button emulation as /devices/virtual/input/input2
PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
serio: i8042 KBD port at 0x60,0x64 irq 1
mice: PS/2 mouse device common for all mice
rtc_cmos 00:03: RTC can wake from S4
rtc_cmos 00:03: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one month, 242 bytes nvram, hpet irqs
cpuidle: using governor ladder
cpuidle: using governor menu
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
PM: Resume from disk failed.
registered taskstats version 1
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 5080k
modprobe used greatest stack depth: 5776 bytes left
input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input3
[drm] Initialized drm 1.1.0 20060810
i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
i915 0000:00:02.0: setting latency timer to 64
mtrr: no more MTRRs available
[drm] MTRR allocation failed.  Graphics performance may suffer.
  alloc irq_desc for 27 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
i915 0000:00:02.0: irq 27 for MSI/MSI-X
allocated 1280x1024 fb: 0x007e0000, bo ffff88012dc77180
Console: switching to colour frame buffer device 160x64
[drm] DAC-6: set mode 1280x1024 1a
fb0: inteldrmfb frame buffer device
registered panic notifier
[drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
work_for_cpu used greatest stack depth: 4680 bytes left
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci_hcd: block sizes: qh 160 qtd 96 itd 192 sitd 96
  alloc irq_desc for 18 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
ehci_hcd 0000:00:1a.7: PCI INT C -> GSI 18 (level, low) -> IRQ 18
ehci_hcd 0000:00:1a.7: setting latency timer to 64
ehci_hcd 0000:00:1a.7: EHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file 'devices'
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
ehci_hcd 0000:00:1a.7: new USB bus registered, assigned bus number 1
ehci_hcd 0000:00:1a.7: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
ehci_hcd 0000:00:1a.7: reset hcc_params 16871 thresh 7 uframes 1024 64 bit addr
ehci_hcd 0000:00:1a.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
ehci_hcd 0000:00:1a.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1a.7: supports USB remote wakeup
ehci_hcd 0000:00:1a.7: irq 18, io mem 0xf3285000
ehci_hcd 0000:00:1a.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
ehci_hcd 0000:00:1a.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
ehci_hcd 0000:00:1a.7: USB 2.0 started, EHCI 1.00
usb usb1: default language 0x0409
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb1: Product: EHCI Host Controller
usb usb1: Manufacturer: Linux 2.6.30eric-rtl8169-debug ehci_hcd
usb usb1: SerialNumber: 0000:00:1a.7
usb usb1: uevent
usb usb1: usb_probe_device
usb usb1: configuration #1 chosen from 1 choice
usb usb1: adding 1-0:1.0 (config #1, interface 0)
usb 1-0:1.0: uevent
hub 1-0:1.0: usb_probe_interface
hub 1-0:1.0: usb_probe_interface - got id
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
hub 1-0:1.0: standalone hub
hub 1-0:1.0: no power switching (usb 1.0)
hub 1-0:1.0: individual port over-current protection
hub 1-0:1.0: power on to power good time: 20ms
hub 1-0:1.0: local power source is good
hub 1-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
  alloc irq_desc for 23 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
ehci_hcd 0000:00:1d.7: PCI INT A -> GSI 23 (level, low) -> IRQ 23
ehci_hcd 0000:00:1d.7: setting latency timer to 64
ehci_hcd 0000:00:1d.7: EHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '002'
ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 2
ehci_hcd 0000:00:1d.7: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
ehci_hcd 0000:00:1d.7: reset hcc_params 16871 thresh 7 uframes 1024 64 bit addr
ehci_hcd 0000:00:1d.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
ehci_hcd 0000:00:1d.7: cache line size of 32 is not supported
ehci_hcd 0000:00:1d.7: supports USB remote wakeup
ehci_hcd 0000:00:1d.7: irq 23, io mem 0xf3284000
ehci_hcd 0000:00:1d.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
ehci_hcd 0000:00:1d.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
ehci_hcd 0000:00:1d.7: USB 2.0 started, EHCI 1.00
usb usb2: default language 0x0409
usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb2: Product: EHCI Host Controller
usb usb2: Manufacturer: Linux 2.6.30eric-rtl8169-debug ehci_hcd
usb usb2: SerialNumber: 0000:00:1d.7
usb usb2: uevent
usb usb2: usb_probe_device
usb usb2: configuration #1 chosen from 1 choice
usb usb2: adding 2-0:1.0 (config #1, interface 0)
usb 2-0:1.0: uevent
hub 2-0:1.0: usb_probe_interface
hub 2-0:1.0: usb_probe_interface - got id
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 6 ports detected
hub 2-0:1.0: standalone hub
hub 2-0:1.0: no power switching (usb 1.0)
hub 2-0:1.0: individual port over-current protection
hub 2-0:1.0: power on to power good time: 20ms
hub 2-0:1.0: local power source is good
hub 2-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci_hcd: block sizes: ed 80 td 96
uhci_hcd: USB Universal Host Controller Interface driver
uhci_hcd 0000:00:1a.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
uhci_hcd 0000:00:1a.0: setting latency timer to 64
uhci_hcd 0000:00:1a.0: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '003'
uhci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 3
uhci_hcd 0000:00:1a.0: detected 2 ports
uhci_hcd 0000:00:1a.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1a.0: Performing full reset
uhci_hcd 0000:00:1a.0: supports USB remote wakeup
uhci_hcd 0000:00:1a.0: irq 16, io base 0x0000e300
usb usb3: default language 0x0409
usb usb3: New USB device found, idVendor=1d6b, idProduct=0001
usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb3: Product: UHCI Host Controller
usb usb3: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb3: SerialNumber: 0000:00:1a.0
usb usb3: uevent
usb usb3: usb_probe_device
usb usb3: configuration #1 chosen from 1 choice
usb usb3: adding 3-0:1.0 (config #1, interface 0)
usb 3-0:1.0: uevent
hub 3-0:1.0: usb_probe_interface
hub 3-0:1.0: usb_probe_interface - got id
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 2 ports detected
hub 3-0:1.0: standalone hub
hub 3-0:1.0: no power switching (usb 1.0)
hub 3-0:1.0: individual port over-current protection
hub 3-0:1.0: power on to power good time: 2ms
hub 3-0:1.0: local power source is good
hub 3-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
  alloc irq_desc for 21 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
uhci_hcd 0000:00:1a.1: PCI INT B -> GSI 21 (level, low) -> IRQ 21
uhci_hcd 0000:00:1a.1: setting latency timer to 64
uhci_hcd 0000:00:1a.1: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '004'
uhci_hcd 0000:00:1a.1: new USB bus registered, assigned bus number 4
uhci_hcd 0000:00:1a.1: detected 2 ports
uhci_hcd 0000:00:1a.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1a.1: Performing full reset
uhci_hcd 0000:00:1a.1: supports USB remote wakeup
uhci_hcd 0000:00:1a.1: irq 21, io base 0x0000e000
usb usb4: default language 0x0409
usb usb4: New USB device found, idVendor=1d6b, idProduct=0001
usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb4: Product: UHCI Host Controller
usb usb4: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb4: SerialNumber: 0000:00:1a.1
usb usb4: uevent
usb usb4: usb_probe_device
usb usb4: configuration #1 chosen from 1 choice
usb usb4: adding 4-0:1.0 (config #1, interface 0)
usb 4-0:1.0: uevent
hub 4-0:1.0: usb_probe_interface
hub 4-0:1.0: usb_probe_interface - got id
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
hub 4-0:1.0: standalone hub
hub 4-0:1.0: no power switching (usb 1.0)
hub 4-0:1.0: individual port over-current protection
hub 4-0:1.0: power on to power good time: 2ms
hub 4-0:1.0: local power source is good
hub 4-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
uhci_hcd 0000:00:1a.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:00:1a.2: setting latency timer to 64
uhci_hcd 0000:00:1a.2: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '005'
uhci_hcd 0000:00:1a.2: new USB bus registered, assigned bus number 5
uhci_hcd 0000:00:1a.2: detected 2 ports
uhci_hcd 0000:00:1a.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1a.2: Performing full reset
uhci_hcd 0000:00:1a.2: supports USB remote wakeup
uhci_hcd 0000:00:1a.2: irq 18, io base 0x0000e100
usb usb5: default language 0x0409
usb usb5: New USB device found, idVendor=1d6b, idProduct=0001
usb usb5: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb5: Product: UHCI Host Controller
usb usb5: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb5: SerialNumber: 0000:00:1a.2
usb usb5: uevent
usb usb5: usb_probe_device
usb usb5: configuration #1 chosen from 1 choice
usb usb5: adding 5-0:1.0 (config #1, interface 0)
usb 5-0:1.0: uevent
hub 5-0:1.0: usb_probe_interface
hub 5-0:1.0: usb_probe_interface - got id
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
hub 5-0:1.0: standalone hub
hub 5-0:1.0: no power switching (usb 1.0)
hub 5-0:1.0: individual port over-current protection
hub 5-0:1.0: power on to power good time: 2ms
hub 5-0:1.0: local power source is good
hub 5-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
uhci_hcd 0000:00:1d.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
uhci_hcd 0000:00:1d.0: setting latency timer to 64
uhci_hcd 0000:00:1d.0: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '006'
uhci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 6
uhci_hcd 0000:00:1d.0: detected 2 ports
uhci_hcd 0000:00:1d.0: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.0: Performing full reset
uhci_hcd 0000:00:1d.0: supports USB remote wakeup
uhci_hcd 0000:00:1d.0: irq 23, io base 0x0000e400
usb usb6: default language 0x0409
usb usb6: New USB device found, idVendor=1d6b, idProduct=0001
usb usb6: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb6: Product: UHCI Host Controller
usb usb6: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb6: SerialNumber: 0000:00:1d.0
usb usb6: uevent
usb usb6: usb_probe_device
usb usb6: configuration #1 chosen from 1 choice
usb usb6: adding 6-0:1.0 (config #1, interface 0)
usb 6-0:1.0: uevent
hub 6-0:1.0: usb_probe_interface
hub 6-0:1.0: usb_probe_interface - got id
hub 6-0:1.0: USB hub found
hub 6-0:1.0: 2 ports detected
hub 6-0:1.0: standalone hub
hub 6-0:1.0: no power switching (usb 1.0)
hub 6-0:1.0: individual port over-current protection
hub 6-0:1.0: power on to power good time: 2ms
hub 6-0:1.0: local power source is good
hub 6-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
  alloc irq_desc for 19 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
uhci_hcd 0000:00:1d.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
uhci_hcd 0000:00:1d.1: setting latency timer to 64
uhci_hcd 0000:00:1d.1: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '007'
uhci_hcd 0000:00:1d.1: new USB bus registered, assigned bus number 7
uhci_hcd 0000:00:1d.1: detected 2 ports
uhci_hcd 0000:00:1d.1: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.1: Performing full reset
uhci_hcd 0000:00:1d.1: supports USB remote wakeup
uhci_hcd 0000:00:1d.1: irq 19, io base 0x0000e500
usb usb7: default language 0x0409
usb usb7: New USB device found, idVendor=1d6b, idProduct=0001
usb usb7: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb7: Product: UHCI Host Controller
usb usb7: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb7: SerialNumber: 0000:00:1d.1
usb usb7: uevent
usb usb7: usb_probe_device
usb usb7: configuration #1 chosen from 1 choice
usb usb7: adding 7-0:1.0 (config #1, interface 0)
usb 7-0:1.0: uevent
hub 7-0:1.0: usb_probe_interface
hub 7-0:1.0: usb_probe_interface - got id
hub 7-0:1.0: USB hub found
hub 7-0:1.0: 2 ports detected
hub 7-0:1.0: standalone hub
hub 7-0:1.0: no power switching (usb 1.0)
hub 7-0:1.0: individual port over-current protection
hub 7-0:1.0: power on to power good time: 2ms
hub 7-0:1.0: local power source is good
hub 7-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
uhci_hcd 0000:00:1d.2: PCI INT C -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:00:1d.2: setting latency timer to 64
uhci_hcd 0000:00:1d.2: UHCI Host Controller
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '008'
uhci_hcd 0000:00:1d.2: new USB bus registered, assigned bus number 8
uhci_hcd 0000:00:1d.2: detected 2 ports
uhci_hcd 0000:00:1d.2: uhci_check_and_reset_hc: cmd = 0x0000
uhci_hcd 0000:00:1d.2: Performing full reset
uhci_hcd 0000:00:1d.2: supports USB remote wakeup
uhci_hcd 0000:00:1d.2: irq 18, io base 0x0000e600
usb usb8: default language 0x0409
usb usb8: New USB device found, idVendor=1d6b, idProduct=0001
usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
usb usb8: Product: UHCI Host Controller
usb usb8: Manufacturer: Linux 2.6.30eric-rtl8169-debug uhci_hcd
usb usb8: SerialNumber: 0000:00:1d.2
usb usb8: uevent
usb usb8: usb_probe_device
usb usb8: configuration #1 chosen from 1 choice
usb usb8: adding 8-0:1.0 (config #1, interface 0)
usb 8-0:1.0: uevent
hub 8-0:1.0: usb_probe_interface
hub 8-0:1.0: usb_probe_interface - got id
hub 8-0:1.0: USB hub found
hub 8-0:1.0: 2 ports detected
hub 8-0:1.0: standalone hub
hub 8-0:1.0: no power switching (usb 1.0)
hub 8-0:1.0: individual port over-current protection
hub 8-0:1.0: power on to power good time: 2ms
hub 8-0:1.0: local power source is good
hub 8-0:1.0: trying to enable port power on non-switchable hub
/home/ebiederm/projects/linux/linux-2.6.30-Arora/drivers/usb/core/inode.c: creating file '001'
SCSI subsystem initialized
Driver 'sd' needs updating - please use bus_type methods
libata version 3.00 loaded.
ahci 0000:00:1f.2: version 3.0
ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
  alloc irq_desc for 28 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
ahci 0000:00:1f.2: irq 28 for MSI/MSI-X
ahci: SSS flag set, parallel bus scan disabled
ahci 0000:00:1f.2: AHCI 0001.0200 32 slots 4 ports 3 Gbps 0x33 impl SATA mode
ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pmp pio slum part ems 
ahci 0000:00:1f.2: setting latency timer to 64
scsi0 : ahci
scsi1 : ahci
scsi2 : ahci
scsi3 : ahci
scsi4 : ahci
scsi5 : ahci
ata1: SATA max UDMA/133 abar m2048@0xf3286000 port 0xf3286100 irq 28
ata2: SATA max UDMA/133 abar m2048@0xf3286000 port 0xf3286180 irq 28
ata3: DUMMY
ata4: DUMMY
ata5: SATA max UDMA/133 abar m2048@0xf3286000 port 0xf3286300 irq 28
ata6: SATA max UDMA/133 abar m2048@0xf3286000 port 0xf3286380 irq 28
hub 1-0:1.0: state 7 ports 6 chg 0000 evt 0000
hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
hub 3-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 4-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 5-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 6-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 7-0:1.0: state 7 ports 2 chg 0000 evt 0000
hub 8-0:1.0: state 7 ports 2 chg 0000 evt 0000
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: HPA detected: current 488395055, native 488397168
ata1.00: ATA-7: ST3250410AS, 3.AAA, max UDMA/133
ata1.00: 488395055 sectors, multi 0: LBA48 NCQ (depth 31/32)
ata1.00: configured for UDMA/133
scsi 0:0:0:0: Direct-Access     ATA      ST3250410AS      3.AA PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 488395055 512-byte hardware sectors: (250 GB/232 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
ata2: SATA link down (SStatus 0 SControl 300)
ata5: SATA link down (SStatus 0 SControl 300)
usb usb3: suspend_rh (auto-stop)
usb usb4: suspend_rh (auto-stop)
usb usb5: suspend_rh (auto-stop)
usb usb6: suspend_rh (auto-stop)
usb usb7: suspend_rh (auto-stop)
usb usb8: suspend_rh (auto-stop)
ata6: SATA link down (SStatus 0 SControl 300)
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
SELinux:  Disabled at runtime.
SELinux:  Unregistering netfilter hooks
type=1404 audit(1251019090.797:2): selinux=0 auid=4294967295 ses=4294967295
sh used greatest stack depth: 4256 bytes left
awk used greatest stack depth: 3632 bytes left
mount used greatest stack depth: 3600 bytes left
hub 1-0:1.0: hub_suspend
usb usb1: bus auto-suspend
ehci_hcd 0000:00:1a.7: suspend root hub
hub 2-0:1.0: hub_suspend
usb usb2: bus auto-suspend
ehci_hcd 0000:00:1d.7: suspend root hub
hub 3-0:1.0: hub_suspend
usb usb3: bus auto-suspend
usb usb3: suspend_rh
hub 4-0:1.0: hub_suspend
usb usb4: bus auto-suspend
usb usb4: suspend_rh
hub 5-0:1.0: hub_suspend
usb usb5: bus auto-suspend
usb usb5: suspend_rh
hub 6-0:1.0: hub_suspend
usb usb6: bus auto-suspend
usb usb6: suspend_rh
hub 7-0:1.0: hub_suspend
usb usb7: bus auto-suspend
usb usb7: suspend_rh
hub 8-0:1.0: hub_suspend
usb usb8: bus auto-suspend
usb usb8: suspend_rh
udev: starting version 141
usb usb3: uevent
usb 3-0:1.0: uevent
usb usb4: uevent
usb 4-0:1.0: uevent
usb usb5: uevent
usb 5-0:1.0: uevent
usb usb1: uevent
usb 1-0:1.0: uevent
usb usb6: uevent
usb 6-0:1.0: uevent
usb usb7: uevent
usb 7-0:1.0: uevent
usb usb8: uevent
usb 8-0:1.0: uevent
usb usb2: uevent
usb 2-0:1.0: uevent
usb usb3: uevent
usb usb5: uevent
usb usb1: uevent
usb usb7: uevent
usb usb4: uevent
usb usb2: uevent
usb usb8: uevent
usb usb6: uevent
sd 0:0:0:0: Attached scsi generic sg0 type 0
pata_jmicron 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
pata_jmicron 0000:02:00.0: setting latency timer to 64
scsi6 : pata_jmicron
scsi7 : pata_jmicron
ata7: PATA max UDMA/100 cmd 0xc000 ctl 0xc100 bmdma 0xc400 irq 16
ata8: PATA max UDMA/100 cmd 0xc200 ctl 0xc300 bmdma 0xc408 irq 16
iTCO_vendor_support: vendor-support=0
input: PC Speaker as /devices/platform/pcspkr/input/input4
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
r8169 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
r8169 0000:03:00.0: setting latency timer to 64
  alloc irq_desc for 29 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
r8169 0000:03:00.0: irq 29 for MSI/MSI-X
eth0: RTL8168b/8111b at 0xffffc2000065a000, 00:1a:4d:4e:2c:8c, XID 38000000 IRQ 29
i801_smbus 0000:00:1f.3: PCI INT C -> GSI 18 (level, low) -> IRQ 18
firewire_ohci 0000:04:07.0: PCI INT A -> GSI 23 (level, low) -> IRQ 23
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.05
iTCO_wdt: Found a ICH9 TCO device (Version=2, TCOBASE=0x0460)
iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
firewire_ohci: Added fw-ohci device 0000:04:07.0, OHCI version 1.10
parport_pc 00:09: reported by Plug and Play ACPI
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
ppdev: user-space parallel port driver
firewire_core: created device fw0: GUID 00c650fd00001a4d, S400
  alloc irq_desc for 22 on cpu 0 node 0
  alloc kstat_irqs on cpu 0 node 0
HDA Intel 0000:00:1b.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
HDA Intel 0000:00:1b.0: setting latency timer to 64
readahead used greatest stack depth: 3440 bytes left
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
802.1Q VLAN Support v1.8 Ben Greear <greearb@candelatech.com>
All bugs added by David S. Miller <davem@redhat.com>
tun: Universal TUN/TAP device driver, 1.6
tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
nf_conntrack.acct=1 kernel paramater, acct=1 nf_conntrack module option or
sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
ip_tables: (C) 2000-2006 Netfilter Core Team
fuse init (API version 7.11)
has_svm: not amd
kvm: no hardware support
device-mapper: uevent: version 1.0.3
device-mapper: ioctl: 4.14.0-ioctl (2008-04-23) initialised: dm-devel@redhat.com
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on sda1, internal journal
EXT4-fs: barriers enabled
kjournald2 starting: pid 1011, dev sda3:8, commit interval 5 seconds
EXT4 FS on sda3, internal journal on sda3:8
EXT4-fs: delayed allocation enabled
EXT4-fs: file extents enabled
EXT4-fs: mballoc enabled
EXT4-fs: mounted filesystem sda3 with ordered data mode
Adding 1951888k swap on /dev/sda2.  Priority:-1 extents:1 across:1951888k 
platform microcode: firmware: requesting intel-ucode/06-0f-0b
platform microcode: firmware: requesting intel-ucode/06-0f-0b
platform microcode: firmware: requesting intel-ucode/06-0f-0b
platform microcode: firmware: requesting intel-ucode/06-0f-0b
Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
microcode: CPU0 updated from revision 0xb3 to 0xb6, date = 2007-07-13 
microcode: CPU1 updated from revision 0xb3 to 0xb6, date = 2007-07-13 
microcode: CPU2 updated from revision 0xb3 to 0xb6, date = 2007-07-13 
microcode: CPU3 updated from revision 0xb3 to 0xb6, date = 2007-07-13 
Microcode Update Driver: v2.00 removed.
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
r8169: eth0: link up
r8169: eth0: link up
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
r8169 screaming irq status 00000084 mask 0000ffff event 0000803f napi 0000001d
usb usb3: uevent
usb 3-0:1.0: uevent
usb usb4: uevent
usb 4-0:1.0: uevent
usb usb5: uevent
usb 5-0:1.0: uevent
usb usb1: uevent
usb 1-0:1.0: uevent
usb usb6: uevent
usb 6-0:1.0: uevent
usb usb7: uevent
usb 7-0:1.0: uevent
usb usb8: uevent
usb 8-0:1.0: uevent
usb usb2: uevent
usb 2-0:1.0: uevent
Bluetooth: Core ver 2.15
NET: Registered protocol family 31
Bluetooth: HCI device and connection manager initialized
Bluetooth: HCI socket layer initialized
Bluetooth: L2CAP ver 2.13
Bluetooth: L2CAP socket layer initialized
Bluetooth: SCO (Voice Link) ver 0.6
Bluetooth: SCO socket layer initialized
Bluetooth: BNEP (Ethernet Emulation) ver 1.3
Bluetooth: BNEP filters: protocol multicast
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
Bridge firewalling registered
------------[ cut here ]------------
WARNING: at /home/ebiederm/projects/linux/linux-2.6.30-Arora/net/sched/sch_generic.c:226 dev_watchdog+0xd9/0x14a()
Hardware name: G33M-S2
NETDEV WATCHDOG: eth0 (r8169): transmit timed out
Modules linked in: bridge nfsd lockd stp bnep nfs_acl sco l2cap auth_rpcgss bluetooth exportfs sunrpc ipv6 cpufreq_ondemand acpi_cpufreq freq_table ext4 jbd2 crc16 dm_mirror dm_region_hash dm_log dm_multipath dm_mod uinput kvm_intel kvm fuse xt_multiport iptable_nat ip_tables nf_nat x_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun 8021q snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_pcm ata_generic ppdev snd_timer parport_pc pata_acpi snd iTCO_wdt firewire_ohci r8169 i2c_i801 parport firewire_core floppy soundcore pcspkr iTCO_vendor_support pata_jmicron crc_itu_t mii snd_page_alloc sg ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd i915 drm i2c_algo_bit video output i2c_core [last unloaded: microcode]
Pid: 0, comm: swapper Not tainted 2.6.30eric-rtl8169-debug #36
Call Trace:
 <IRQ>  [<ffffffff8104583b>] warn_slowpath_common+0x88/0xb6
 [<ffffffff812ee53c>] ? dev_watchdog+0x0/0x14a
 [<ffffffff810458ec>] warn_slowpath_fmt+0x4b/0x61
 [<ffffffff812d607e>] ? netdev_drivername+0x52/0x70
 [<ffffffff812ee615>] dev_watchdog+0xd9/0x14a
 [<ffffffff810596de>] ? __queue_work+0x44/0x61
 [<ffffffff810128ad>] ? read_tsc+0x9/0x1b
 [<ffffffff810665cd>] ? clocksource_read+0x1d/0x33
 [<ffffffff81050922>] run_timer_softirq+0x169/0x1f5
 [<ffffffff81020306>] ? apic_write+0x24/0x3a
 [<ffffffff81069ce2>] ? clockevents_program_event+0x88/0xa5
 [<ffffffff8104ba7b>] __do_softirq+0xc3/0x1b0
 [<ffffffff8100cfcc>] call_softirq+0x1c/0x28
 [<ffffffff8100e721>] do_softirq+0x51/0xae
 [<ffffffff8104b6d2>] irq_exit+0x52/0xa3
 [<ffffffff81020f11>] smp_apic_timer_interrupt+0x94/0xb8
 [<ffffffff8100c9d3>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff81014096>] ? mwait_idle+0x9b/0xcc
 [<ffffffff81014038>] ? mwait_idle+0x3d/0xcc
 [<ffffffff8100ae08>] ? enter_idle+0x33/0x49
 [<ffffffff8100aece>] ? cpu_idle+0xb0/0xf3
 [<ffffffff813613c5>] ? rest_init+0x79/0x8f
 [<ffffffff81577f45>] ? start_kernel+0x3d7/0x3f8
 [<ffffffff815772b4>] ? x86_64_start_reservations+0xbb/0xd6
 [<ffffffff815773d4>] ? x86_64_start_kernel+0x105/0x128
---[ end trace 2ef0a88e80889562 ]---
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
lo: Disabled Privacy Extensions
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
end_request: I/O error, dev fd0, sector 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
end_request: I/O error, dev fd0, sector 0
Buffer I/O error on device fd0, logical block 0
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up
r8169: eth0: link up

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-22 20:43                                       ` David Dillow
  2009-08-23 17:17                                         ` Jarek Poplawski
  2009-08-24  2:37                                         ` Eric W. Biederman
@ 2009-08-25  0:51                                         ` Eric W. Biederman
  2009-08-25  2:59                                           ` David Dillow
  2 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-25  0:51 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Sat, 2009-08-22 at 05:07 -0700, Eric W. Biederman wrote:
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> > David Dillow <dave@thedillows.org> writes:
>> >
>> >>
>> >> Re-looking at the code, I'd guess that some IRQ status line is getting
>> >> stuck high, but I don't see why -- we should acknowledge all outstanding
>> >> interrupts each time through the loop, whether we care about them or
>> >> not.
>> >>
>> >> Could reproduce a problem with the following patch applied, and send the
>> >> full dmesg, please?
>> >
>> > Here is what I get.
>> >
>> > r8169 screaming irq status 00000085 mask 0000ffff event 0000803f napi 0000001d
>> 
>> And now that the machine has come out of it, that was followed by:
>> Looks like the soft lockup did not manage to trigger in this case.
>
> I need some more context, please. What is the network load through this
> NIC when you have the issues? Light, heavy? Can you give me more details
> about the machine? A full dmesg from boot until this happens would help
> quite a bit. At a minimum it would help answer which version of the chip
> we're dealing with and what the machine it is in looks like.
>
> Can you reproduce this with pci=nomsi? I'm assuming it the chip running
> in MSI mode.
>
> Also, can you reproduce it when booting UP (or maxcpus=1)? I'm thinking
> about a race between rtl8169_interrupt() and rtl8169_poll(), but it
> isn't jumping out at me.
>
> Also, I'm having connectivity troubles this weekend, so my response may
> be spotty. :(

When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
there is some bidirectional communication going on.

Do we really want to loop when those bits are set?

Perhaps we want to remove them from rtl_cfg_infos for the part?

Eric


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25  0:51                                         ` Eric W. Biederman
@ 2009-08-25  2:59                                           ` David Dillow
  2009-08-25 20:22                                             ` Eric W. Biederman
  2009-08-25 21:37                                             ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
  0 siblings, 2 replies; 77+ messages in thread
From: David Dillow @ 2009-08-25  2:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Mon, 2009-08-24 at 17:51 -0700, Eric W. Biederman wrote:
> When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
> there is some bidirectional communication going on.
> 
> Do we really want to loop when those bits are set?

Maybe not when only those bits are set, but I worry that we would trade
one race for another where we stop getting interrupts from the card.

> Perhaps we want to remove them from rtl_cfg_infos for the part?

Then you'd never get an interrupt for them in the first place, I think.

I'm not real happy with the interrupt handling in the driver; it makes a
certain amount of sense to split the MSI vs non-MSI interrupt cases out.
It also means another pass through re-auditing things against the vendor
driver. That's more work than I'm able to commit to at the moment.

I've not been able to reproduce it locally on my r8169d, running for ~30
minutes straight at full speed. I've not tried running it in UP, though.
Perhaps I can do that tomorrow.

Here's a possible patch to mask the NAPI events while we're running in
NAPI mode. I'm not sure it is going to help, since the intr_mask was
0xffff when you hit the loop guard, so I left it in for now.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b82780d..12755b7 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
 	int status;
+	int count = 0;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
@@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		if (count++ > 100) {
+			printk_once("r8169 screaming irq status %08x "
+				"mask %08x event %08x napi %08x\n",
+				status, tp->intr_mask, tp->intr_event,
+				tp->napi_event);
+			break;
+		}
+
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
@@ -3613,6 +3623,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		RTL_W16(IntrStatus,
 			(status & RxFIFOOver) ? (status | RxOverflow) : status);
 		status = RTL_R16(IntrStatus);
+		status &= tp->intr_mask;
 	}
 
 	return IRQ_RETVAL(handled);

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25  2:59                                           ` David Dillow
@ 2009-08-25 20:22                                             ` Eric W. Biederman
  2009-08-25 20:40                                               ` David Dillow
  2009-08-25 21:37                                             ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
  1 sibling, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-25 20:22 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Mon, 2009-08-24 at 17:51 -0700, Eric W. Biederman wrote:
>> When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
>> there is some bidirectional communication going on.
>> 
>> Do we really want to loop when those bits are set?
>
> Maybe not when only those bits are set, but I worry that we would trade
> one race for another where we stop getting interrupts from the card.
>
>> Perhaps we want to remove them from rtl_cfg_infos for the part?
>
> Then you'd never get an interrupt for them in the first place, I think.
>
> I'm not real happy with the interrupt handling in the driver; it makes a
> certain amount of sense to split the MSI vs non-MSI interrupt cases out.
> It also means another pass through re-auditing things against the vendor
> driver. That's more work than I'm able to commit to at the moment.
>
> I've not been able to reproduce it locally on my r8169d, running for ~30
> minutes straight at full speed. I've not tried running it in UP, though.
> Perhaps I can do that tomorrow.
>
> Here's a possible patch to mask the NAPI events while we're running in
> NAPI mode. I'm not sure it is going to help, since the intr_mask was
> 0xffff when you hit the loop guard, so I left it in for now.

Interesting.

If I understand this correctly the situation is that we have on the
chip there is correct logic for a level triggered interrupt and that
the msi logic sits on it and sends an event when the interrupt signal
goes high, but when we acknowledge some bits but not all it does not
send another interrupt.

Baring playing games with what version of the card has working logic
and which does not we seem to have to simple choices (if we don't want
to loop possibly forever).
- Don't use the msi logic on this card.
- Move all of the logic into rtl8169_poll and only come out of NAPI
  mode when we have caught up with all of the interrupt work.

Is that how you understand the hardware issue you are trying to work
around?

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 20:22                                             ` Eric W. Biederman
@ 2009-08-25 20:40                                               ` David Dillow
  2009-08-25 21:24                                                 ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-25 20:40 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Tue, 2009-08-25 at 13:22 -0700, Eric W. Biederman wrote:
> David Dillow <dave@thedillows.org> writes:
> > I'm not real happy with the interrupt handling in the driver; it makes a
> > certain amount of sense to split the MSI vs non-MSI interrupt cases out.
> > It also means another pass through re-auditing things against the vendor
> > driver. That's more work than I'm able to commit to at the moment.
> >
> > I've not been able to reproduce it locally on my r8169d, running for ~30
> > minutes straight at full speed. I've not tried running it in UP, though.
> > Perhaps I can do that tomorrow.
> >
> > Here's a possible patch to mask the NAPI events while we're running in
> > NAPI mode. I'm not sure it is going to help, since the intr_mask was
> > 0xffff when you hit the loop guard, so I left it in for now.
> 
> Interesting.
> 
> If I understand this correctly the situation is that we have on the
> chip there is correct logic for a level triggered interrupt and that
> the msi logic sits on it and sends an event when the interrupt signal
> goes high, but when we acknowledge some bits but not all it does not
> send another interrupt.

Correct, we have to acknowledge all current outstanding event sources
before we get another MSI interrupt. It looks like the MSI interrupt is
triggered on the edge transition of a logical OR of all irq sources.

> Baring playing games with what version of the card has working logic
> and which does not we seem to have to simple choices (if we don't want
> to loop possibly forever).
> - Don't use the msi logic on this card.
> - Move all of the logic into rtl8169_poll and only come out of NAPI
>   mode when we have caught up with all of the interrupt work.
> 
> Is that how you understand the hardware issue you are trying to work
> around?

That's how I understood the issue I was working around with the
problematic patch, but I thought I had covered both issues fairly well
without having to split the handling any further -- we ACK all existing
sources each pass through the loop, so we'll get a new interrupt on the
unmasked events, but not on ones we've masked out for NAPI until NAPI
completes and unmasks them.

I'm curious how you managed to receive an packet between us clearing the
all current sources and reading the current source list continuously for
60+ seconds -- the loop is basically

status = get IRQ events from chip
while (status) {
	/* process events, start NAPI if needed */
	clear current events from chip
	status = get IRQ events from chip
}

That seems like a very small race window to consistently hit --
especially for long enough to trigger soft lockups.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 20:40                                               ` David Dillow
@ 2009-08-25 21:24                                                 ` Eric W. Biederman
  2009-08-25 21:46                                                   ` David Dillow
  2009-08-25 22:19                                                   ` Francois Romieu
  0 siblings, 2 replies; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-25 21:24 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Tue, 2009-08-25 at 13:22 -0700, Eric W. Biederman wrote:
>> David Dillow <dave@thedillows.org> writes:
>> > I'm not real happy with the interrupt handling in the driver; it makes a
>> > certain amount of sense to split the MSI vs non-MSI interrupt cases out.
>> > It also means another pass through re-auditing things against the vendor
>> > driver. That's more work than I'm able to commit to at the moment.
>> >
>> > I've not been able to reproduce it locally on my r8169d, running for ~30
>> > minutes straight at full speed. I've not tried running it in UP, though.
>> > Perhaps I can do that tomorrow.
>> >
>> > Here's a possible patch to mask the NAPI events while we're running in
>> > NAPI mode. I'm not sure it is going to help, since the intr_mask was
>> > 0xffff when you hit the loop guard, so I left it in for now.
>> 
>> Interesting.
>> 
>> If I understand this correctly the situation is that we have on the
>> chip there is correct logic for a level triggered interrupt and that
>> the msi logic sits on it and sends an event when the interrupt signal
>> goes high, but when we acknowledge some bits but not all it does not
>> send another interrupt.
>
> Correct, we have to acknowledge all current outstanding event sources
> before we get another MSI interrupt. It looks like the MSI interrupt is
> triggered on the edge transition of a logical OR of all irq sources.
>
>> Baring playing games with what version of the card has working logic
>> and which does not we seem to have to simple choices (if we don't want
>> to loop possibly forever).
>> - Don't use the msi logic on this card.
>> - Move all of the logic into rtl8169_poll and only come out of NAPI
>>   mode when we have caught up with all of the interrupt work.
>> 
>> Is that how you understand the hardware issue you are trying to work
>> around?
>
> That's how I understood the issue I was working around with the
> problematic patch, but I thought I had covered both issues fairly well
> without having to split the handling any further -- we ACK all existing
> sources each pass through the loop, so we'll get a new interrupt on the
> unmasked events, but not on ones we've masked out for NAPI until NAPI
> completes and unmasks them.


> I'm curious how you managed to receive an packet between us clearing the
> all current sources and reading the current source list continuously for
> 60+ seconds -- the loop is basically


> status = get IRQ events from chip
> while (status) {
> 	/* process events, start NAPI if needed */
> 	clear current events from chip
> 	status = get IRQ events from chip
> }
>
> That seems like a very small race window to consistently hit --
> especially for long enough to trigger soft lockups.

Interesting indeed.  When I hit the guard we had popped out of NAPI
mode while we were in the loop.  The only way to do that is if
poll and interrupt were running on different cpus.

I am a bit curious about TxDescUnavail.  Perhaps we had a temporary
memory shortage and that is what was screaming?  I don't think we do
anything at all with that state.

Perhaps the flaw here is simply not masking TxDescUnavail while we are
in NAPI mode?

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25  2:59                                           ` David Dillow
  2009-08-25 20:22                                             ` Eric W. Biederman
@ 2009-08-25 21:37                                             ` Eric W. Biederman
  2009-08-25 21:54                                               ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-25 21:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Mon, 2009-08-24 at 17:51 -0700, Eric W. Biederman wrote:
>> When I decode the bits in status they are TxOK, RxOK and TxDescUnavail so it looks
>> there is some bidirectional communication going on.
>> 
>> Do we really want to loop when those bits are set?
>
> Maybe not when only those bits are set, but I worry that we would trade
> one race for another where we stop getting interrupts from the card.
>
>> Perhaps we want to remove them from rtl_cfg_infos for the part?
>
> Then you'd never get an interrupt for them in the first place, I think.
>
> I'm not real happy with the interrupt handling in the driver; it makes a
> certain amount of sense to split the MSI vs non-MSI interrupt cases out.
> It also means another pass through re-auditing things against the vendor
> driver. That's more work than I'm able to commit to at the moment.
>
> I've not been able to reproduce it locally on my r8169d, running for ~30
> minutes straight at full speed. I've not tried running it in UP, though.
> Perhaps I can do that tomorrow.
>
> Here's a possible patch to mask the NAPI events while we're running in
> NAPI mode. I'm not sure it is going to help, since the intr_mask was
> 0xffff when you hit the loop guard, so I left it in for now.

Ok.  I now get what your patch was trying to do and that does look like
a reasonable test.  

I prefer:
while ((status != 0xffff) && (status & tp->intr_mask))

The presence of TxDescUnavail suggests as is usually the case
that the interrupt status bits are independent of which interrupts
are actually enabled to fire.

I will take a moment and give that a try.

I still like the idea of masking everything having poll do all
of the work and then unmasking everything.  That seems a little less
fragile to me.

Eric

> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index b82780d..12755b7 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -3556,6 +3556,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	void __iomem *ioaddr = tp->mmio_addr;
>  	int handled = 0;
>  	int status;
> +	int count = 0;
>  
>  	/* loop handling interrupts until we have no new ones or
>  	 * we hit a invalid/hotplug case.
> @@ -3564,6 +3565,15 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  	while (status && status != 0xffff) {
>  		handled = 1;
>  
> +		if (count++ > 100) {
> +			printk_once("r8169 screaming irq status %08x "
> +				"mask %08x event %08x napi %08x\n",
> +				status, tp->intr_mask, tp->intr_event,
> +				tp->napi_event);
> +			break;
> +		}
> +
> +
>  		/* Handle all of the error cases first. These will reset
>  		 * the chip, so just exit the loop.
>  		 */
> @@ -3613,6 +3623,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  		RTL_W16(IntrStatus,
>  			(status & RxFIFOOver) ? (status | RxOverflow) : status);
>  		status = RTL_R16(IntrStatus);
> +		status &= tp->intr_mask;
>  	}
>  
>  	return IRQ_RETVAL(handled);

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 21:24                                                 ` Eric W. Biederman
@ 2009-08-25 21:46                                                   ` David Dillow
  2009-08-25 22:19                                                   ` Francois Romieu
  1 sibling, 0 replies; 77+ messages in thread
From: David Dillow @ 2009-08-25 21:46 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Tue, 2009-08-25 at 14:24 -0700, Eric W. Biederman wrote:
> David Dillow <dave@thedillows.org> writes:
> > I'm curious how you managed to receive an packet between us clearing the
> > all current sources and reading the current source list continuously for
> > 60+ seconds -- the loop is basically
> 
> 
> > status = get IRQ events from chip
> > while (status) {
> > 	/* process events, start NAPI if needed */
> > 	clear current events from chip
> > 	status = get IRQ events from chip
> > }
> >
> > That seems like a very small race window to consistently hit --
> > especially for long enough to trigger soft lockups.
> 
> Interesting indeed.  When I hit the guard we had popped out of NAPI
> mode while we were in the loop.  The only way to do that is if
> poll and interrupt were running on different cpus.

That is the normal case on an SMP machine, but again that race window
should be fairly small as well -- from the __napi_schedule() to the
acking of the interrupt source is only a few lines of code, most of
which is in an error case that is skipped. Granted there may be a fair
number of instructions there if debugging or tracing is on -- I've not
checked -- but even then hitting that race consistently for 60+ seconds
doesn't seem likely.

Being out of NAPI in the guard may be a red herring -- it doesn't tell
us how long you were out of NAPI when you hit it. If there's a stuck bit
somewhere, then you could have been out of NAPI after the first cycle
and we'd have no way to tell. You could add some variables to keep track
of the status and mask values, and how long ago they changed to see.

> I am a bit curious about TxDescUnavail.  Perhaps we had a temporary
> memory shortage and that is what was screaming?  I don't think we do
> anything at all with that state.

TxDescUnavail is normal -- it means the chip finished sending everything
we asked it to.

> Perhaps the flaw here is simply not masking TxDescUnavail while we are
> in NAPI mode?

No, we never enable it on the chip, and it gets masked out when we
decide if we want to go to NAPI mode -- it is not set in tp->napi_event:

	if (status & tp->intr_mask & tp->napi_event) {

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 21:37                                             ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
@ 2009-08-25 21:54                                               ` David Dillow
  2009-08-25 23:11                                                 ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-25 21:54 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Michael Riepe, Michael Buesch, Francois Romieu, Rui Santos,
	Michael Büker, linux-kernel, netdev

On Tue, 2009-08-25 at 14:37 -0700, Eric W. Biederman wrote:
> David Dillow <dave@thedillows.org> writes:
> > Here's a possible patch to mask the NAPI events while we're running in
> > NAPI mode. I'm not sure it is going to help, since the intr_mask was
> > 0xffff when you hit the loop guard, so I left it in for now.
> 
> Ok.  I now get what your patch was trying to do and that does look like
> a reasonable test.  
> 
> I prefer:
> while ((status != 0xffff) && (status & tp->intr_mask))

I had thought of going that route first, but if you have any interrupt
event sources set, you want to enter the loop at least once to clear
them, otherwise you never see another MSI interrupt.

If the masking is the way things play out, then I'd put it where I had
it and put in a comment as to why it is there.

> The presence of TxDescUnavail suggests as is usually the case
> that the interrupt status bits are independent of which interrupts
> are actually enabled to fire.

Yes, but I seem to recall the MSI's edge detection being especially
oddly done -- I did tests with various masks and using the ability to
have it generate an interrupt on user demand, and IIRC it was handled
before the mask was applied, so we really did care about the events that
were active -- but I may misremember.

> I will take a moment and give that a try.
> 
> I still like the idea of masking everything having poll do all
> of the work and then unmasking everything.  That seems a little less
> fragile to me.

I wouldn't object if you did it, but I don't have time for it right now.
And it may make Francois's life harder when he does his periodic sweep
of the vendor driver, looking for differences.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 21:24                                                 ` Eric W. Biederman
  2009-08-25 21:46                                                   ` David Dillow
@ 2009-08-25 22:19                                                   ` Francois Romieu
  2009-08-26  3:47                                                     ` Eric W. Biederman
  2009-08-26  7:58                                                     ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
  1 sibling, 2 replies; 77+ messages in thread
From: Francois Romieu @ 2009-08-25 22:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Eric W. Biederman <ebiederm@xmission.com> :
[...]
> I am a bit curious about TxDescUnavail.  Perhaps we had a temporary
> memory shortage and that is what was screaming?  I don't think we do
> anything at all with that state.

You are not alone, the driver completely ignores this bit.

As far as I remember, the TxDescUnavail event mostly pops up when the
driver makes an excessive use of TxPoll requests.

> Perhaps the flaw here is simply not masking TxDescUnavail while we are
> in NAPI mode ?

Yes, it is worth trying.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 21:54                                               ` David Dillow
@ 2009-08-25 23:11                                                 ` Francois Romieu
  0 siblings, 0 replies; 77+ messages in thread
From: Francois Romieu @ 2009-08-25 23:11 UTC (permalink / raw)
  To: David Dillow
  Cc: Eric W. Biederman, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B?ker, linux-kernel, netdev

David Dillow <dave@thedillows.org> :
[...]
> I wouldn't object if you did it, but I don't have time for it right now.
> And it may make Francois's life harder when he does his periodic sweep
> of the vendor driver, looking for differences.

This part of Realtek's driver(s) is not too tricky (I wonder if some code is
there by design or accident but it is a different story).

I do not feel safe with the TxDescUnavail bit : the driver does not
explicitely do anything to handle it but the behavior of the driver
can change depending on it. :o/

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts
  2009-08-25 22:19                                                   ` Francois Romieu
@ 2009-08-26  3:47                                                     ` Eric W. Biederman
  2009-08-26  7:58                                                     ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
  1 sibling, 0 replies; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-26  3:47 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Francois Romieu <romieu@fr.zoreil.com> writes:

> Eric W. Biederman <ebiederm@xmission.com> :
> [...]
>> I am a bit curious about TxDescUnavail.  Perhaps we had a temporary
>> memory shortage and that is what was screaming?  I don't think we do
>> anything at all with that state.
>
> You are not alone, the driver completely ignores this bit.
>
> As far as I remember, the TxDescUnavail event mostly pops up when the
> driver makes an excessive use of TxPoll requests.
>
>> Perhaps the flaw here is simply not masking TxDescUnavail while we are
>> in NAPI mode ?
>
> Yes, it is worth trying.

At first blush things seem better, but it isn't sufficient.
I still have a problem with RxFIFOOver set.
r8169 screaming irq status 00000040 mask 0000ffe2 event 0000803f napi 0000001d

The patch I ran is below.

Eric

---
 drivers/net/r8169.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 3b19e0c..e144bc1 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3552,6 +3552,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
 	int status;
+	int count = 0;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
@@ -3560,6 +3561,14 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		if (count++ > 100) {                                                                                               
+			printk_once("r8169 screaming irq status %08x "
+				"mask %08x event %08x napi %08x\n",
+				status, tp->intr_mask, tp->intr_event,
+				tp->napi_event);
+			break;
+		}
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
@@ -3609,6 +3618,9 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		RTL_W16(IntrStatus,
 			(status & RxFIFOOver) ? (status | RxOverflow) : status);
 		status = RTL_R16(IntrStatus);
+		if (status == 0xffff)
+			break;
+		status &= tp->intr_mask;
 	}
 
 	return IRQ_RETVAL(handled);
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-25 22:19                                                   ` Francois Romieu
  2009-08-26  3:47                                                     ` Eric W. Biederman
@ 2009-08-26  7:58                                                     ` Eric W. Biederman
  2009-08-26 13:56                                                       ` David Dillow
  1 sibling, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-26  7:58 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev


As of 2.6.30 I have been observing soft lockups and netdev watchdog
timeouts caused by looping in the r8169 interrupt handler.

- Introduce a hard limit to the maximum number of times we will loop in
  the interrupt handler, and print a message when we hit it.

- Break out of the loop if after looking none of the events in status
  are events we expect to be delivered by an interrupt.

With just the hard limit and message bits of my patch in my test case
I get hit my limit of 10 loops 12 times.  After filtering by intr_mask
and intr_event I don't get any warnings.

Any complaints from those who know the driver better than I?

Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
---
 drivers/net/r8169.c |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 3b19e0c..2214945 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -61,6 +61,8 @@ static const int multicast_filter_limit = 32;
 /* MAC address length */
 #define MAC_ADDR_LEN	6
 
+#define MAX_INTR_LOOPS 10	/* Limit the msi acking loop from going crazy */
+
 #define MAX_READ_REQUEST_SHIFT	12
 #define RX_FIFO_THRESH	7	/* 7 means NO threshold, Rx buffer level before first PCI xfer. */
 #define RX_DMA_BURST	6	/* Maximum PCI burst, '6' is 1024 */
@@ -3552,6 +3554,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
 	int status;
+	int count = 0;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
@@ -3560,6 +3563,17 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 	while (status && status != 0xffff) {
 		handled = 1;
 
+		if (count++ >= MAX_INTR_LOOPS) {
+			if (netif_msg_intr(tp) && net_ratelimit()) {
+				printk(KERN_INFO " %s Screaming irq "
+					"status %08x mask %08x event %08x "
+					"napi %08x\n",
+					dev->name, status, tp->intr_mask,
+					tp->intr_event,	tp->napi_event);
+			}
+			break;
+		}
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
@@ -3609,6 +3623,13 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		RTL_W16(IntrStatus,
 			(status & RxFIFOOver) ? (status | RxOverflow) : status);
 		status = RTL_R16(IntrStatus);
+
+		/* Ignore the parts of status that reflect more than
+		 * the enabled interrupts.
+		 */
+		smp_rmb();
+		if (!(status & tp->intr_mask & tp->intr_event))
+			break;
 	}
 
 	return IRQ_RETVAL(handled);
-- 
1.6.2.5


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26  7:58                                                     ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
@ 2009-08-26 13:56                                                       ` David Dillow
  2009-08-26 13:59                                                         ` David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-26 13:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Francois Romieu, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

On Wed, 2009-08-26 at 00:58 -0700, Eric W. Biederman wrote:
> As of 2.6.30 I have been observing soft lockups and netdev watchdog
> timeouts caused by looping in the r8169 interrupt handler.
> 
> - Introduce a hard limit to the maximum number of times we will loop in
>   the interrupt handler, and print a message when we hit it.
> 
> - Break out of the loop if after looking none of the events in status
>   are events we expect to be delivered by an interrupt.
> 
> With just the hard limit and message bits of my patch in my test case
> I get hit my limit of 10 loops 12 times.  After filtering by intr_mask
> and intr_event I don't get any warnings.
> 
> Any complaints from those who know the driver better than I?

Have you tried this under a sustained heavy transmit load? I think you
may have reintroduced the problem the original patch was trying to
prevent -- wedging the MSI interrupt by not ACKing all outstanding
interrupt sources, masked or otherwise. I'll try to test this out on the
machine where I've seen the problem consistently, but it may be this
weekend.

> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 3b19e0c..2214945 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c

> +
> +		/* Ignore the parts of status that reflect more than
> +		 * the enabled interrupts.
> +		 */
> +		smp_rmb();
> +		if (!(status & tp->intr_mask & tp->intr_event))
> +			break;
>  	}

This looks like an odd construct, since we're just about to go back the
while condition up top -- why not just mask it here and let the loop
handle it naturally?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26 13:56                                                       ` David Dillow
@ 2009-08-26 13:59                                                         ` David Dillow
  2009-08-26 20:02                                                           ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-26 13:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Francois Romieu, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

On Wed, 2009-08-26 at 09:56 -0400, David Dillow wrote:
> On Wed, 2009-08-26 at 00:58 -0700, Eric W. Biederman wrote:
> > diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> > index 3b19e0c..2214945 100644
> > --- a/drivers/net/r8169.c
> > +++ b/drivers/net/r8169.c
> 
> > +
> > +		/* Ignore the parts of status that reflect more than
> > +		 * the enabled interrupts.
> > +		 */
> > +		smp_rmb();
> > +		if (!(status & tp->intr_mask & tp->intr_event))
> > +			break;
> >  	}
> 
> This looks like an odd construct, since we're just about to go back the
> while condition up top -- why not just mask it here and let the loop
> handle it naturally?

Never mind, I see what you are doing -- avoiding a false loop if we get
status == 0xffff. I still don't like the aesthetics of it, but it makes
sense, and I'll blame it on the card. :)

I should really get some caffeine before posting...  


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26 13:59                                                         ` David Dillow
@ 2009-08-26 20:02                                                           ` Eric W. Biederman
  2009-08-26 21:30                                                             ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-26 20:02 UTC (permalink / raw)
  To: David Dillow
  Cc: Francois Romieu, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

David Dillow <dave@thedillows.org> writes:

> On Wed, 2009-08-26 at 09:56 -0400, David Dillow wrote:
>> On Wed, 2009-08-26 at 00:58 -0700, Eric W. Biederman wrote:
>> > diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
>> > index 3b19e0c..2214945 100644
>> > --- a/drivers/net/r8169.c
>> > +++ b/drivers/net/r8169.c
>> 
>> > +
>> > +		/* Ignore the parts of status that reflect more than
>> > +		 * the enabled interrupts.
>> > +		 */
>> > +		smp_rmb();
>> > +		if (!(status & tp->intr_mask & tp->intr_event))
>> > +			break;
>> >  	}
>> 
>> This looks like an odd construct, since we're just about to go back the
>> while condition up top -- why not just mask it here and let the loop
>> handle it naturally?
>
> Never mind, I see what you are doing -- avoiding a false loop if we get
> status == 0xffff. I still don't like the aesthetics of it, but it makes
> sense, and I'll blame it on the card. :)
>
> I should really get some caffeine before posting...  

It is a bit weird but it also means we aren't playing silly games
with status inside the loop.  So if we go through the loop we ack
everything in status.

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26 20:02                                                           ` Eric W. Biederman
@ 2009-08-26 21:30                                                             ` Francois Romieu
  2009-08-26 21:40                                                               ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-08-26 21:30 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Eric W. Biederman <ebiederm@xmission.com> :
[...]
> It is a bit weird but it also means we aren't playing silly games
> with status inside the loop.  So if we go through the loop we ack
> everything in status.

I fear we have some longstanding problem anyway :

1. quiescent state
2. packets are received
3. rtl8169_interrupt schedules napi, clears IntrStatus and exits
4. packets are received and some non-napi event happens
5. rtl8169_interrupt wakes up, reads IntrStatus and goes on...
6. rtl8169_poll wakes up, processes Rx and Tx napi events and goes on...
7. tp->intr_mask still equals ~tp->napi_event : rtl8169_interrupt
   handler does not even try to schedule napi.
8. more packets are received
9. rtl8169_interrupt clears IntrStatus
a. rtl8169_poll reenables napi scheduling, updates IntrMask and exits
b. rtl8169_interrupt reads a perfectly clean IntrStatus and exits

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26 21:30                                                             ` Francois Romieu
@ 2009-08-26 21:40                                                               ` Eric W. Biederman
  2009-08-27  5:24                                                                 ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-26 21:40 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Francois Romieu <romieu@fr.zoreil.com> writes:

> Eric W. Biederman <ebiederm@xmission.com> :
> [...]
>> It is a bit weird but it also means we aren't playing silly games
>> with status inside the loop.  So if we go through the loop we ack
>> everything in status.
>
> I fear we have some longstanding problem anyway :
>
> 1. quiescent state
> 2. packets are received
> 3. rtl8169_interrupt schedules napi, clears IntrStatus and exits
> 4. packets are received and some non-napi event happens
> 5. rtl8169_interrupt wakes up, reads IntrStatus and goes on...
> 6. rtl8169_poll wakes up, processes Rx and Tx napi events and goes on...
> 7. tp->intr_mask still equals ~tp->napi_event : rtl8169_interrupt
>    handler does not even try to schedule napi.
> 8. more packets are received
> 9. rtl8169_interrupt clears IntrStatus
> a. rtl8169_poll reenables napi scheduling, updates IntrMask and exits
> b. rtl8169_interrupt reads a perfectly clean IntrStatus and exits

That would not surprise me.

Right now I really don't have much more test bandwidth.  So I tried
for something simple that would address my problem without
fundamentally changing the already tested logic.  I am not seeing any
of the weird corner cases where we get confused.  The changes to fix
that problem is totally killing my ability to use the NIC, because it
loops way to much.

Perhaps we should unconditionally ack everything after changing the 
interrupt mask?  If that would prevent races it sounds like a simple fix.

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-26 21:40                                                               ` Eric W. Biederman
@ 2009-08-27  5:24                                                                 ` Francois Romieu
  2009-08-27  5:38                                                                   ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-08-27  5:24 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

[...]
> Right now I really don't have much more test bandwidth.  So I tried
> for something simple that would address my problem without
> fundamentally changing the already tested logic.

Fair enough.

[...]
> Perhaps we should unconditionally ack everything after changing the 
> interrupt mask  ? If that would prevent races it sounds like a simple
> fix.

I should be able to implement something right this evening, please hold on.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-27  5:24                                                                 ` Francois Romieu
@ 2009-08-27  5:38                                                                   ` Eric W. Biederman
  2009-08-27 23:20                                                                     ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-27  5:38 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Francois Romieu <romieu@fr.zoreil.com> writes:

> [...]
>> Right now I really don't have much more test bandwidth.  So I tried
>> for something simple that would address my problem without
>> fundamentally changing the already tested logic.
>
> Fair enough.
>
> [...]
>> Perhaps we should unconditionally ack everything after changing the 
>> interrupt mask  ? If that would prevent races it sounds like a simple
>> fix.
>
> I should be able to implement something right this evening, please hold on.

Sounds good.

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-27  5:38                                                                   ` Eric W. Biederman
@ 2009-08-27 23:20                                                                     ` Francois Romieu
  2009-08-28  1:17                                                                       ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-08-27 23:20 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Eric W. Biederman <ebiederm@xmission.com> :
[...]
> Sounds good.

Gah, I was not able to test it with a decent packet load. Patch below against
the current tree:

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b82780d..6596ef6 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -3549,21 +3549,39 @@ static int rtl8169_rx_interrupt(struct net_device *dev,
 	return count;
 }
 
+static void rtl_napi_cond_schedule(struct rtl8169_private *tp, u16 status)
+{
+	if (status & tp->napi_event) {
+		void __iomem *ioaddr = tp->mmio_addr;
+
+		RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
+		mmiowb();
+		napi_schedule(&tp->napi);
+	}
+}
+
 static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 {
 	struct net_device *dev = dev_instance;
 	struct rtl8169_private *tp = netdev_priv(dev);
 	void __iomem *ioaddr = tp->mmio_addr;
 	int handled = 0;
-	int status;
+	u16 status;
 
 	/* loop handling interrupts until we have no new ones or
 	 * we hit a invalid/hotplug case.
 	 */
 	status = RTL_R16(IntrStatus);
 	while (status && status != 0xffff) {
+		u16 acked;
+
 		handled = 1;
 
+		acked = (status & RxFIFOOver) ? (status | RxOverflow) : status;
+		acked &= ~tp->napi_event;
+
+		RTL_W16(IntrStatus, acked);
+
 		/* Handle all of the error cases first. These will reset
 		 * the chip, so just exit the loop.
 		 */
@@ -3574,7 +3592,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 
 		/* Work around for rx fifo overflow */
 		if (unlikely(status & RxFIFOOver) &&
-		(tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+		    (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
 			netif_stop_queue(dev);
 			rtl8169_tx_timeout(dev);
 			break;
@@ -3588,31 +3606,9 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
 		if (status & LinkChg)
 			rtl8169_check_link_status(dev, tp, ioaddr);
 
-		/* We need to see the lastest version of tp->intr_mask to
-		 * avoid ignoring an MSI interrupt and having to wait for
-		 * another event which may never come.
-		 */
-		smp_rmb();
-		if (status & tp->intr_mask & tp->napi_event) {
-			RTL_W16(IntrMask, tp->intr_event & ~tp->napi_event);
-			tp->intr_mask = ~tp->napi_event;
-
-			if (likely(napi_schedule_prep(&tp->napi)))
-				__napi_schedule(&tp->napi);
-			else if (netif_msg_intr(tp)) {
-				printk(KERN_INFO "%s: interrupt %04x in poll\n",
-				dev->name, status);
-			}
-		}
+		rtl_napi_cond_schedule(tp, status);
 
-		/* We only get a new MSI interrupt when all active irq
-		 * sources on the chip have been acknowledged. So, ack
-		 * everything we've seen and check if new sources have become
-		 * active to avoid blocking all interrupts from the chip.
-		 */
-		RTL_W16(IntrStatus,
-			(status & RxFIFOOver) ? (status | RxOverflow) : status);
-		status = RTL_R16(IntrStatus);
+		break;
 	}
 
 	return IRQ_RETVAL(handled);
@@ -3625,22 +3621,19 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
 	void __iomem *ioaddr = tp->mmio_addr;
 	int work_done;
 
+
+	RTL_W16(IntrStatus, tp->napi_event);
+
 	work_done = rtl8169_rx_interrupt(dev, tp, ioaddr, (u32) budget);
 	rtl8169_tx_interrupt(dev, tp, ioaddr);
 
 	if (work_done < budget) {
 		napi_complete(napi);
 
-		/* We need for force the visibility of tp->intr_mask
-		 * for other CPUs, as we can loose an MSI interrupt
-		 * and potentially wait for a retransmit timeout if we don't.
-		 * The posted write to IntrMask is safe, as it will
-		 * eventually make it to the chip and we won't loose anything
-		 * until it does.
-		 */
-		tp->intr_mask = 0xffff;
-		smp_wmb();
 		RTL_W16(IntrMask, tp->intr_event);
+		mmiowb();
+
+		rtl_napi_cond_schedule(tp, RTL_R16(IntrStatus));
 	}
 
 	return work_done;

-- 
Ueimor

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-27 23:20                                                                     ` Francois Romieu
@ 2009-08-28  1:17                                                                       ` Eric W. Biederman
  2009-08-28  1:29                                                                         ` David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-28  1:17 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Francois Romieu <romieu@fr.zoreil.com> writes:

> Eric W. Biederman <ebiederm@xmission.com> :
> [...]
>> Sounds good.
>
> Gah, I was not able to test it with a decent packet load. Patch below against
> the current tree:

This kills the loop which was my concern.

I will double check it works on my configuration.
Then we have the question of does it work for those folks who have cards
where the msi appeared to get lost.

David any chance you can test this patch?

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-28  1:17                                                                       ` Eric W. Biederman
@ 2009-08-28  1:29                                                                         ` David Dillow
  2009-08-30 20:37                                                                           ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-08-28  1:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Francois Romieu, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

On Thu, 2009-08-27 at 18:17 -0700, Eric W. Biederman wrote:
> Francois Romieu <romieu@fr.zoreil.com> writes:
> 
> > Eric W. Biederman <ebiederm@xmission.com> :
> > [...]
> >> Sounds good.
> >
> > Gah, I was not able to test it with a decent packet load. Patch below against
> > the current tree:
> 
> This kills the loop which was my concern.
> 
> I will double check it works on my configuration.
> Then we have the question of does it work for those folks who have cards
> where the msi appeared to get lost.
> 
> David any chance you can test this patch?

It'll be this weekend, but I can see cases where it can lock my chip up
-- they should be rare, but then I thought your case would be extremely
rare...

I think we'll need to go to a full separation between MSI and
level-triggered interrupt handlers, or at the very least mask everything
and handle it all in NAPI. That seems more viable now that I know
Francois is not terribly worried about diverging from the vendor driver
there.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-28  1:29                                                                         ` David Dillow
@ 2009-08-30 20:37                                                                           ` Francois Romieu
  2009-08-30 20:53                                                                             ` Eric W. Biederman
  0 siblings, 1 reply; 77+ messages in thread
From: Francois Romieu @ 2009-08-30 20:37 UTC (permalink / raw)
  To: David Dillow
  Cc: Eric W. Biederman, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

David Dillow <dave@thedillows.org> :
[...]
> It'll be this weekend, but I can see cases where it can lock my chip up
> -- they should be rare, but then I thought your case would be extremely
> rare...

I don't get it.

Can you elaborate the relevant cases or give some sample scenarios for
them ?

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-30 20:37                                                                           ` Francois Romieu
@ 2009-08-30 20:53                                                                             ` Eric W. Biederman
  2009-09-01  3:33                                                                               ` David Dillow
  0 siblings, 1 reply; 77+ messages in thread
From: Eric W. Biederman @ 2009-08-30 20:53 UTC (permalink / raw)
  To: Francois Romieu
  Cc: David Dillow, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

Francois Romieu <romieu@fr.zoreil.com> writes:

> David Dillow <dave@thedillows.org> :
> [...]
>> It'll be this weekend, but I can see cases where it can lock my chip up
>> -- they should be rare, but then I thought your case would be extremely
>> rare...
>
> I don't get it.
>
> Can you elaborate the relevant cases or give some sample scenarios for
> them ?

I think David is referring to the fact that in the NAPI loop there is
nothing that acks everything.

Eric

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-08-30 20:53                                                                             ` Eric W. Biederman
@ 2009-09-01  3:33                                                                               ` David Dillow
  2009-09-01  9:20                                                                                 ` Francois Romieu
  0 siblings, 1 reply; 77+ messages in thread
From: David Dillow @ 2009-09-01  3:33 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Francois Romieu, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

On Sun, 2009-08-30 at 13:53 -0700, Eric W. Biederman wrote:
> Francois Romieu <romieu@fr.zoreil.com> writes:
> 
> > David Dillow <dave@thedillows.org> :
> > [...]
> >> It'll be this weekend, but I can see cases where it can lock my chip up
> >> -- they should be rare, but then I thought your case would be extremely
> >> rare...
> >
> > I don't get it.
> >
> > Can you elaborate the relevant cases or give some sample scenarios for
> > them ?
> 
> I think David is referring to the fact that in the NAPI loop there is
> nothing that acks everything.

That was my concern, yes.

I've not been able to reproduce my lockups under medium testing with
Francois's patch applied, so I'm a little more comfortable with it.

At the same time, I'm worried that the timing just changed enough to
make it harder to trigger, as was the case when I did the patch IIRC.
The kernel's interrupt handling changed in a manner that made it much
easier to hit about that time. The testing I did in May pointed strongly
at us failing to ACK an interrupt source, causing the MSI generation to
stop, so I need to think about things some more.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH] r8169: Reduce looping in the interrupt handler.
  2009-09-01  3:33                                                                               ` David Dillow
@ 2009-09-01  9:20                                                                                 ` Francois Romieu
  0 siblings, 0 replies; 77+ messages in thread
From: Francois Romieu @ 2009-09-01  9:20 UTC (permalink / raw)
  To: David Dillow
  Cc: Eric W. Biederman, Michael Riepe, Michael Buesch, Rui Santos,
	Michael B??ker, linux-kernel, netdev

David Dillow <dave@thedillows.org> :
[...]
> I've not been able to reproduce my lockups under medium testing with
> Francois's patch applied, so I'm a little more comfortable with it.

Nice :o)

> At the same time, I'm worried that the timing just changed enough to
> make it harder to trigger, as was the case when I did the patch IIRC.

It is a legitimate concern.

> The kernel's interrupt handling changed in a manner that made it much
> easier to hit about that time. The testing I did in May pointed strongly
> at us failing to ACK an interrupt source, causing the MSI generation to
> stop, so I need to think about things some more.

It can be understood as us claiming to perform some work we didn't too.

In this regard, a "ack everything and perform no work loop in the irq
handler" design would require some work : it races with the - supposedly
fast, register read free - napi handler which does not check that
unprocessed events are acked.

As the current patch was provided with almost no explanation :
- the irq handler and the napi handler are allowed / assumed / expected
  to race
- the napi and irq handlers ack respectively their own events (IntrStatus).
  They do not ack their friend ones.
- everybody acks (IntrStatus) before the work is done
- napi irqs are disabled before napi is (tentatively) scheduled.
  napi irqs are only expected to be disabled most of the time the napi
  handler runs.
- the napi handler enables its irqs, tests new events and conditionaly
  schedules itself.

-- 
Ueimor

^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2009-09-01  9:20 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200903041828.49972.m.bueker@berlin.de>
     [not found] ` <20090322211159.GA23042@electric-eye.fr.zoreil.com>
     [not found]   ` <49CA1822.6050902@grupopie.com>
     [not found]     ` <200904041950.04324.mb@bu3sch.de>
     [not found]       ` <4A06D8D2.4010505@googlemail.com>
2009-05-11  0:29         ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too David Dillow
2009-05-11 20:48           ` Michael Buesch
2009-05-11 21:10             ` Michael Buesch
2009-05-11 21:29               ` David Dillow
2009-05-11 21:59                 ` Michael Buesch
2009-05-12 20:29                 ` Michael Riepe
2009-05-14  2:38                   ` David Dillow
2009-05-14 18:37                     ` Michael Riepe
2009-05-14 19:14                       ` David Dillow
2009-05-14 19:42                         ` Michael Riepe
2009-05-23  1:29                           ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts David Dillow
2009-05-23  9:24                             ` Michael Buesch
2009-05-23 14:35                               ` Michael Riepe
2009-05-23 14:44                                 ` Michael Buesch
2009-05-23 15:01                                   ` Michael Riepe
2009-05-23 16:40                                     ` Michael Buesch
2009-05-23 14:51                                 ` David Dillow
2009-05-23 16:12                                   ` Michael Riepe
2009-05-23 16:45                                     ` Michael Buesch
2009-05-23 16:46                                     ` David Dillow
2009-05-23 16:50                                       ` Michael Buesch
2009-05-23 16:53                                       ` Michael Riepe
2009-05-23 17:03                                         ` David Dillow
2009-05-24 21:15                             ` Francois Romieu
2009-05-24 22:55                               ` David Dillow
2009-05-26  5:55                             ` David Miller
2009-05-26 18:22                               ` Michael Buesch
2009-05-26 21:52                                 ` David Miller
2009-05-26 22:14                                   ` David Miller
2009-05-26 22:40                                     ` Michael Riepe
2009-05-26 22:43                                       ` David Miller
2009-05-26 23:10                                         ` David Miller
2009-05-27 16:19                                     ` Michael Buesch
2009-06-16 19:32                                     ` Rui Santos
2009-08-21 20:57                             ` Eric W. Biederman
2009-08-21 21:22                               ` Michael Riepe
2009-08-21 22:59                               ` David Dillow
2009-08-21 23:34                                 ` David Dillow
2009-08-22  0:24                                   ` Eric W. Biederman
2009-08-22 11:48                                   ` Eric W. Biederman
2009-08-22 12:07                                     ` Eric W. Biederman
2009-08-22 20:43                                       ` David Dillow
2009-08-23 17:17                                         ` Jarek Poplawski
2009-08-23 17:43                                           ` Michal Soltys
2009-08-23 17:54                                             ` Jarek Poplawski
2009-08-24  2:37                                         ` Eric W. Biederman
2009-08-25  0:51                                         ` Eric W. Biederman
2009-08-25  2:59                                           ` David Dillow
2009-08-25 20:22                                             ` Eric W. Biederman
2009-08-25 20:40                                               ` David Dillow
2009-08-25 21:24                                                 ` Eric W. Biederman
2009-08-25 21:46                                                   ` David Dillow
2009-08-25 22:19                                                   ` Francois Romieu
2009-08-26  3:47                                                     ` Eric W. Biederman
2009-08-26  7:58                                                     ` [PATCH] r8169: Reduce looping in the interrupt handler Eric W. Biederman
2009-08-26 13:56                                                       ` David Dillow
2009-08-26 13:59                                                         ` David Dillow
2009-08-26 20:02                                                           ` Eric W. Biederman
2009-08-26 21:30                                                             ` Francois Romieu
2009-08-26 21:40                                                               ` Eric W. Biederman
2009-08-27  5:24                                                                 ` Francois Romieu
2009-08-27  5:38                                                                   ` Eric W. Biederman
2009-08-27 23:20                                                                     ` Francois Romieu
2009-08-28  1:17                                                                       ` Eric W. Biederman
2009-08-28  1:29                                                                         ` David Dillow
2009-08-30 20:37                                                                           ` Francois Romieu
2009-08-30 20:53                                                                             ` Eric W. Biederman
2009-09-01  3:33                                                                               ` David Dillow
2009-09-01  9:20                                                                                 ` Francois Romieu
2009-08-25 21:37                                             ` [PATCH 2.6.30-rc4] r8169: avoid losing MSI interrupts Eric W. Biederman
2009-08-25 21:54                                               ` David Dillow
2009-08-25 23:11                                                 ` Francois Romieu
2009-05-12 11:10             ` 2.6.27.19 + 28.7: network timeouts for r8169 and 8139too Krzysztof Halasa
2009-05-12 21:45               ` Michael Riepe
2009-05-13  6:11                 ` Francois Romieu
2009-05-13  6:27                   ` Michael Riepe
2009-05-13 19:34                 ` Krzysztof Halasa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).