Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles

netdev.vger.kernel.org archive mirror

* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
       [not found] <bug-9990-10286@http.bugzilla.kernel.org/>
@ 2008-02-14 18:24 ` Andrew Morton
  2008-02-14 18:56   ` Andy Gospodarek
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2008-02-14 18:24 UTC (permalink / raw)
  To: Matt Carlson, Michael Chan; +Cc: bugme-daemon, netdev, ralf.hildebrandt

On Thu, 14 Feb 2008 01:59:12 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=9990
> 
>            Summary: tg3: eth0: The system may be re-ordering memory-mapped
>                     I/O cycles
>            Product: Drivers
>            Version: 2.5
>      KernelVersion: 2.6.24-git18
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Network
>         AssignedTo: jgarzik@pobox.com
>         ReportedBy: ralf.hildebrandt@charite.de
> 
> 
> Latest working kernel version: 2.6.24
> Earliest failing kernel version: 2.6.24-git18
> Distribution: Debian/testing
> Hardware Environment:
> Software Environment:
> Problem Description:
> 
> Feb 11 13:11:52 www kernel: [   12.015569] tg3: eth0: Link is up at 100 Mbps, full duplex.
> Feb 11 13:11:52 www kernel: [   12.015633] tg3: eth0: Flow control is on for TX and on for RX.
> Feb 11 13:33:44 www kernel: [ 1328.538204] tg3: eth0: The system may be re-ordering memory-mapped I/O cycles to the network device, attempting to recover. Please report the problem to the driver maintainer and include system chipset information.
> Feb 11 13:33:44 www kernel: [ 1328.667255] tg3: eth0: Link is down.
> Feb 11 13:33:46 www kernel: [ 1330.560734] tg3: eth0: Link is up at 100 Mbps, full duplex.
> Feb 11 13:33:46 www kernel: [ 1330.560734] tg3: eth0: Flow control is on for TX and on for RX.
> 
> After that, the machine rebooted (panic?)
> 
> Feb 11 13:35:14 www kernel: klogd 1.5.0#1.1, log source = /proc/kmsg started.
> 
> lspci -vvv info:
> 02:02.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
>         Subsystem: Compaq Computer Corporation NC7782 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 64 (16000ns min), Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 19
>         Region 0: Memory at fdf70000 (64-bit, non-prefetchable) [size=64K]
>         [virtual] Expansion ROM at 88140000 [disabled] [size=64K]
>         Capabilities: [40] PCI-X non-bridge device
>                 Command: DPERE- ERO- RBC=2048 OST=1
>                 Status: Dev=02:02.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
>         Capabilities: [48] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable+ DSel=0 DScale=1 PME-
>         Capabilities: [50] Vital Product Data <?>
>         Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
>                 Address: fd7ffd6fdf7deeb8  Data: bdfd
>         Kernel driver in use: tg3
>         Kernel modules: tg3
> 
> 02:02.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
>         Subsystem: Compaq Computer Corporation NC7782 Gigabit Server Adapter (PCI-X, 10,100,1000-T)
>         Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 64 (16000ns min), Cache Line Size: 64 bytes
>         Interrupt: pin B routed to IRQ 20
>         Region 0: Memory at fdf60000 (64-bit, non-prefetchable) [size=64K]
>         Capabilities: [40] PCI-X non-bridge device
>                 Command: DPERE- ERO+ RBC=512 OST=1
>                 Status: Dev=02:02.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
>         Capabilities: [48] Power Management version 2
>                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
>                 Status: D0 PME-Enable- DSel=0 DScale=1 PME-
>         Capabilities: [50] Vital Product Data <?>
>         Capabilities: [58] Message Signalled Interrupts: Mask- 64bit+ Queue=0/3 Enable-
>                 Address: f73feeefffffe7f8  Data: 9bcd
>         Kernel driver in use: tg3
>         Kernel modules: tg3
> 
> 
> Steps to reproduce:
> 
> 


* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 18:24 ` [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles Andrew Morton
@ 2008-02-14 18:56   ` Andy Gospodarek
  2008-02-14 21:25     ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Gospodarek @ 2008-02-14 18:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matt Carlson, Michael Chan, bugme-daemon, netdev,
	ralf.hildebrandt

On Thu, Feb 14, 2008 at 10:24:25AM -0800, Andrew Morton wrote:
> On Thu, 14 Feb 2008 01:59:12 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> 
> > http://bugzilla.kernel.org/show_bug.cgi?id=9990
> > 
> > [...]

That should be a simple matter of adding the right pci-ids to
tg3_get_invariants -- hopefully Ralf will respond and we can get that
knocked out quickly.



* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 18:56   ` Andy Gospodarek
@ 2008-02-14 21:25     ` Michael Chan
  2008-02-14 22:12       ` Andy Gospodarek
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Chan @ 2008-02-14 21:25 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Andrew Morton, Matt Carlson, bugme-daemon, netdev,
	ralf.hildebrandt

On Thu, 2008-02-14 at 13:56 -0500, Andy Gospodarek wrote:
> On Thu, Feb 14, 2008 at 10:24:25AM -0800, Andrew Morton wrote:
> > On Thu, 14 Feb 2008 01:59:12 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> > 
> > > http://bugzilla.kernel.org/show_bug.cgi?id=9990
> > > 
> > >            Summary: tg3: eth0: The system may be re-ordering memory-mapped
> > >                     I/O cycles
> > >            Product: Drivers
> > >            Version: 2.5
> > >      KernelVersion: 2.6.24-git18
> > >           Platform: All
> > >         OS/Version: Linux
> > >               Tree: Mainline
> > >             Status: NEW
> > >           Severity: normal
> > >           Priority: P1
> > >          Component: Network
> > >         AssignedTo: jgarzik@pobox.com
> > >         ReportedBy: ralf.hildebrandt@charite.de
> > > 
> > > 
> 
> That should be a simple matter of adding the right pci-ids to
> tg3_get_invariants -- hopefully Ralf will respond and we can get that
> knocked out quickly.
> 
> 

It doesn't look like it was re-ordered IO.  If it was, it should have
self-recovered without hitting the BUG().

One possibility is that the nr_frags in the SKB got corrupted before the
TX SKB was freed.  The driver relies on the nr_frags in the SKB to find
the packet boundaries in the TX ring.  If it cannot find the packet
boundaries, it will exhibit the same symptom as re-ordered IO, except
that the driver cannot recover from it on its own.
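
To illustrate the point about packet boundaries, here is a minimal
user-space sketch.  It is not the driver's code: the 8-slot ring and the
names model_skb, model_slot, model_xmit and model_reclaim are made up for
illustration.  Only the walk over nr_frags mirrors the loop in tg3_tx()
that appears in the patch context.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8                       /* hypothetical; must be a power of two */
#define NEXT_TX(i) (((i) + 1) & (RING_SIZE - 1))

/* Hypothetical stand-in for an skb: only the fragment count matters here. */
struct model_skb {
	unsigned short nr_frags;
};

/* One ring slot.  The first slot of a packet holds the skb pointer;
 * the slots used by its fragments hold NULL. */
struct model_slot {
	struct model_skb *skb;
};

/* Queue a packet: one head slot plus nr_frags fragment slots.
 * Returns the new producer index. */
static unsigned int model_xmit(struct model_slot *ring, unsigned int prod,
			       struct model_skb *skb)
{
	unsigned int i;

	assert(ring != NULL && skb != NULL);
	ring[prod].skb = skb;
	prod = NEXT_TX(prod);
	for (i = 0; i < skb->nr_frags; i++) {
		ring[prod].skb = NULL;
		prod = NEXT_TX(prod);
	}
	return prod;
}

/* Reclaim completed packets from sw_idx up to hw_idx.
 * Returns 0 on success, -1 if no skb is found where a packet head is
 * expected, -2 if a fragment slot is occupied or overruns hw_idx (the
 * condition under which the patch context sets tx_bug). */
static int model_reclaim(struct model_slot *ring, unsigned int sw_idx,
			 unsigned int hw_idx)
{
	while (sw_idx != hw_idx) {
		struct model_skb *skb = ring[sw_idx].skb;
		unsigned int i;

		if (skb == NULL)
			return -1;
		ring[sw_idx].skb = NULL;
		sw_idx = NEXT_TX(sw_idx);

		/* The walk trusts skb->nr_frags to find the next packet head. */
		for (i = 0; i < skb->nr_frags; i++) {
			if (ring[sw_idx].skb != NULL || sw_idx == hw_idx)
				return -2;
			sw_idx = NEXT_TX(sw_idx);
		}
	}
	return 0;
}
```

With a 2-fragment packet followed by a 1-fragment packet, model_reclaim()
returns 0 as long as nr_frags is intact.  If the first packet's nr_frags
is changed to 0 after queuing, the walk lands on a fragment slot and
returns -1; if it is changed to 3, the walk runs into the next packet's
head and returns -2.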

Ralf, please try this debug patch with the same traffic condition you
ran before.  This patch stores the nr_frags when transmitting an SKB.
During tx completion, it will compare the stored nr_frags with the one
in the SKB and will print out something in dmesg if they don't match.

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index db606b6..73f1ddd 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -3324,12 +3324,20 @@ static void tg3_tx(struct tg3 *tp)
 		struct tx_ring_info *ri = &tp->tx_buffers[sw_idx];
 		struct sk_buff *skb = ri->skb;
 		int i, tx_bug = 0;
+		unsigned short nr_frags = ri->nr_frags;
 
 		if (unlikely(skb == NULL)) {
 			tg3_tx_recover(tp);
 			return;
 		}
 
+		if (nr_frags != skb_shinfo(skb)->nr_frags) {
+			printk(KERN_ALERT "tg3: %s: Tx skb->nr_frags corrupted "
+				"before skb is freed. Expected nr_frags %d, "
+				"corrupted nr_frags %d\n", tp->dev->name,
+				nr_frags, skb_shinfo(skb)->nr_frags);
+		}
+
 		pci_unmap_single(tp->pdev,
 				 pci_unmap_addr(ri, mapping),
 				 skb_headlen(skb),
@@ -3339,7 +3347,7 @@ static void tg3_tx(struct tg3 *tp)
 
 		sw_idx = NEXT_TX(sw_idx);
 
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		for (i = 0; i < nr_frags; i++) {
 			ri = &tp->tx_buffers[sw_idx];
 			if (unlikely(ri->skb != NULL || sw_idx == hw_idx))
 				tx_bug = 1;
@@ -4105,6 +4113,7 @@ static int tigon3_dma_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
 				 len, PCI_DMA_TODEVICE);
 		if (i == 0) {
 			tp->tx_buffers[entry].skb = new_skb;
+			tp->tx_buffers[entry].nr_frags = 0;
 			pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, new_addr);
 		} else {
 			tp->tx_buffers[entry].skb = NULL;
@@ -4211,6 +4220,7 @@ static int tg3_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
 
 	tp->tx_buffers[entry].skb = skb;
+	tp->tx_buffers[entry].nr_frags = skb_shinfo(skb)->nr_frags;
 	pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
 
 	tg3_set_txd(tp, entry, mapping, len, base_flags,
@@ -4388,6 +4398,7 @@ static int tg3_start_xmit_dma_bug(struct sk_buff *skb, struct net_device *dev)
 	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
 
 	tp->tx_buffers[entry].skb = skb;
+	tp->tx_buffers[entry].nr_frags = skb_shinfo(skb)->nr_frags;
 	pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
 
 	would_hit_hwbug = 0;
diff --git a/drivers/net/tg3.h b/drivers/net/tg3.h
index 3938eb3..d4a3aca 100644
--- a/drivers/net/tg3.h
+++ b/drivers/net/tg3.h
@@ -2098,6 +2098,7 @@ struct tx_ring_info {
 	struct sk_buff			*skb;
 	DECLARE_PCI_UNMAP_ADDR(mapping)
 	u32				prev_vlan_tag;
+	unsigned short			nr_frags;
 };
 
 struct tg3_config_info {






* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 21:25     ` Michael Chan
@ 2008-02-14 22:12       ` Andy Gospodarek
  2008-02-14 22:48         ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Gospodarek @ 2008-02-14 22:12 UTC (permalink / raw)
  To: Michael Chan
  Cc: Andy Gospodarek, Andrew Morton, Matt Carlson, bugme-daemon,
	netdev, ralf.hildebrandt

On Thu, Feb 14, 2008 at 01:25:27PM -0800, Michael Chan wrote:
> On Thu, 2008-02-14 at 13:56 -0500, Andy Gospodarek wrote:
> > On Thu, Feb 14, 2008 at 10:24:25AM -0800, Andrew Morton wrote:
> > > On Thu, 14 Feb 2008 01:59:12 -0800 (PST) bugme-daemon@bugzilla.kernel.org wrote:
> > > 
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=9990
> > > > 
> > > >            Summary: tg3: eth0: The system may be re-ordering memory-mapped
> > > >                     I/O cycles
> > > >            Product: Drivers
> > > >            Version: 2.5
> > > >      KernelVersion: 2.6.24-git18
> > > >           Platform: All
> > > >         OS/Version: Linux
> > > >               Tree: Mainline
> > > >             Status: NEW
> > > >           Severity: normal
> > > >           Priority: P1
> > > >          Component: Network
> > > >         AssignedTo: jgarzik@pobox.com
> > > >         ReportedBy: ralf.hildebrandt@charite.de
> > > > 
> > > > 
> > 
> > That should be a simple matter of adding the right pci-ids to
> > tg3_get_invariants -- hopefully Ralf will respond and we can get that
> > knocked out quickly.
> > 
> > 
> 
> It doesn't look like it was re-ordered IO.  If it was, it should have
> self-recovered without hitting the BUG().
> 

Good catch, Michael!  I missed that it panicked since I expect to see
some sort of backtrace when that happens.  We should try to get that
bridge added to the list though, to avoid repeated complaints that there
is a tg3 bug.



* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 22:12       ` Andy Gospodarek
@ 2008-02-14 22:48         ` Michael Chan
  2008-02-14 23:21           ` Andy Gospodarek
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Chan @ 2008-02-14 22:48 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Andrew Morton, Matt Carlson, bugme-daemon, netdev,
	ralf.hildebrandt

On Thu, 2008-02-14 at 17:12 -0500, Andy Gospodarek wrote:
> On Thu, Feb 14, 2008 at 01:25:27PM -0800, Michael Chan wrote:
> > On Thu, 2008-02-14 at 13:56 -0500, Andy Gospodarek wrote:
> > > That should be a simple matter of adding the right pci-ids to
> > > tg3_get_invariants -- hopefully Ralf will respond and we can get that
> > > knocked out quickly.
> > > 
> > > 
> > 
> > It doesn't look like it was re-ordered IO.  If it was, it should have
> > self-recovered without hitting the BUG().
> > 
> 
> Good catch, Michael!  I missed that it panicked since I expect to see
> some sort of backtrace when that happens.  We should try and get that
> bridge added to the list though, to avoid repeated complaints that there
> is a tg3 bug.
> 
> 

Andy, I think you still missed my point.  I don't believe this problem
was caused by the bridge or the chipset at all.  Some corruption caused
us to not find the SKB in the TX ring where it was expected.  So the
driver assumed it was the bridge re-ordering I/O and printed that
warning message and took recovery action.  The recovery action had no
effect in this case since apparently it was caused by something else and
the corruption happened again later.  This 2nd time, we hit the BUG_ON()
seeing that the recovery action did not work.
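
The sequence described above can be sketched as a tiny state machine.
This is a simplified model under stated assumptions, not the tg3 code:
the struct, the flag name reorder_workaround_on and the function
model_tx_recover are hypothetical, and the real driver does its reset
work elsewhere.  It only shows why a second miss after recovery is
treated as fatal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_outcome {
	MODEL_RECOVERING,	/* warning printed, recovery scheduled */
	MODEL_FATAL		/* stands in for hitting the BUG_ON() */
};

/* Hypothetical per-device state for the model. */
struct model_nic {
	bool reorder_workaround_on;	/* set once recovery has been applied */
	unsigned int warnings;		/* how often the warning was printed */
};

/* Called whenever the expected SKB is not found in the TX ring. */
static enum model_outcome model_tx_recover(struct model_nic *nic)
{
	assert(nic != NULL);

	/* The workaround is already active, so re-ordered I/O cannot be
	 * the explanation any more: give up. */
	if (nic->reorder_workaround_on)
		return MODEL_FATAL;

	/* First miss: assume re-ordered I/O, warn, and recover. */
	nic->warnings++;
	nic->reorder_workaround_on = true;
	return MODEL_RECOVERING;
}
```

In this model the first call returns MODEL_RECOVERING and counts one
warning, which matches the single warning in Ralf's log.  A second call
on the same device returns MODEL_FATAL without printing anything more.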


* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 22:48         ` Michael Chan
@ 2008-02-14 23:21           ` Andy Gospodarek
  2008-02-15  0:03             ` Michael Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Gospodarek @ 2008-02-14 23:21 UTC (permalink / raw)
  To: Michael Chan
  Cc: Andrew Morton, Matt Carlson, bugme-daemon, netdev,
	ralf.hildebrandt

On Thu, Feb 14, 2008 at 02:48:09PM -0800, Michael Chan wrote:
> On Thu, 2008-02-14 at 17:12 -0500, Andy Gospodarek wrote:
> > On Thu, Feb 14, 2008 at 01:25:27PM -0800, Michael Chan wrote:
> > > On Thu, 2008-02-14 at 13:56 -0500, Andy Gospodarek wrote:
> > > > That should be a simple matter of adding the right pci-ids to
> > > > tg3_get_invariants -- hopefully Ralf will respond and we can get that
> > > > knocked out quickly.
> > > > 
> > > > 
> > > 
> > > It doesn't look like it was re-ordered IO.  If it was, it should have
> > > self-recovered without hitting the BUG().
> > > 
> > 
> > Good catch, Michael!  I missed that it panicked since I expect to see
> > some sort of backtrace when that happens.  We should try and get that
> > bridge added to the list though, to avoid repeated complaints that there
> > is a tg3 bug.
> > 
> > 
> 
> Andy, I think you still missed my point.  I don't believe this problem
> was caused by the bridge or the chipset at all.  Some corruption caused
> us to not find the SKB in the TX ring where it was expected.  So the
> driver assumed it was the bridge re-ordering I/O and printed that
> warning message and took recovery action.  The recovery action had no
> effect in this case since apparently it was caused by something else and
> the corruption happened again later.  This 2nd time, we hit the BUG_ON()
> seeing that the recovery action did not work.
> 
> 

Ah, I see.  Due to at least a 2 second delay between the message and the
panic, I figured it would be good data to gather....




* Re: [Bugme-new] [Bug 9990] New: tg3: eth0: The system may be re-ordering memory-mapped I/O cycles
  2008-02-14 23:21           ` Andy Gospodarek
@ 2008-02-15  0:03             ` Michael Chan
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Chan @ 2008-02-15  0:03 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Andrew Morton, Matt Carlson, bugme-daemon, netdev,
	ralf.hildebrandt

On Thu, 2008-02-14 at 18:21 -0500, Andy Gospodarek wrote:
> On Thu, Feb 14, 2008 at 02:48:09PM -0800, Michael Chan wrote:
> > Andy, I think you still missed my point.  I don't believe this problem
> > was caused by the bridge or the chipset at all.  Some corruption caused
> > us to not find the SKB in the TX ring where it was expected.  So the
> > driver assumed it was the bridge re-ordering I/O and printed that
> > warning message and took recovery action.  The recovery action had no
> > effect in this case since apparently it was caused by something else and
> > the corruption happened again later.  This 2nd time, we hit the BUG_ON()
> > seeing that the recovery action did not work.
> > 
> > 
> 
> Ah, I see.  Due to at least a 2 second delay between the message and the
> panic, I figured it would be good data to gather....
> 
> 
> 
Yeah, the 2 seconds is the time the link took to come back up after the
chip reset done for recovery.  It then panicked sometime later and
rebooted about 90 seconds after the initial warning message.

It was also running at the slower 100Mbps link speed.  Tx packets stay
longer in the TX ring at this slower speed, increasing the window of
time that the nr_frags in the SKB can be corrupted.
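
As a rough worked example of the speed effect, the helper below computes
how long one frame takes to go out on the wire.  The function name
wire_time_us and the 1500-byte frame size are assumptions for
illustration, and preamble and inter-frame gap are ignored.  How long a
packet really sits in the TX ring also depends on how many packets are
queued ahead of it.

```c
#include <assert.h>

/* Time in microseconds to serialize frame_bytes at link_mbps.
 * 1 Mbps is one bit per microsecond, so this is bits / Mbps. */
static double wire_time_us(unsigned int frame_bytes, unsigned int link_mbps)
{
	assert(link_mbps > 0);
	return (frame_bytes * 8.0) / link_mbps;
}
```

Under these assumptions wire_time_us(1500, 100) gives 120 microseconds,
against 12 microseconds for wire_time_us(1500, 1000), so each queued
frame holds its ring slots about ten times longer at 100 Mbps.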

Ralf, please try the debug patch that I sent out earlier.  Thanks.


