netdev.vger.kernel.org archive mirror
* RE: [PATCH] fix BUG in tg3_tx
@ 2004-05-26 17:43 Michael Chan
  2004-05-26 18:47 ` David S. Miller
  2004-05-26 23:50 ` Greg Banks
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Chan @ 2004-05-26 17:43 UTC (permalink / raw)
  To: Greg Banks; +Cc: David S. Miller, netdev


> On a related note, is there a good reason why the tg3 driver 
> uses the on-chip SRAM send ring by default instead of the 
> host send ring? This seems like it would dramatically 
> increase the PIO load on the chipset for some of the 
> workloads I'm interested in.
> 

I can only speak for the Broadcom bcm5700 driver. We used to use NIC
send BDs by default before zero-copy transmit and TSO were implemented
in the kernel. With only one BD per packet at that time, we found that
performance on some machines was sometimes slightly better, especially
with logic to save some PIO when some of the fields in the BD had not
changed. The driver has since been changed to use host send BDs so that
it performs better with zero copy, and especially with TSO, where you
may need many BDs per packet. I would recommend that tg3 make the
switch as well.
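
To make the PIO-cost difference concrete, here is a rough sketch of what
posting one BD looks like under each scheme. It is illustrative only: the
structure layout, register offsets and names (TX_HOST_PROD_MBOX,
NIC_SRAM_TX_RING, etc.) are invented for the sketch, not taken from the
real tg3 or bcm5700 sources.

#include <linux/types.h>
#include <asm/io.h>

/* Hypothetical offsets, not the real tg3/bcm5700 register map. */
#define TX_HOST_PROD_MBOX   0x0304  /* doorbell for the host ring */
#define TX_NIC_PROD_MBOX    0x0384  /* doorbell for the NIC ring  */
#define NIC_SRAM_TX_RING    0x4000  /* BD area in on-chip SRAM    */

struct send_bd {
        u64 addr;       /* DMA address of the fragment */
        u32 len_flags;  /* length plus per-BD flags    */
        u32 vlan_tag;
};

/* Host send ring: BDs live in host memory and the chip fetches them by
 * DMA, so posting a packet costs ordinary memory writes plus a single
 * PIO doorbell write, no matter how many BDs the packet uses.
 */
static void post_bd_host_ring(void __iomem *regs, struct send_bd *ring,
                              u32 prod, const struct send_bd *bd)
{
        ring[prod] = *bd;   /* plain store into host memory            */
        wmb();              /* BD must be visible before the doorbell  */
        writel(prod + 1, regs + TX_HOST_PROD_MBOX);
}

/* NIC send ring: every BD field is written into on-chip SRAM with PIO,
 * so each BD costs several MMIO writes before the doorbell.  With
 * zero copy and TSO generating many BDs per packet, this adds up.
 */
static void post_bd_nic_ring(void __iomem *regs, u32 prod,
                             const struct send_bd *bd)
{
        void __iomem *slot = regs + NIC_SRAM_TX_RING + prod * 16;

        writel((u32)(bd->addr >> 32), slot + 0);
        writel((u32)bd->addr, slot + 4);
        writel(bd->len_flags, slot + 8);
        writel(bd->vlan_tag, slot + 12);
        writel(prod + 1, regs + TX_NIC_PROD_MBOX);
}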

Michael

* RE: [PATCH] fix BUG in tg3_tx
@ 2004-05-26  1:22 Michael Chan
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Chan @ 2004-05-26  1:22 UTC (permalink / raw)
  To: Greg Banks; +Cc: David S. Miller, netdev


> 
> I believe the SGI-branded cards ship with firmware fixes 
> beyond simply changing the PCI ids.  Also, AFAIK it dates 
> from about the time of the TSO experiments.  Can you check if 
> that firmware has the issue you describe?
> 

TSO firmware is downloaded by the driver and not shipped with the card.
tg3 is using TSO firmware version 1.4.0, which is the latest for the
5703 and 5704.

Michael

* RE: [PATCH] fix BUG in tg3_tx
@ 2004-05-25 20:04 Michael Chan
  2004-05-26  0:54 ` Greg Banks
  2004-05-26 16:04 ` Greg Banks
  0 siblings, 2 replies; 22+ messages in thread
From: Michael Chan @ 2004-05-25 20:04 UTC (permalink / raw)
  To: David S. Miller, Greg Banks; +Cc: netdev

> Greg, did you see Michael Chan's response?  A Broadcom 
> engineer is telling us "the hardware does not ACK partial TX packets."
> 
That's right. The hw is designed to always complete tx packets on packet
boundaries and not BD boundaries. The send data completion state machine
will create a single dma descriptor and a single host coalescing
descriptor for the entire packet. None of our drivers handle individual
BD completions, and I'm not aware of any problems caused by this. We did
actually see some partial packet completions during the early
implementations of TSO/LSO, but those were firmware issues and were
fixed a long time ago. tg3 is not using that early TSO firmware.

> I don't argue that you aren't seeing something strange, but 
> perhaps that is due to corruption occurring elsewhere, or 
> perhaps something peculiar about your system hardware 
> (perhaps the PCI controller mis-orders PCI transactions or 
> something silly like that)?
Good point. A few years ago we saw cases where there were tx completions
for BDs that had not been sent. It turned out that on that machine, the
chipset was re-ordering the posted mmio writes to the send mailbox
register from 2 CPUs. For example, CPU 1 wrote index 1 and CPU 2 wrote
index 2 a little later. On the PCI bus, we saw the memory write of 2
followed by the write of 1. When the chip saw 2, it sent both packets.
When it later saw 1, it thought that there were 512 new tx BDs and went
ahead and sent them. The only effective workaround for this chipset
problem was a read of the send mailbox after the write to flush it.
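
For reference, that flush is the standard cure for posted-write
reordering: follow the mailbox write with a read of the same register,
so the write is known to have reached the chip before the tx lock is
dropped and another CPU can write the next index. A minimal sketch,
assuming the write happens under the driver's tx lock; the device
structure, flag and register names below are stand-ins, not the real
tg3 symbols.

#include <linux/types.h>
#include <asm/io.h>

#define MAILBOX_SND_HOST_PROD_IDX   0x0304  /* hypothetical offset */
#define MY_FLAG_MBOX_WRITE_REORDER  0x0001  /* hypothetical quirk  */

struct my_dev {
        void __iomem *regs;
        u32 flags;
};

/* Called with the driver's tx lock held. */
static void tx_mailbox_write(struct my_dev *dev, u32 prod_idx)
{
        writel(prod_idx, dev->regs + MAILBOX_SND_HOST_PROD_IDX);

        if (dev->flags & MY_FLAG_MBOX_WRITE_REORDER) {
                /* The write above is posted.  The read cannot complete
                 * until every posted write ahead of it has reached the
                 * chip, so a later write of index 2 from another CPU
                 * can no longer arrive before this write of index 1.
                 */
                readl(dev->regs + MAILBOX_SND_HOST_PROD_IDX);
        }
}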

Michael

* RE: [PATCH] fix BUG in tg3_tx
@ 2004-05-24 17:33 Michael Chan
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Chan @ 2004-05-24 17:33 UTC (permalink / raw)
  To: David S. Miller, Greg Banks; +Cc: netdev

> David,
> 
> The producer index of completed tx packets in the status 
> block will always be at whole packet boundaries (1 + the 
> index of the completed packet's last fragment). Even if it is 
> a TSO packet, it will be at the boundaries of entire TSO packets.
> 
> Michael
> 

Minor correction - I should have said the send BD consumer index instead
of the producer index.

* RE: [PATCH] fix BUG in tg3_tx
@ 2004-05-24 17:26 Michael Chan
  0 siblings, 0 replies; 22+ messages in thread
From: Michael Chan @ 2004-05-24 17:26 UTC (permalink / raw)
  To: David S. Miller, Greg Banks; +Cc: netdev

David,

The producer index of completed tx packets in the status block will
always be at whole packet boundaries (1 + the index of the completed
packet's last fragment). Even if it is a TSO packet, it will be at the
boundaries of entire TSO packets.
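
Put another way: if a packet's BDs occupy ring entries first..last, the
index reported for it is last + 1 (modulo the ring size), and a value
strictly inside that range should never be observed. A small sketch of
the arithmetic; the helper name is made up, and the 512-entry ring size
is used only because that figure appears elsewhere in this thread.

#include <linux/types.h>

#define TX_RING_SIZE    512     /* illustrative ring size (power of two) */

/* Index the status block should report once the packet whose
 * descriptors occupy [first_bd, first_bd + nr_bds) has been sent:
 * one past the packet's last fragment, wrapping at the ring size.
 */
static u32 expected_tx_consumer(u32 first_bd, u32 nr_bds)
{
        return (first_bd + nr_bds) & (TX_RING_SIZE - 1);
}

/* Example: a TSO packet whose head BD sits at entry 100 and which uses
 * four more BDs for page fragments (five BDs in total) completes with
 * the reported index at 105; the values 101..104 are never seen.
 */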

Michael

> -----Original Message-----
> From: David S. Miller [mailto:davem@redhat.com] 
> Sent: Monday, May 24, 2004 10:07 AM
> To: Greg Banks
> Cc: netdev@oss.sgi.com; Michael Chan
> Subject: Re: [PATCH] fix BUG in tg3_tx
> 
> 
> 
> [ Michael, the discussion here is about whether the tigon3 hardware
>   ever partially ACK's completion of a multi-frag TX frame.  I
>   believe it never does, but Greg claims he can trigger such a case
>   and has proposed a patch to the tg3 driver which attempts 
> to handle that. ]
> 
> On Mon, 24 May 2004 18:04:31 +1000
> Greg Banks <gnb@sgi.com> wrote:
> 
> > On Mon, May 24, 2004 at 12:40:45AM -0700, David S. Miller wrote:
> > > On Mon, 24 May 2004 17:26:58 +1000
> > > Greg Banks <gnb@sgi.com> wrote:
> > > 
> > > > The tg3 transmit code assumes that tg3_tx() will never have to 
> > > > clean up part of an skb queued for transmit.  This 
> assumption is 
> > > > wrong;
> > > 
> > > Greg, perhaps my reading of the tg3 chip docs is different from 
> > > yours.  The hardware is NEVER supposed to do this.
> > 
> > I'd like to know where you read that, because neither I nor 
> any of the 
> > other SGI engineers who have read the Broadcom docs can 
> find any such 
> > guarantee.
> 
> The most relevant (and accurate) piece of documentation for 
> the chip is Broadcom's own driver :-) And they do not account 
> at all for such a case of partial-packet TX completion 
> indication.  If the first frag is ACK'd they assume the whole 
> packet has been taken.  Here is the relevant code from the 
> bcm5700 driver in LM_ServiceTxInterrupt():
> 
>     while(SwConIdx != HwConIdx)
>     {
>         pPacket = pDevice->SendRing[SwConIdx];
>         pDevice->SendRing[SwConIdx] = 0;
> 
>         /* Set the return status. */
>         pPacket->PacketStatus = LM_STATUS_SUCCESS;
> 
>         /* Put the packet in the TxPacketXmittedQ for 
> indication later. */
>         QQ_PushTail(&pDevice->TxPacketXmittedQ.Container, pPacket);
> 
>         /* Move to the next packet's BD. */
>         SwConIdx = (SwConIdx + pPacket->u.Tx.FragCount) & 
>             T3_SEND_RCB_ENTRY_COUNT_MASK;
> 
>         /* Update the number of unused BDs. */
>         MM_ATOMIC_ADD(&pDevice->SendBdLeft, pPacket->u.Tx.FragCount);
> 
>         /* Get the new updated HwConIdx. */
>         HwConIdx = pDevice->pStatusBlkVirt->Idx[0].SendConIdx;
>     } /* while */
> 
> Imagine how badly this piece of code would fail if partial 
> ACK'ing of TX packets actually occurred.  It would loop past 
> HwConIdx and thus ACK really-not-completed packets, 
> potentially colliding with what the chip is transmitting and 
> thus causing massive data corruption and likely a crash.  
> Actually, it would most likely loop past all valid TX packets 
> and dereference a pPacket NULL pointer.
> 
> 

* [PATCH] fix BUG in tg3_tx
@ 2004-05-24  7:26 Greg Banks
  2004-05-24  7:40 ` David S. Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Banks @ 2004-05-24  7:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: Linux Network Development List

G'day,

The tg3 transmit code assumes that tg3_tx() will never have to clean
up part of an skb queued for transmit.  This assumption is wrong;
sometimes tg3_tx() will be called partway through an skb, which trips
the first BUG() and drops the machine into kdb.  We've seen this happen
on 2.4 (rarely) and 2.6 (easy to reproduce, just run about 5 minutes'
worth of NFS or Samba load).

This patch against 2.6.6 fixes the problem by changing the shadow
transmit logic to put the non-NULL skb pointer in the *last* ring
entry instead of the first.  The skb is therefore freed only after all
of its parts have been DMAd onto the card, and tg3_tx() can safely be
called partway through an skb.

SGI PV 903520 and PV 891640.



Index: linux/drivers/net/tg3.c
===================================================================
--- linux.orig/drivers/net/tg3.c	Mon May 10 12:32:38 2004
+++ linux/drivers/net/tg3.c	Mon May 24 17:23:12 2004
@@ -2103,10 +2103,6 @@
 	return err;
 }
 
-/* Tigon3 never reports partial packet sends.  So we do not
- * need special logic to handle SKBs that have not had all
- * of their frags sent yet, like SunGEM does.
- */
 static void tg3_tx(struct tg3 *tp)
 {
 	u32 hw_idx = tp->hw_status->idx[0].tx_consumer;
@@ -2115,37 +2111,26 @@
 	while (sw_idx != hw_idx) {
 		struct tx_ring_info *ri = &tp->tx_buffers[sw_idx];
 		struct sk_buff *skb = ri->skb;
-		int i;
-
-		if (unlikely(skb == NULL))
-			BUG();
-
-		pci_unmap_single(tp->pdev,
-				 pci_unmap_addr(ri, mapping),
-				 skb_headlen(skb),
-				 PCI_DMA_TODEVICE);
 
 		ri->skb = NULL;
-
-		sw_idx = NEXT_TX(sw_idx);
-
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			if (unlikely(sw_idx == hw_idx))
-				BUG();
-
-			ri = &tp->tx_buffers[sw_idx];
-			if (unlikely(ri->skb != NULL))
-				BUG();
-
+		
+		if (ri->index == 0) {
+			pci_unmap_single(tp->pdev,
+					 pci_unmap_addr(ri, mapping),
+					 pci_unmap_len(ri, len),
+					 PCI_DMA_TODEVICE);
+		} else {
 			pci_unmap_page(tp->pdev,
 				       pci_unmap_addr(ri, mapping),
-				       skb_shinfo(skb)->frags[i].size,
+				       pci_unmap_len(ri, len),
 				       PCI_DMA_TODEVICE);
-
-			sw_idx = NEXT_TX(sw_idx);
 		}
-
-		dev_kfree_skb_irq(skb);
+		if (skb) {
+			if (unlikely(ri->index != skb_shinfo(skb)->nr_frags))
+				BUG();
+			dev_kfree_skb_irq(skb);
+		}
+		sw_idx = NEXT_TX(sw_idx);
 	}
 
 	tp->tx_cons = sw_idx;
@@ -2605,6 +2590,18 @@
 	schedule_work(&tp->reset_task);
 }
 
+static inline void tg3_set_txri(struct tg3 *tp, int entry, struct sk_buff *skb,
+				int index, dma_addr_t addr, int len)
+{
+	struct tx_ring_info *ri = &tp->tx_buffers[entry];
+
+	ri->skb = (index == skb_shinfo(skb)->nr_frags ? skb : NULL);
+	ri->index = index;
+	pci_unmap_addr_set(ri, mapping, addr);
+	pci_unmap_len_set(ri, len, len);
+}
+
+
 static void tg3_set_txd(struct tg3 *, int, dma_addr_t, int, u32, u32);
 
 static int tigon3_4gb_hwbug_workaround(struct tg3 *tp, struct sk_buff *skb,
@@ -2645,6 +2642,7 @@
 		if (i == 0) {
 			tp->tx_buffers[entry].skb = new_skb;
 			pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, new_addr);
+			pci_unmap_len_set(&tp->tx_buffers[entry], len, new_skb->len);
 		} else {
 			tp->tx_buffers[entry].skb = NULL;
 		}
@@ -2804,8 +2802,7 @@
 	/* Queue skb data, a.k.a. the main skb fragment. */
 	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
 
-	tp->tx_buffers[entry].skb = skb;
-	pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
+	tg3_set_txri(tp, entry, skb, 0, mapping, len);
 
 	would_hit_hwbug = 0;
 
@@ -2831,8 +2828,7 @@
 					       frag->page_offset,
 					       len, PCI_DMA_TODEVICE);
 
-			tp->tx_buffers[entry].skb = NULL;
-			pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
+			tg3_set_txri(tp, entry, skb, i+1, mapping, len);
 
 			if (tg3_4g_overflow_test(mapping, len)) {
 				/* Only one should match. */
@@ -3002,8 +2998,7 @@
 	/* Queue skb data, a.k.a. the main skb fragment. */
 	mapping = pci_map_single(tp->pdev, skb->data, len, PCI_DMA_TODEVICE);
 
-	tp->tx_buffers[entry].skb = skb;
-	pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
+	tg3_set_txri(tp, entry, skb, 0, mapping, len);
 
 	tg3_set_txd(tp, entry, mapping, len, base_flags,
 		    (skb_shinfo(skb)->nr_frags == 0) | (mss << 1));
@@ -3025,8 +3020,7 @@
 					       frag->page_offset,
 					       len, PCI_DMA_TODEVICE);
 
-			tp->tx_buffers[entry].skb = NULL;
-			pci_unmap_addr_set(&tp->tx_buffers[entry], mapping, mapping);
+			tg3_set_txri(tp, entry, skb, i+1, mapping, len);
 
 			tg3_set_txd(tp, entry, mapping, len,
 				    base_flags, (i == last));
Index: linux/drivers/net/tg3.h
===================================================================
--- linux.orig/drivers/net/tg3.h	Mon May 10 12:32:02 2004
+++ linux/drivers/net/tg3.h	Mon May 24 16:58:28 2004
@@ -1774,6 +1774,8 @@
 struct tx_ring_info {
 	struct sk_buff			*skb;
 	DECLARE_PCI_UNMAP_ADDR(mapping)
+	DECLARE_PCI_UNMAP_LEN(len)
+	u32				index;
 	u32				prev_vlan_tag;
 };
 




Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


Thread overview: 22+ messages
2004-05-26 17:43 [PATCH] fix BUG in tg3_tx Michael Chan
2004-05-26 18:47 ` David S. Miller
2004-05-26 23:52   ` Greg Banks
2004-05-26 23:50 ` Greg Banks
  -- strict thread matches above, loose matches on Subject: below --
2004-05-26  1:22 Michael Chan
2004-05-25 20:04 Michael Chan
2004-05-26  0:54 ` Greg Banks
2004-05-26 16:04 ` Greg Banks
2004-05-26 18:01   ` David S. Miller
2004-05-26 23:47     ` Greg Banks
2004-05-26 23:52       ` David S. Miller
2004-05-27  0:12         ` Greg Banks
2004-05-24 17:33 Michael Chan
2004-05-24 17:26 Michael Chan
2004-05-24  7:26 Greg Banks
2004-05-24  7:40 ` David S. Miller
2004-05-24  8:04   ` Greg Banks
2004-05-24 17:06     ` David S. Miller
2004-05-25  1:04       ` Greg Banks
2004-05-25 17:51         ` David S. Miller
2004-05-26  0:12           ` Greg Banks
2004-05-25 17:52         ` David S. Miller
