From: Greg Banks
To: Michael Chan
Cc: "David S. Miller", netdev@oss.sgi.com
Subject: Re: [PATCH] fix BUG in tg3_tx
Date: Wed, 26 May 2004 10:54:29 +1000
Message-ID: <20040526005429.GC2689@sgi.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Tue, May 25, 2004 at 01:04:24PM -0700, Michael Chan wrote:
> > Greg, did you see Michael Chan's response?  A Broadcom
> > engineer is telling us "the hardware does not ACK partial TX packets."
> >
> That's right.  The hw is designed to always complete tx packets on packet
> boundaries and not BD boundaries.  The send data completion state machine
> will create 1 single dma descriptor and 1 host coalescing descriptor for
> the entire packet.  None of our drivers handle individual BD completions,
> and I'm not aware of any problems caused by this.  Actually we did see
> some partial packet completions during the early implementations of
> TSO/LSO, but those were firmware issues and were fixed a long time ago.
> tg3 is not using that early TSO firmware.

I believe the SGI-branded cards ship with firmware fixes that go beyond
simply changing the PCI IDs, and AFAIK that firmware dates from about the
time of the TSO experiments.  Can you check whether it has the issue you
describe?

> > I don't argue that you aren't seeing something strange, but
> > perhaps that is due to corruption occurring elsewhere, or
> > perhaps something peculiar about your system hardware
> > (perhaps the PCI controller mis-orders PCI transactions or
> > something silly like that)?
>
> Good point.  A few years ago we saw cases where there were tx completions
> on BDs that had not been sent.  It turned out that on that machine, the
> chipset was re-ordering the posted mmio writes to the send mailbox
> register from 2 CPUs.  For example, CPU 1 wrote index 1 and CPU 2 wrote
> index 2 a little later.  On the PCI bus, we saw the memory write of 2
> followed by 1.  When the chip saw 2, it would send both packets.  When it
> later saw 1, it thought that there were 512 new tx BDs and went ahead and
> sent them.  The only effective workaround for this chipset problem was a
> read of the send mailbox after the write to flush it.

The tg3 driver already does this if the TG3_FLAG_MBOX_WRITE_REORDER flag
is set in tp->tg3_flags (rough sketch at the end of this mail).  There has
been some discussion inside SGI about that behaviour.  In short, our PCI
hardware is susceptible to PIO write reordering, but experiments have
shown that enabling that flag causes an unacceptable throughput
degradation (about 10%).

I have also noticed that under significant load the softirq portion of
the driver gets scheduled on CPUs other than the interrupt CPU, including
CPUs in other NUMA nodes.

This sounds like a theory I can test.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
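
P.S.  The mailbox flush mentioned above looks roughly like the sketch
below.  It's written from memory rather than copied from tg3.c, so the
helper name (tg3_tx_mbox_write) and the exact fields are only assumed;
the point is just that the non-posted readl() cannot complete until the
earlier posted writel() has actually reached the chip, so another CPU's
mailbox write cannot overtake it.

/*
 * Sketch only: post a new TX producer index to the send mailbox and,
 * on chipsets known to reorder posted MMIO writes
 * (TG3_FLAG_MBOX_WRITE_REORDER), read the mailbox straight back.  The
 * read is a non-posted PCI transaction, so it forces the preceding
 * write out to the device before it returns.
 *
 * Assumes the usual tg3 context (struct tg3, tp->regs, tp->tg3_flags
 * from drivers/net/tg3.h) and <asm/io.h> for writel()/readl().
 */
static void tg3_tx_mbox_write(struct tg3 *tp, u32 off, u32 val)
{
	writel(val, tp->regs + off);
	if (tp->tg3_flags & TG3_FLAG_MBOX_WRITE_REORDER)
		readl(tp->regs + off);	/* flush the posted write */
}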