From mboxrd@z Thu Jan 1 00:00:00 1970
From: linux@arm.linux.org.uk (Russell King - ARM Linux)
Date: Wed, 2 Apr 2014 11:46:45 +0100
Subject: FEC ethernet issues [Was: PL310 errata workarounds]
In-Reply-To:
References: <201403242121.58705.marex@denx.de>
 <20140324234443.GS7528@n2100.arm.linux.org.uk>
 <20140326001135.GV7528@n2100.arm.linux.org.uk>
 <20140401092638.GA10224@n2100.arm.linux.org.uk>
 <20140401225149.GC7528@n2100.arm.linux.org.uk>
 <20140402085914.GG7528@n2100.arm.linux.org.uk>
Message-ID: <20140402104644.GI7528@n2100.arm.linux.org.uk>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On Wed, Apr 02, 2014 at 09:40:53AM +0000, fugang.duan at freescale.com wrote:
> From: Russell King - ARM Linux
> Data: Wednesday, April 02, 2014 4:59 PM
> >I wonder whether you understand what is going on here, and why it is required.
> >I doubt it somehow from your comments.  Maybe if you were to read about the
> >operation of the store buffer in the PL310, it may open your eyes to why it
> >would be necessary for reliable operation.
>
> In kernel 3.0.35 internal BSP, BD memory is non-cacheable, non-bufferable
> (we add new api to support it: dma_alloc_noncacheable()),

As is the memory you get from dma_alloc_coherent().  So, why did you
invent a new API which does something which the mainline kernel APIs
already do?  Maybe yours is doing something different but you haven't
explained it in correct terminology.

> So wmb() is not necessary.

Even on non-cacheable normal memory, the wmb() is required.  Please read
up in the ARM architecture reference manual about memory types and their
various attributes, followed by the memory ordering chapters.

> Yes, it don't impact imx6q since cpu loading is not bottleneck due
> rx/tx bandwidth is slow and multi-cores. But for imx6sx, enet rx can
> reach at 940Mbps, tx can reach at 900Mbps, imx6sx is sigle core.

What netdev features do you support to achieve that?

> Enet IP don't support TSO feaure, cpu loading is the bottleneck. Wmb()
> is very expensive which cause tx performance drop much.

wmb() is very expensive because of the L2 cache code using a sledgehammer
with it - particularly the spinlock, which has a large overhead if lockdep
or spinlock debugging is enabled.

> Yes, I agree. There have some arch/driver patches need to upstream to
> align the performance with internal bsp.

Well, post these patches so that people can test them.  If your patches
are indeed the world's best thing since sliced bread, I've just wasted
over two months of solid work on the iMX6 ethernet driver.

However, hacks like dma_alloc_noncacheable will not be acceptable for
mainline.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.