From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sowmini Varadhan
Date: Thu, 11 Dec 2014 19:45:42 +0000
Subject: ixgbe/linux/sparc perf issues
Message-Id: <20141211194542.GA5536@oracle.com>
MIME-Version: 1
Content-Type: multipart/mixed; boundary="X1bOJ3K7DJ5YkBrT"
List-Id:
To: sparclinux@vger.kernel.org

--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

e1000-developers,

[Cc-ing sparclinux due to the iommu observations..]

I'm looking at an iperf issue running over ixgbe on Linux on a sparc
T5-2 platform (64 cpu) where we cannot get to line speed (throughput
peaks at 3 Gbps on a 10 Gbps link), and I'm trying to get to the
bottom of this.

I've run iperf with 8 threads. Observations are:

1. lockstat and perf report that iommu->lock is the hot lock (in a
   typical instance, I get about 21M contentions out of 27M
   acquisitions, 25 us avg wait time). Even if I fix this issue (see
   below), I see:

2. ethtool stat: rx_missed_errors and/or rx_no_dma_resources goes up
   (even with just one iperf thread).

Disabling the IOMMU is not an option on this arch (sun4v). But I tried
a fix to mitigate #1 by breaking up the iommu map/lock into locks with
finer granularity for map-pools, similar to the design for iommu on
powerpc. That fix takes care of the lockstat output, but packet
receive still shows a lot of latency (14 us wait time on the socket
lock without RPS, and even with RPS, the rcu lock has a high wait
time), and throughput still cannot go beyond the 3 Gbps limit. This
suggests that #2 needs to be solved first.

I don't think this is a setup issue, though I could be mistaken: when
I boot Solaris on the exact same hardware config, I am able to get a
throughput of approx 9.4 Gbps.

There are other things one could do to ameliorate iommu overhead on
this platform, e.g., keep a cache of premapped buffers for small
packets (such as the TCP ACK) with a configurable threshold defining
"small".
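As a rough illustration of the premapped-buffer idea, here is a minimal
userspace sketch in C. It is a model under stated assumptions, not
driver code: fake_iommu_map(), struct premap_cache, CACHE_SLOTS,
SMALL_BUF_SIZE and the other names are hypothetical and are not part of
ixgbe or the kernel DMA API. The sketch is also single-threaded; a real
version would presumably need per-queue or per-CPU caches, or locking.

```c
/* Userspace model only: fake_iommu_map() stands in for a real DMA mapping call. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS    4u    /* hypothetical cache depth */
#define SMALL_BUF_SIZE 256u  /* hypothetical size of each premapped buffer */

/* Counts trips through the modeled (expensive, lock-protected) mapping path. */
unsigned long map_calls;

/* Stand-in for an IOMMU mapping operation; returns a fake DMA handle. */
uint64_t fake_iommu_map(size_t len)
{
    (void)len;
    return (uint64_t)0x1000 * ++map_calls;
}

struct premap_cache {
    uint64_t handle[CACHE_SLOTS]; /* handles mapped once at init time */
    unsigned free_top;            /* number of handles currently available */
    size_t small_threshold;       /* packets at or below this size use the cache */
};

/* Map all cache buffers up front and record the "small" threshold. */
void premap_cache_init(struct premap_cache *c, size_t threshold)
{
    assert(threshold <= SMALL_BUF_SIZE);
    c->small_threshold = threshold;
    c->free_top = 0;
    for (unsigned i = 0; i < CACHE_SLOTS; i++)
        c->handle[c->free_top++] = fake_iommu_map(SMALL_BUF_SIZE);
}

/*
 * Get a DMA handle for a packet of len bytes. Small packets reuse a
 * premapped handle while one is free; everything else falls back to a
 * fresh mapping. *from_cache reports which path was taken.
 */
uint64_t premap_get(struct premap_cache *c, size_t len, int *from_cache)
{
    if (len <= c->small_threshold && c->free_top > 0) {
        *from_cache = 1;
        return c->handle[--c->free_top];
    }
    *from_cache = 0;
    return fake_iommu_map(len);
}

/* Return a cached handle once the transfer is done; it stays mapped for reuse. */
void premap_put(struct premap_cache *c, uint64_t handle)
{
    assert(c->free_top < CACHE_SLOTS);
    c->handle[c->free_top++] = handle;
}
```

In this model, a small packet handled through premap_get() and
premap_put() never touches the mapping path after init, so the map_calls
counter only grows for packets above the threshold or when the cache is
empty.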
But before I go too far into experimenting with those things, I wanted
to check with e1000-devel to see if this is just sub-optimal tuning of
the ixgbe driver.

Attached is the output of the following commands (listed in the order
they appear below) for a single iperf thread (similar stats issues are
there even in the 1-thread case):

  ethtool -i eth1
  lspci -vvv -s
  ethtool -k eth1
  ethtool -S eth1

Any insights or tuning recommendations would be appreciated,

--Sowmini

--X1bOJ3K7DJ5YkBrT
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="eth1.out"

#ethtool -i eth1
driver: ixgbe
version: 3.19.1-k
firmware-version: 0x800003ed
bus-info: 0001:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no

# lspci -vvv -s 0001:03:00.1
0001:03:00.1 Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
	Subsystem: Oracle/SUN Device 4848
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-