From: Alexander Duyck
Subject: Re: Performance regression on kernels 3.10 and newer
Date: Fri, 15 Aug 2014 16:23:43 -0700
Message-ID: <53EE967F.9090101@intel.com>
References: <53ECFDAB.5010701@intel.com>
 <1408041962.6804.31.camel@edumazet-glaptop2.roam.corp.google.com>
 <53ED4354.9090904@intel.com>
 <20140814.162024.2218312002979492106.davem@davemloft.net>
 <53EE4023.6080902@intel.com>
 <53EE5B25.3040206@intel.com>
To: Tom Herbert
Cc: David Miller, Eric Dumazet, Linux Netdev List, Rick Jones

On 08/15/2014 03:16 PM, Tom Herbert wrote:
> On Fri, Aug 15, 2014 at 12:10 PM, Alexander Duyck wrote:
>> On 08/15/2014 11:49 AM, Tom Herbert wrote:
>>> Alex, I tried to repro your problem running your script (on bnx2x).
>>> Didn't see the issue, and in fact ipv4_dst_check did not appear in
>>> the top functions in perf. I assume this is more related to the
>>> steering configuration than to the device (although flow director
>>> might be a fundamental difference).
>>>
>>
>> So the original script I put out had a typo. It was supposed to run
>> all 60 at the same time, not one at a time. So make sure you add an
>> ampersand to the end of the netperf command line if you run the
>> test, so that it is 60 at once, not 60 in series.
>>
>> Also, one other thing I had to do was disable tcp_autocork. Without
>> that the test is a large-packet test instead of a small-packet test.
>>
> Okay, by running netperf in the background, disabling autocorking,
> and turning off RPS/RFS I'm able to get ipv4_dst_check to come up in
> perf, but it's not nearly as bad as what you've reported, only about
> 1.5%. When I applied the patch to move rt_genid to a different cache
> line, ipv4_dst_check goes away ("ipv4: move rt_genid to different
> cache line"). Can you try this patch in your setup?

The issue doesn't occur for me until I start running netperf on both
CPU sockets with the same IP address on both ends. Then I see the dst
bouncing between the two nodes and the CPU utilization skyrockets. If
I am only on one node the dst bouncing is tolerable, as it doesn't go
any further than the LLC.

With your patch applied I see ipv4_dst_check drop from the 36% CPU
utilization it was at to 5%. However, ip_rcv_finish has climbed to
about 16%, so it isn't as though much was saved; the cost just moved
to the next item to hit that cache line. Throughput was 2.5Gb/s with
100% CPU utilization on the receiver.

Even if the refcount issue is fixed, the performance still suffers
compared to the low_latency path in my testing. When I reverted the
refcount change, CPU utilization dropped from 100% to about 25%, but
that is still double the 12% I am seeing when tcp_low_latency is set.
That is one of the reasons I am not all that interested in the
refcount fix: I am still likely going to have to work around other
issues in the prequeue path.

Another test I tried was to hack the nettest_bsd.c file in netperf to
perform a poll() based receive. That resolved the issue and had all
the performance of the tcp_low_latency case.
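In case anyone wants to try the same thing, the hack was conceptually
something like the sketch below. This is a rough illustration of the
idea rather than the actual diff; poll_recv() is just a made-up helper
name, and the real change was inline in netperf's receive loops. The
point is to sleep in poll() instead of in recv(), so the task is never
blocked in tcp_recvmsg() and incoming packets never take the TCP
prequeue path:

    #include <errno.h>
    #include <poll.h>
    #include <sys/socket.h>

    /*
     * Wait for data with poll() before calling recv().  Because the
     * process sleeps in poll() rather than in tcp_recvmsg(), the
     * prequeue is never armed and receive processing stays on the
     * normal path, same as with tcp_low_latency set.
     */
    static ssize_t poll_recv(int sock, void *buf, size_t len)
    {
        struct pollfd pfd = { .fd = sock, .events = POLLIN };
        int rc;

        do {
            rc = poll(&pfd, 1, -1);  /* block until readable */
        } while (rc < 0 && errno == EINTR);

        if (rc < 0)
            return -1;

        return recv(sock, buf, len, 0);
    }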
I may see if I can work with Rick to push something like that into
netperf, as I would really prefer to avoid having to advise everyone
on how to set up the tcp_low_latency sysctl.

Thanks,

Alex