From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: [RFC PATCH net-next 1/3] ixgbe: support netdev_ops->ndo_xmit_flush()
Date: Tue, 26 Aug 2014 08:44:55 +0200
Message-ID: <20140826084455.28dd4058@redhat.com>
References: <1408887738-7661-1-git-send-email-dborkman@redhat.com>
	<1408887738-7661-2-git-send-email-dborkman@redhat.com>
	<20140825140721.162a6c91@redhat.com>
	<53FBBE06.3020405@intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Daniel Borkmann, davem@davemloft.net, netdev@vger.kernel.org,
	brouer@redhat.com
To: Alexander Duyck
Received: from mx1.redhat.com ([209.132.183.28]:57123 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754354AbaHZGpA (ORCPT ); Tue, 26 Aug 2014 02:45:00 -0400
In-Reply-To: <53FBBE06.3020405@intel.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Mon, 25 Aug 2014 15:51:50 -0700
Alexander Duyck wrote:

> On 08/25/2014 05:07 AM, Jesper Dangaard Brouer wrote:
> > On Sun, 24 Aug 2014 15:42:16 +0200
> > Daniel Borkmann wrote:
> >
> >> This implements the deferred tail pointer flush API for the ixgbe
> >> driver. A similar version was also proposed some time ago by
> >> Alexander Duyck.
> >
> > I've run some benchmarks with this patch only, which actually show a
> > performance regression.
> >
> > Using trafgen with QDISC_BYPASS and mmap mode, via cmdline:
> >  trafgen --cpp --dev eth5 --conf udp_example01.trafgen -V --cpus 1
> >
> > BASELINE (no patch): trafgen QDISC_BYPASS and mmap:
> >  - tx: 1562539 pps
> >
> > (This patch only): ixgbe use of .ndo_xmit_flush:
> >  - tx: 1532299 pps
> >
> > Regression: -30240 pps
> >  * In nanosec: (1/1562539*10^9)-(1/1532299*10^9) = -12.63 ns
> >
> > As DaveM points out, we might not need the mmiowb().
> > Result when not performing the mmiowb():
> >  - tx: 1548352 pps
> >
> > Still a small regression: -14187 pps
> >  * In nanosec: (1/1562539*10^9)-(1/1548352*10^9) = -5.86 ns
> >
> > I was not expecting this slowdown with this rather simple use of the
> > new ndo_xmit_flush API. Can anyone explain why this is happening?
>
> One possibility is that we are now doing less stuff between the time we
> write the tail and when we grab the qdisc lock (locked transactions are
> stalled by MMIO), so we are spending more time stuck waiting for the
> write to complete while doing nothing.

In this testcase we are bypassing the qdisc code path, but we still take
the HARD_TX_LOCK. I was only expecting something in the area of -2 ns
due to the extra function call overhead.

When we start to include the qdisc code path, the performance regression
gets even worse. I would like an explanation for that as well, see:
 http://thread.gmane.org/gmane.linux.network/327254/focus=327431

> Then of course there are always the funny oddball quirks, such as the
> code changes having changed the alignment of a loop, resulting in Tx
> cleanup being more expensive than it was before.

Yes, this is when it gets hairy!

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer