From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Duyck
Subject: Re: [PATCH 0/3] Basic deferred TX queue flushing infrastructure.
Date: Sat, 23 Aug 2014 16:25:05 -0700
Message-ID: <53F922D1.70409@gmail.com>
References: <20140823.132811.751469424156827125.davem@davemloft.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: therbert@google.com, jhs@mojatatu.com, hannes@stressinduktion.org, edumazet@google.com, jeffrey.t.kirsher@intel.com, rusty@rustcorp.com.au
To: David Miller , netdev@vger.kernel.org
Return-path:
Received: from mail-pa0-f49.google.com ([209.85.220.49]:54525 "EHLO mail-pa0-f49.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751653AbaHWXZM (ORCPT ); Sat, 23 Aug 2014 19:25:12 -0400
Received: by mail-pa0-f49.google.com with SMTP id hz1so18287960pad.36 for ; Sat, 23 Aug 2014 16:25:07 -0700 (PDT)
In-Reply-To: <20140823.132811.751469424156827125.davem@davemloft.net>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 08/23/2014 01:28 PM, David Miller wrote:
>
> Over time, and specifically and more recently at the Networking
> Workshop during Kernel SUmmit in Chicago, we have discussed the idea
> of having some way to optimize transmits of multiple TX packets at
> a time.
>
> There are several areas of overhead that could be amortized with such
> schemes. One has to do with locking and transactional overhead, the
> other has to do with device specific costs.
>
> This patch set here is more aimed at device specific costs.
>
> Typically a device queues up a packet in the TX queue and then has to
> do something to have the device start processing that new entry.
> Sometimes this is composed of doing an MMIO write to a "tail"
> register, and in other cases it can involve something as expensive as
> a hypervisor call.

The MMIO call isn't an issue until you encounter a locked operation, at
least on x86 architecture.
So this often shows up in perf traces as a hit on the qdisc lock right
after completing a transmit. I've seen it at around 20% of CPU
utilization when I was doing routing work with ixgbe.

Thanks,

Alex
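
For readers following the thread, the per-packet "tail" write that the quoted text describes, and the idea of deferring it across a burst, can be sketched roughly as below. This is only a user-space simulation under stated assumptions, not the code from the patch set: struct tx_ring, ring_doorbell(), tx_queue_packet(), send_burst() and the "more" flag are all hypothetical names, and the MMIO write is replaced by a plain counter increment.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 256u /* hypothetical ring size, must be a power of two */

/* Hypothetical, simplified TX ring state (not a real driver structure). */
struct tx_ring {
    uint32_t next_to_use;  /* next free descriptor slot */
    uint32_t tail_written; /* last value "written" to the tail register */
    uint32_t pending;      /* descriptors queued but not yet announced */
    unsigned doorbells;    /* number of simulated tail writes */
};

/* Stand-in for the expensive step (MMIO tail write or hypervisor call). */
static void ring_doorbell(struct tx_ring *r)
{
    r->tail_written = r->next_to_use;
    r->pending = 0;
    r->doorbells++;
}

/*
 * Queue one packet. When 'more' is nonzero the caller promises that
 * another packet follows immediately, so the doorbell is deferred.
 */
static void tx_queue_packet(struct tx_ring *r, int more)
{
    assert(r->pending < RING_SIZE); /* simulation only: ring must have room */
    r->next_to_use = (r->next_to_use + 1) & (RING_SIZE - 1);
    r->pending++;
    if (!more)
        ring_doorbell(r);
}

/* Send n packets and return how many doorbell writes the burst cost. */
static unsigned send_burst(struct tx_ring *r, unsigned n, int defer)
{
    unsigned before = r->doorbells;

    for (unsigned i = 0; i < n; i++) {
        /* Only the last packet of a deferred burst triggers the flush. */
        int more = defer && (i + 1 < n);
        tx_queue_packet(r, more);
    }
    return r->doorbells - before;
}
```

In this sketch a burst of 8 packets costs 8 simulated tail writes when each packet flushes immediately and 1 when the flush is deferred to the end of the burst. The sketch does not model the locking cost mentioned above.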