From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Baron
Subject: Re: macvlan: optimizing the receive path?
Date: Tue, 07 Oct 2014 13:35:42 -0400
Message-ID: <5434246E.1000403@akamai.com>
References: <542DB55D.3090601@akamai.com> <20141004.204203.2211720828886085354.davem@davemloft.net> <5432936D.7010906@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Cc: "netdev@vger.kernel.org", "kaber@trash.net"
To: Vlad Yasevich, David Miller
Return-path:
Received: from prod-mail-xrelay02.akamai.com ([72.246.2.14]:45326 "EHLO prod-mail-xrelay02.akamai.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754295AbaJGRfn (ORCPT ); Tue, 7 Oct 2014 13:35:43 -0400
In-Reply-To: <5432936D.7010906@gmail.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 10/06/2014 09:04 AM, Vlad Yasevich wrote:
> On 10/04/2014 08:42 PM, David Miller wrote:
>> From: Jason Baron
>> Date: Thu, 02 Oct 2014 16:28:13 -0400
>>
>>> --- a/drivers/net/macvlan.c
>>> +++ b/drivers/net/macvlan.c
>>> @@ -321,8 +321,8 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
>>>  	skb->dev = dev;
>>>  	skb->pkt_type = PACKET_HOST;
>>>
>>> -	ret = netif_rx(skb);
>>> -
>>> +	macvlan_count_rx(vlan, len, true, 0);
>>> +	return RX_HANDLER_ANOTHER;
>>>  out:
>>>  	macvlan_count_rx(vlan, len, ret == NET_RX_SUCCESS, 0);
>>>  	return RX_HANDLER_CONSUMED;
>>
>> That last argument to macvlan_count_rx() is a bool and thus should be
>> specified as "false". Yes I know other areas of this file get it
>> wrong too.
>>

ok. I can fix those up too while here.

>> Also, what about GRO? Won't we get GRO processing if we do this via
>> netif_rx() but not via the RX_HANDLER_ANOTHER route? Just curious...
>
> Wouldn't GRO already happen at the lower level? For macvlan-to-macvlan,
> you'd typically have large packets so no need for GRO.
>

Yes, afaict GRO is happening a layer below __netif_receive_skb_core().
Here are some results of this optimization on 3.17, using macvlan with lxc.
Test case is (average of 3 runs):

for i in {35,50,65,80,95,110,125,140,155}; do
	super_netperf $i netperf -H $ip -t TCP_RR;
done

trans./sec (3.17)	trans./sec (3.17 + macvlan patch)
494016			517159 (+4.68%)
612806			628382 (+2.54%)
673100			669688 (-0.51%)
696982			706181 (+1.32%)
710494			716660 (+0.87%)
716830			719581 (+0.38%)
714729			718738 (+0.56%)
713478			718904 (+0.76%)
711056			718344 (+1.03%)

On the host I can see that the idle time goes to 0, so this would appear to
be an improvement. I also observed that enqueue_to_backlog() and
process_backlog() are no longer in the 'perf' profiles, as expected.

So if there are no objections, I will post this as a formal patch.

Thanks,

-Jason