From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexander Duyck Subject: Re: [PATCH net-next 1/2] net: Expose header length compution function Date: Wed, 21 May 2014 09:45:03 -0700 Message-ID: <537CD80F.2090909@intel.com> References: <1399553434-25617-1-git-send-email-amirv@mellanox.com> <1399553434-25617-2-git-send-email-amirv@mellanox.com> <20140509.162454.1317479460010270185.davem@davemloft.net> <536E5DF4.8010908@gmail.com> <1399744197.7973.11.camel@edumazet-glaptop2.roam.corp.google.com> <536E9FE7.6060809@gmail.com> <1400533291.5367.67.camel@edumazet-glaptop2.roam.corp.google.com> <537CC05E.6030705@intel.com> <1400687469.5367.152.camel@edumazet-glaptop2.roam.corp.google.com> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Cc: Alexander Duyck , David Miller , amirv@mellanox.com, netdev@vger.kernel.org, idos@mellanox.com, jeffrey.t.kirsher@intel.com, jesse.brandeburg@intel.com, bruce.w.allan@intel.com, carolyn.wyborny@intel.com, donald.c.skidmore@intel.com, gregory.v.rose@intel.com, john.ronciak@intel.com, mitch.a.williams@intel.com, yevgenyp@mellanox.com, ogerlitz@mellanox.com To: Eric Dumazet Return-path: Received: from mga09.intel.com ([134.134.136.24]:16933 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752676AbaEUQuJ (ORCPT ); Wed, 21 May 2014 12:50:09 -0400 In-Reply-To: <1400687469.5367.152.camel@edumazet-glaptop2.roam.corp.google.com> Sender: netdev-owner@vger.kernel.org List-ID: On 05/21/2014 08:51 AM, Eric Dumazet wrote: > On Wed, 2014-05-21 at 08:03 -0700, Alexander Duyck wrote: > >> So it looks like you did kind of what I expected you would, only you >> allocated a temporary sk_buff on the stack and then pointed the head to >> the start of the page. I'm not really a fan of this approach though it >> does give me a couple ideas. >> >> One thought I just had though, what if we were to do something like >> create an eth_build_skb function? > Well, it all depends if you use napi_get_frags() / napi_gro_frags(), > which are normally the way to get very fast GRO processing, since > you don't even have to allocate memory for the skbs at all, since > skb will likely be recycled in napi_reuse_skb() Another thought would be to possibly look into a GRO type approach. Something where we could place the length parsing functions in the offload_callbacks. If we could do that then we could just integrate the functionality with GRO and make use of those callbacks. Basically it would require doing the parsing as a part of napi_frags_skb() so that when we do the pull we get the full header length in one shot so that the entire frame is linear right from the start. Actually the more I think about this now the more it makes sense. We could probably pull out all the skb_gro_header_hard/skb_gro_header_slow length bits from the existing gro_receive functions and place them in another piece of the code. >> It would essentially be a cross >> between eth_type_trans, your new eth_frame_headlen function, and >> build_skb. It would allow us to avoid the unnecessary allocation of an >> skb on the stack and avoid any unnecessary data duplication since we >> already would be doing a number of the eth_type_trans steps in your >> eth_frame_headlen function. The one limitation is that we would need to >> allocate a block of memory for the head, but that would be done after we >> figure out what the size of the header is. > 'Allocating' an skb on stack has no cost. Exactly 0 added instructions. > > It only increases the size of stack, and at this point we are before all > the networking stacks, so it is safe. > > Have you seen the eBPF stuff adding more stack usage than this ? > > #define MAX_BPF_STACK 512 We have had stack smashing issues in the past with the ixgbe interrupt handlers and it wasn't consuming much memory on the stack as I recall. I prefer to err on the side of caution. Also the more I think about it I am not really comfortable putting a partially initialized sk_buff through any function calls. It seems like it is setting somebody up for a failure because if at some point the code changes and needs some other field out of the skb it won't be initialized here unless they catch this tricky bit of code.