Date: Wed, 13 Feb 2019 21:57:25 -0800
From: Stanislav Fomichev
To: Alexei Starovoitov
Cc: Willem de Bruijn, Stanislav Fomichev, Network Development, David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman@netronome.com, Willem de Bruijn
Subject: Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
Message-ID: <20190214055725.GC10595@mini-arch>
In-Reply-To: <20190214043931.hdpddeom57rnnlhm@ast-mbp>
X-Mailing-List: netdev@vger.kernel.org

On 02/13, Alexei Starovoitov wrote:
> On Tue, Feb 12, 2019 at 09:02:32AM -0800, Stanislav Fomichev wrote:
> > On 02/05, Stanislav Fomichev wrote:
> > > On 02/05, Alexei Starovoitov wrote:
> > > > On Tue, Feb 05, 2019 at 07:56:19PM -0800, Stanislav Fomichev wrote:
> > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > > > > > > On 02/05, Alexei Starovoitov wrote:
> > > > > > > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > > > > > > On 02/05, Willem de Bruijn wrote:
> > > > > > > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev wrote:
> > > > > > > > > > >
> > > > > > > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > > > > > > C-based flow dissector in this case.
> > > > > > > > > > >
> > > > > > > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > > > > > > work in the skb-less case.
> > > > > > > > > > >
> > > > > > > > > > > The series goes like this:
> > > > > > > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > > > > > > >    initialize temporary skb
> > > > > > > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > > > > > > >    associated with an skb
> > > > > > > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > > > > > > >    plumb through the callers
> > > > > > > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > > > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > > > > > > > >
> > > > > > > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > > > > > > for every packet in the device driver receive routine, *independent*
> > > > > > > > > > from the real skb allocation and zeroing which will likely happen
> > > > > > > > > > later.
> > > > > > > > > Yes, plus ~200 bytes on the stack for the callers.
> > > > > > > > >
> > > > > > > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > > > > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > > > > > > on-stack skb).
> > > > > > > >
> > > > > > > > imo extra 256 byte memset for every packet is non starter.
> > > > > > > We can put pre-allocated/initialized skbs without data into percpu or even
> > > > > > > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > > > > > > about having multiple percpu for irq/softirq/process contexts.
> > > > > > > Any concerns with that approach?
> > > > > > > Any other possible concerns with the overall series?
> > > > > >
> > > > > > I'm missing why the whole thing is needed.
> > > > > > You're saying:
> > > > > > " make BPF flow dissector programs be able to work in the skb-less case".
> > > > > > What does it mean specifically?
> > > > > > The only non-skb case is XDP.
> > > > > > Are you saying you want flow_dissector prog to be run in XDP?
> > > > > eth_get_headlen that drivers call on RX path on a chunk of data to
> > > > > guesstimate the length of the headers calls flow dissector without an skb
> > > > > (__skb_flow_dissect was a weird interface where it accepts skb or
> > > > > data+len).
> > > > > Right now, there is no way to trigger BPF flow dissector
> > > > > for this case (we don't have an skb to get associated namespace/etc/etc).
> > > > > The patch series tries to fix that to make sure that we always trigger
> > > > > BPF program if it's attached to a device's namespace.
> > > >
> > > > then why not to create flow_dissector prog type that works without skb?
> > > > Why do you need to fake an skb?
> > > > XDP progs work just fine without it.
> > > What's the advantage of having another prog type? In this case we would have
> > > to write the same flow dissector program twice: first time against __sk_buff
> > > interface, second time against xdp_md.
> > > By using fake skb, we make the same flow dissector __sk_buff BPF program
> > > work in both contexts without a rewrite to an xdp interface (I don't
> > > think users should care whether flow dissector was called from "xdp" vs skb
> > > context; and we're sort of stuck with __sk_buff interface already).
> > Should I follow up with v2 where I address memset(,,256) for each packet?
> > Or you still have some questions/doubts/suggestions regarding the problem
> > I'm trying to solve?
>
> sorry for delay. I'm still thinking what is the path forward here.
No worries, thanks for sticking with me :-)

> That 'stuck with __sk_buff' is what bothers me.
I might have used the wrong word here. I don't think there is another
option, to be honest. Using __sk_buff makes flow dissector programs work
with fragmented packets; if we were to use xdp_md instead, it would not
work in this case.

Another point here: the fact that eth_get_headlen calls the flow
dissector on a chunk of data instead of an skb feels like an
implementation detail. Imo, application writers should not care about
this context; coding against __sk_buff feels like the best we can do.

> It's an indication that api wasn't thought through if first thing
> it needs is this fake skb hack.
> If bpf_flow.c is a realistic example of such flow dissector prog
> it means that real skb fields are accessed.
> In particular skb->vlan_proto, skb->protocol.
I do manually set skb->protocol to eth->h_proto in my proposal. This is
later correctly handled by bpf_flow.c: parse_eth_proto() is called on
skb->protocol and we correctly handle bpf_htons(ETH_P_8021Q) there. So
the existing bpf_flow.c works as expected.

Related: I was also thinking about moving this check out of bpf_flow.c
and passing n_proto in directly. I don't see why bpf_flow_keys "exports"
n_proto: we know it when we call the flow dissector, so there is no need
to test for skb->vlan_present in the bpf program.

> These fields in case of 'fake skb' will not be set, since eth_type_trans()
> isn't called yet.
Just to reiterate, I do set skb->protocol manually and
skb->vlan_present == false (and bpf_flow.c handles this case). We can
also set the correct vlan_present with an additional check, but I
decided not to do it since bpf_flow.c handles that.

> So either flow_dissector needs a real __sk_buff and all of its fields
> should be real or it's a different flow_dissector prog type that
> needs ctx->data, ctx->data_end, ctx->flow_keys only.
> Either way going with fake skb is incorrect, since bpf_flow.c example
> will be broken and for program writers it will be hard to figure why
> it's broken.
It's fake in the sense that it's stack allocated in this particular
case. To bpf_flow.c it looks like a real __sk_buff. I don't see why the
bpf_flow.c example might be broken in this case, can you elaborate?
The goal of this patch series was to essentially make this skb/no-skb
context transparent to bpf_flow.c (i.e. no changes required in user flow
programs). Adding another flow dissector for the eth_get_headlen case
also seems like a no-go.
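
To make the skb->protocol point above concrete, here is a minimal
user-space sketch of the dispatch I have in mind. It is not code from the
series or from bpf_flow.c: sketch_keys, sketch_dissect and the SK_*
constants are made-up names, and it assumes an untagged or singly tagged
Ethernet frame in a flat buffer. It reads the protocol from the Ethernet
header (standing in for the manually set skb->protocol) and, on 802.1Q,
steps over the tag in the payload, so no vlan_present check is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch (values match the usual
 * Ethernet/802.1Q numbers, names are invented). */
#define SK_ETH_HLEN     14      /* dst MAC + src MAC + EtherType */
#define SK_VLAN_HLEN    4       /* TCI + encapsulated EtherType */
#define SK_ETH_P_IP     0x0800
#define SK_ETH_P_IPV6   0x86DD
#define SK_ETH_P_8021Q  0x8100

/* Hypothetical result type: L3 protocol (host order) and its offset. */
struct sketch_keys {
	uint16_t proto;
	uint16_t nhoff;
};

/* Read a big-endian 16-bit value from the buffer. */
static uint16_t sketch_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 and fills *keys for IPv4/IPv6, -1 for anything else. */
static int sketch_dissect(const uint8_t *data, size_t len,
			  struct sketch_keys *keys)
{
	size_t off = SK_ETH_HLEN;
	uint16_t proto;

	assert(data != NULL && keys != NULL);
	if (len < SK_ETH_HLEN)
		return -1;

	/* Stand-in for "skb->protocol = eth->h_proto". */
	proto = sketch_be16(data + 12);

	/* Tag is still in the payload: skip it, take the inner EtherType. */
	if (proto == SK_ETH_P_8021Q) {
		if (len < off + SK_VLAN_HLEN)
			return -1;
		proto = sketch_be16(data + off + 2);
		off += SK_VLAN_HLEN;
	}

	if (proto != SK_ETH_P_IP && proto != SK_ETH_P_IPV6)
		return -1;

	keys->proto = proto;
	keys->nhoff = (uint16_t)off;
	return 0;
}
```

The same function works whether the bytes come from a real skb or from
the chunk of data that eth_get_headlen sees, which is the transparency
argument above. Stacked tags (QinQ) are deliberately left out.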