From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul Chaignon" Subject: Re: [RFC PATCH 00/11] OVS eBPF datapath. Date: Thu, 5 Jul 2018 14:12:03 +0200 Message-ID: <20180705121202.GA27625@Nover> References: <1529756203-77067-1-git-send-email-u9012063@gmail.com> <20180628030023.pkgdu4brun3jy2bz@ast-mbp.dhcp.thefacebook.com> <20180703175623.fqrrtppjsjimznvv@ast-mbp.dhcp.thefacebook.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Alexei Starovoitov , "" , Tom Herbert via iovisor-dev , Linux Kernel Network Developers To: William Tu Return-path: In-Reply-To: List-Unsubscribe: Sender: iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org List-Post: Content-Disposition: inline List-Id: netdev.vger.kernel.org On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote: > On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov > wrote: > > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote: > >> Hi Alexei, > >> > >> Thanks a lot for the feedback! > >> > >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov > >> wrote: > >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote: > >> >> > >> >> Discussion > >> >> ========== > >> >> We are still actively working on finishing the feature, currently > >> >> the basic forwarding and tunnel feature work, but still under > >> >> heavy debugging and development. The purpose of this RFC is to > >> >> get some early feedbacks and direction for finishing the complete > >> >> features in existing kernel's OVS datapath (the net/openvswitch/*). > >> > > >> > Thank you for sharing the patches. > >> > > >> >> Three major issues we are worried: > >> >> a. Megaflow support in BPF. > >> >> b. Connection Tracking support in BPF. > >> > > >> > my opinion on the above two didn't change. > >> > To recap: > >> > A. Non scalable megaflow map is no go. I'd like to see packet classification > >> > algorithm like hicuts or efficuts to be implemented instead, since it can be > >> > shared by generic bpf, bpftiler, ovs and likely others. > >> > >> We did try the decision tree approach using dpdk's acl lib. The lookup > >> speed is 6 times faster than the magaflow using tuple space. > >> However, the update/insertion requires rebuilding/re-balancing the decision > >> tree so it's way too slow. I think hicuts or efficuts suffers the same issue. > >> So decision tree algos are scalable only for lookup operation due to its > >> optimization over tree depth, but not scalable under > >> update/insert/delete operations. > >> > >> On customer's system we see megaflow update/insert rate around 10 rules/sec, > >> this makes decision tree unusable, unless we invent something to optimize the > >> update/insert time or incremental update of these decision tree algo. > > > > is this a typo? you probably meant 10K rule updates a second ? > I mean "new" rules being added at 10 rules/sec. > Update rate might be much higher. > > > Last time I've dealt with these algorithms we had 100K acl updates a second. > > It was an important metric that we were optimizing for. > > I'm pretty sure '*cuts' algos do many thousands per second non optimized. > > When adding a new rule, do these algorithms require rebuilding the > entire tree? > > In our evaluation, updating an existing entry in the decision tree > performs OK, because it is equal to lookup and replace, and lookup > is fast, update is just atomic swap. But inserting a new rule is slow, > because it requires re-building the tree using all existing rules. > And we see new rule being added at rate 10 rules per second. > So we are constantly rebuilding the entire tree. > > If the entire tree has 100k of rules, it takes around 2 seconds to rebuild, > based on the dpdk acl library. And without an incremental algorithm, > adding 1 new rule will trigger rebuilding the 100k of rules, and it is too slow. > > Reading through HyperCuts and EffiCuts, I'm not sure how it supports > incrementally adding a new rule, without rebuilding the entire tree. > http://ccr.sigcomm.org/online/files/p207.pdf > http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf > > The HyperCuts papers says > "A fast update algorithm can also be implemented; however we do not > go into the details of incremental update in this paper" > > > > >> >> c. Verifier limitation. > >> > > >> > Not sure what limitations you're concerned about. > >> > > >> > >> Mostly related to stack. The flow key OVS uses (struct sw_flow_key) > >> is 464 byte. We trim a lot, now around 300 byte, but still huge, considering > >> the BPF's stack limit is 512 byte. > > > > have you tried using per-cpu array of one element with large value > > instead of stack? > > yes, now we store the flow key in percpu array with 1 element. > > > In the latest verifier most of the operations that can be done with the stack > > pointer can be done with pointer to map value too. > > > Once the flow key is stored in map, another eBPF program > needs to use that key to lookup flow table (another map). > So we have to store the flow key on stack first, in order to > use it as key to lookup the flow table map. > > Is there a way to work around it? d71962f ("bpf: allow map helpers access to map values directly") removes that limitation from the verifier and should allow you to use map values as map keys directly. 4.18-rc1 has it. > Thanks > William -=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#1360): https://lists.iovisor.org/g/iovisor-dev/message/1360 Mute This Topic: https://lists.iovisor.org/mt/22656941/1132507 Group Owner: iovisor-dev+owner-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org Unsubscribe: https://lists.iovisor.org/g/iovisor-dev/unsub [glki-iovisor-dev@m.gmane.org] -=-=-=-=-=-=-=-=-=-=-=-