From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Paul Chaignon" <paul.chaignon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: [RFC PATCH 00/11] OVS eBPF datapath.
Date: Thu, 5 Jul 2018 14:12:03 +0200
Message-ID: <20180705121202.GA27625@Nover>
References: <1529756203-77067-1-git-send-email-u9012063@gmail.com>
 <20180628030023.pkgdu4brun3jy2bz@ast-mbp.dhcp.thefacebook.com>
 <CALDO+SYzDDpTmJttghfjUYKbo3AHDaT4L154Acwn5BGqkytkHQ@mail.gmail.com>
 <20180703175623.fqrrtppjsjimznvv@ast-mbp.dhcp.thefacebook.com>
 <CALDO+SYUPgCtQqGkRepK5YTKZegAg3-DKWcssoqXwGEqgsodgw@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Alexei Starovoitov <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"<dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>" <dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>,
	Tom Herbert via iovisor-dev <iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
	Linux Kernel Network Developers <netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
To: William Tu <u9012063-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Return-path: <bounce+19056+1360+1132507+2590247-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
In-Reply-To: <CALDO+SYUPgCtQqGkRepK5YTKZegAg3-DKWcssoqXwGEqgsodgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
List-Unsubscribe: <https://lists.iovisor.org/g/iovisor-dev/unsub>
Sender: iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
List-Post: <mailto:iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>
Content-Disposition: inline
List-Id: netdev.vger.kernel.org

On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote:
> On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov
> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote:
> >> Hi Alexei,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
> >> >>
> >> >> Discussion
> >> >> ==========
> >> >> We are still actively working on finishing the feature, currently
> >> >> the basic forwarding and tunnel feature work, but still under
> >> >> heavy debugging and development.  The purpose of this RFC is to
> >> >> get some early feedbacks and direction for finishing the complete
> >> >> features in existing kernel's OVS datapath (the net/openvswitch/*).
> >> >
> >> > Thank you for sharing the patches.
> >> >
> >> >> Three major issues we are worried:
> >> >>   a. Megaflow support in BPF.
> >> >>   b. Connection Tracking support in BPF.
> >> >
> >> > my opinion on the above two didn't change.
> >> > To recap:
> >> > A. Non scalable megaflow map is no go. I'd like to see packet classification
> >> > algorithm like hicuts or efficuts to be implemented instead, since it can be
> >> > shared by generic bpf, bpftiler, ovs and likely others.
> >>
> >> We did try the decision tree approach using dpdk's acl lib. The lookup
> >> speed is 6 times faster than the magaflow using tuple space.
> >> However, the update/insertion requires rebuilding/re-balancing the decision
> >> tree so it's way too slow. I think hicuts or efficuts suffers the same issue.
> >> So decision tree algos are scalable only for lookup operation due to its
> >> optimization over tree depth, but not scalable under
> >> update/insert/delete operations.
> >>
> >> On customer's system we see megaflow update/insert rate around 10 rules/sec,
> >> this makes decision tree unusable, unless we invent something to optimize the
> >> update/insert time or incremental update of these decision tree algo.
> >
> > is this a typo? you probably meant 10K rule updates a second ?
> I mean "new" rules being added at 10 rules/sec.
> Update rate might be much higher.
> 
> > Last time I've dealt with these algorithms we had 100K acl updates a second.
> > It was an important metric that we were optimizing for.
> > I'm pretty sure '*cuts' algos do many thousands per second non optimized.
> 
> When adding a new rule, do these algorithms require rebuilding the
> entire tree?
> 
> In our evaluation, updating an existing entry in the decision tree
> performs OK, because it is equal to lookup and replace, and lookup
> is fast, update is just atomic swap. But inserting a new rule is slow,
> because it requires re-building the tree using all existing rules.
> And we see new rule being added at rate 10 rules per second.
> So we are constantly rebuilding the entire tree.
> 
> If the entire tree has 100k of rules, it takes around 2 seconds to rebuild,
> based on the dpdk acl library.  And without an incremental algorithm,
> adding 1 new rule will trigger rebuilding the 100k of rules, and it is too slow.
> 
> Reading through HyperCuts and EffiCuts, I'm not sure how it supports
> incrementally adding a new rule, without rebuilding the entire tree.
> http://ccr.sigcomm.org/online/files/p207.pdf
> http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf
> 
> The HyperCuts papers says
> "A fast update algorithm can also be implemented; however we do not
> go into the details of incremental update in this paper"
> 
> >
> >> >>   c. Verifier limitation.
> >> >
> >> > Not sure what limitations you're concerned about.
> >> >
> >>
> >> Mostly related to stack.  The flow key OVS uses (struct sw_flow_key)
> >> is 464 byte. We trim a lot, now around 300 byte, but still huge, considering
> >> the BPF's stack limit is 512 byte.
> >
> > have you tried using per-cpu array of one element with large value
> > instead of stack?
> 
> yes, now we store the flow key in percpu array with 1 element.
> 
> > In the latest verifier most of the operations that can be done with the stack
> > pointer can be done with pointer to map value too.
> >
> Once the flow key is stored in map, another eBPF program
> needs to use that key to lookup flow table (another map).
> So we have to store the flow key on stack first, in order to
> use it as key to lookup the flow table map.
> 
> Is there a way to work around it?

d71962f ("bpf: allow map helpers access to map values directly") removes
that limitation from the verifier and should allow you to use map values
as map keys directly.  4.18-rc1 has it.

> Thanks
> William

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#1360): https://lists.iovisor.org/g/iovisor-dev/message/1360
Mute This Topic: https://lists.iovisor.org/mt/22656941/1132507
Group Owner: iovisor-dev+owner-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
Unsubscribe: https://lists.iovisor.org/g/iovisor-dev/unsub  [glki-iovisor-dev@m.gmane.org]
-=-=-=-=-=-=-=-=-=-=-=-