From: "Paul Chaignon" <paul.chaignon-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: William Tu <u9012063-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Alexei Starovoitov
<alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
"<dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>"
<dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org>,
Tom Herbert via iovisor-dev
<iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org>,
Linux Kernel Network Developers
<netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>
Subject: Re: [RFC PATCH 00/11] OVS eBPF datapath.
Date: Thu, 5 Jul 2018 14:12:03 +0200 [thread overview]
Message-ID: <20180705121202.GA27625@Nover> (raw)
In-Reply-To: <CALDO+SYUPgCtQqGkRepK5YTKZegAg3-DKWcssoqXwGEqgsodgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote:
> On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov
> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote:
> >> Hi Alexei,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
> >> >>
> >> >> Discussion
> >> >> ==========
> >> >> We are still actively working on finishing the feature, currently
> >> >> the basic forwarding and tunnel feature work, but still under
> >> >> heavy debugging and development. The purpose of this RFC is to
> >> >> get some early feedbacks and direction for finishing the complete
> >> >> features in existing kernel's OVS datapath (the net/openvswitch/*).
> >> >
> >> > Thank you for sharing the patches.
> >> >
> >> >> Three major issues we are worried:
> >> >> a. Megaflow support in BPF.
> >> >> b. Connection Tracking support in BPF.
> >> >
> >> > my opinion on the above two didn't change.
> >> > To recap:
> >> > A. Non scalable megaflow map is no go. I'd like to see packet classification
> >> > algorithm like hicuts or efficuts to be implemented instead, since it can be
> >> > shared by generic bpf, bpftiler, ovs and likely others.
> >>
> >> We did try the decision tree approach using dpdk's acl lib. The lookup
> >> speed is 6 times faster than the magaflow using tuple space.
> >> However, the update/insertion requires rebuilding/re-balancing the decision
> >> tree so it's way too slow. I think hicuts or efficuts suffers the same issue.
> >> So decision tree algos are scalable only for lookup operation due to its
> >> optimization over tree depth, but not scalable under
> >> update/insert/delete operations.
> >>
> >> On customer's system we see megaflow update/insert rate around 10 rules/sec,
> >> this makes decision tree unusable, unless we invent something to optimize the
> >> update/insert time or incremental update of these decision tree algo.
> >
> > is this a typo? you probably meant 10K rule updates a second ?
> I mean "new" rules being added at 10 rules/sec.
> Update rate might be much higher.
>
> > Last time I've dealt with these algorithms we had 100K acl updates a second.
> > It was an important metric that we were optimizing for.
> > I'm pretty sure '*cuts' algos do many thousands per second non optimized.
>
> When adding a new rule, do these algorithms require rebuilding the
> entire tree?
>
> In our evaluation, updating an existing entry in the decision tree
> performs OK, because it is equal to lookup and replace, and lookup
> is fast, update is just atomic swap. But inserting a new rule is slow,
> because it requires re-building the tree using all existing rules.
> And we see new rule being added at rate 10 rules per second.
> So we are constantly rebuilding the entire tree.
>
> If the entire tree has 100k of rules, it takes around 2 seconds to rebuild,
> based on the dpdk acl library. And without an incremental algorithm,
> adding 1 new rule will trigger rebuilding the 100k of rules, and it is too slow.
>
> Reading through HyperCuts and EffiCuts, I'm not sure how it supports
> incrementally adding a new rule, without rebuilding the entire tree.
> http://ccr.sigcomm.org/online/files/p207.pdf
> http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf
>
> The HyperCuts papers says
> "A fast update algorithm can also be implemented; however we do not
> go into the details of incremental update in this paper"
>
> >
> >> >> c. Verifier limitation.
> >> >
> >> > Not sure what limitations you're concerned about.
> >> >
> >>
> >> Mostly related to stack. The flow key OVS uses (struct sw_flow_key)
> >> is 464 byte. We trim a lot, now around 300 byte, but still huge, considering
> >> the BPF's stack limit is 512 byte.
> >
> > have you tried using per-cpu array of one element with large value
> > instead of stack?
>
> yes, now we store the flow key in percpu array with 1 element.
>
> > In the latest verifier most of the operations that can be done with the stack
> > pointer can be done with pointer to map value too.
> >
> Once the flow key is stored in map, another eBPF program
> needs to use that key to lookup flow table (another map).
> So we have to store the flow key on stack first, in order to
> use it as key to lookup the flow table map.
>
> Is there a way to work around it?
d71962f ("bpf: allow map helpers access to map values directly") removes
that limitation from the verifier and should allow you to use map values
as map keys directly. 4.18-rc1 has it.
> Thanks
> William
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#1360): https://lists.iovisor.org/g/iovisor-dev/message/1360
Mute This Topic: https://lists.iovisor.org/mt/22656941/1132507
Group Owner: iovisor-dev+owner-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org
Unsubscribe: https://lists.iovisor.org/g/iovisor-dev/unsub [glki-iovisor-dev@m.gmane.org]
-=-=-=-=-=-=-=-=-=-=-=-
next prev parent reply other threads:[~2018-07-05 12:12 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1529756203-77067-1-git-send-email-u9012063@gmail.com>
2018-06-28 3:00 ` [iovisor-dev] [RFC PATCH 00/11] OVS eBPF datapath Alexei Starovoitov
[not found] ` <20180628030023.pkgdu4brun3jy2bz-+o4/htvd0TCa6kscz5V53/3mLCh9rsb+VpNB7YpNyf8@public.gmane.org>
2018-06-28 14:19 ` William Tu
[not found] ` <CALDO+SYzDDpTmJttghfjUYKbo3AHDaT4L154Acwn5BGqkytkHQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-07-03 17:56 ` Alexei Starovoitov
2018-07-05 2:25 ` [iovisor-dev] " William Tu
[not found] ` <CALDO+SYUPgCtQqGkRepK5YTKZegAg3-DKWcssoqXwGEqgsodgw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-07-05 12:12 ` Paul Chaignon [this message]
2018-07-05 15:32 ` William Tu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180705121202.GA27625@Nover \
--to=paul.chaignon-re5jqeeqqe8avxtiumwx3w@public.gmane.org \
--cc=alexei.starovoitov-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
--cc=dev-yBygre7rU0TnMu66kgdUjQ@public.gmane.org \
--cc=iovisor-dev-9jONkmmOlFHEE9lA1F8Ukti2O/JbrIOy@public.gmane.org \
--cc=netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=u9012063-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).