From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Andy Gospodarek <andy@greyhouse.net>
Cc: Alexei Starovoitov <ast@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Daniel Borkmann <borkmann@iogearbox.net>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>,
brouer@redhat.com
Subject: Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
Date: Thu, 27 Apr 2017 10:41:21 +0200 [thread overview]
Message-ID: <20170427104121.32df2178@redhat.com> (raw)
In-Reply-To: <20170426205544.GA40859@C02RW35GFVH8>
On Wed, 26 Apr 2017 16:55:44 -0400
Andy Gospodarek <andy@greyhouse.net> wrote:
> On Wed, Apr 26, 2017 at 10:58:45AM -0700, Alexei Starovoitov wrote:
> > On 4/26/17 9:35 AM, John Fastabend wrote:
> > >
> > > > As Alexei also mentioned before, ifindex vs port makes no real
> > > > difference seen from the bpf program side. It is userspace's
> > > > responsibility to add ifindex/port's to the bpf-maps, according to how
> > > > the bpf program "policy" want to "connect" these ports. The
> > > > port-table system add one extra step, of also adding this port to the
> > > > port-table (which lives inside the kernel).
> > > >
> > >
> > > I'm not sure I understand the "lives inside the kernel" bit. I assumed
> > > the 'map' should be a bpf map and behave like any other bpf map.
> > >
> > > I wanted a new map to be defined, something like this from the bpf programmer
> > > side.
> > >
> > > struct bpf_map_def SEC("maps") port_table =
> > > .type = BPF_MAP_TYPE_PORT_CONNECTION,
> > > .key_size = sizeof(u32),
> > > .value_size = BPF_PORT_CONNECTION_SIZE,
> > > .max_entries = 256,
> > > };
> >
> > I like the idea.
> > We have prog_array, perf_event_array, cgroup_array map specializations.
> > This one can be new netdev_array with some new bpf_redirect-like helper
> > accessing it.
> >
> > > > When loading the XDP program, we also need to pass along a port table
> > > > "id" this XDP program is associated with (and if it doesn't exists you
> > > > create it). And your userspace "control-plane" application also need
> > > > to know this port table "id", when adding a new port.
> > >
> > > So the user space application that is loading the program also needs
> > > to handle this map. This seems correct to me. But I don't see the
> > > value in making some new port table when we already have well understood
> > > framework for maps.
> >
> > +1
> >
> > > >
> > > > The concept of having multiple port tables is key. As this implies we
> > > > can have several simultaneous "data-planes" that is *isolated* from
> > > > each-other. Think about how network-namespaces/containers want
> > > > isolation. A subtle thing I'm afraid to mention, is that oppose to the
> > > > ifindex model, a port table with mapping to a net_device pointer, would
> > > > allow (faster) delivery into the container's inner net_device, which
> > > > sort of violates the isolation, but I would argue it is not a problem
> > > > as this net_device pointer could only be added from a process within the
> > > > namespace. I like this feature, but it could easily be disallowed via
> > > > port insertion-time validation.
> > > >
> > >
> > > I think the above optimization should be allowed. And agree multiple port
> > > tables (maps?) is needed. Again all this points to using standard maps
> > > logic in my mind. For permissions and different domains, which I think
> > > you were starting to touch on, it looks like we could extend the pinning API.
> > > At the moment it does an inode_permission(inode, MAY_WRITE) check but I
> > > presume this could be extended. None of this would be needed in v1 and
> > > could be added subsequently. read-only maps seems doable.
> >
> > this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated
> > the user space can make it readonly to prevent further changes.
> >
> > From user space it can be done similar to perf_events/cgroups as well.
> > bpf_map_update_elem(&netdev_array, &port_num, &ifindex)
> > should work.
> > For bpf_map_lookup_elem() from such netdev_array we can return
> > ifindex back.
> > The bpf_map_show_fdinfo() can be customized as well to pretty print
> > ifindexes of netdevs stored in there.
> >
>
> I agree with both of you on all of these points. Having the port
> redirection in a new type of map and/or array seems like the way to go.
>
> I understood Jesper's perspecitive when thinking about a way to pass a
> port-table id down, but I think the idea that the userspace loader code
> defining the maps is going to be the one making this link is the right
> idea and handling things like ifindex changes (rather than identifiers
> that perform lookups in other tables) is going to have to be yet another
> exercise left up to the...user. :-)
>
I love this idea. Integrating the port table closer with the bpf-maps
infrastructure makes sense. This gives me a place to hook the code into,
instead of (re)inventing a new infrastructure for this port table, and the
interface will be more natural from a bpf-API point of view.
When registering/attaching a XDP/bpf program, we would just send the
file-descriptor for this port-map along (like we do with the bpf_prog
FD). Plus, it own ingress-port number this program is in the port-map.
It is not clear to me, in-which-data-structure on the kernel-side we
store this reference to the port-map and ingress-port. As today we only
have the "raw" struct bpf_prog pointer. I see several options:
1. Create a new xdp_prog struct that contains existing bpf_prog,
a port-map pointer and ingress-port. (IMHO easiest solution)
2. Just create a new pointer to port-map and store it in driver rx-ring
struct (like existing bpf_prog), but this create a race-challenge
replacing (cmpxchg) the program (or perhaps it's not a problem as it
runs under rcu and RTNL-lock).
3. Extend bpf_prog to store this port-map and ingress-port, and have a
fast-way to access it. I assume it will be accessible via
bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
next prev parent reply other threads:[~2017-04-27 8:41 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-18 19:58 XDP question: best API for returning/setting egress port? Jesper Dangaard Brouer
2017-04-18 20:54 ` John Fastabend
2017-04-19 12:00 ` Jesper Dangaard Brouer
2017-04-19 12:33 ` Daniel Borkmann
2017-04-19 15:24 ` Jesper Dangaard Brouer
2017-04-19 12:25 ` Hannes Frederic Sowa
2017-04-19 20:02 ` Andy Gospodarek
2017-04-19 21:42 ` Daniel Borkmann
2017-04-20 17:12 ` Andy Gospodarek
2017-04-19 22:51 ` Daniel Borkmann
2017-04-20 2:56 ` xdp_redirect ifindex vs port. Was: " Alexei Starovoitov
2017-04-20 4:38 ` John Fastabend
2017-04-20 4:58 ` Alexei Starovoitov
2017-04-20 5:14 ` John Fastabend
2017-04-20 6:10 ` Jesper Dangaard Brouer
2017-04-20 17:10 ` Alexei Starovoitov
2017-04-25 9:34 ` Jesper Dangaard Brouer
2017-04-26 0:26 ` Alexei Starovoitov
2017-04-26 3:07 ` John Fastabend
2017-04-26 9:11 ` Jesper Dangaard Brouer
2017-04-26 16:35 ` John Fastabend
2017-04-26 17:58 ` Alexei Starovoitov
2017-04-26 20:55 ` Andy Gospodarek
2017-04-27 8:41 ` Jesper Dangaard Brouer [this message]
2017-04-27 23:31 ` Alexei Starovoitov
2017-04-28 5:06 ` John Fastabend
2017-04-28 5:30 ` Alexei Starovoitov
2017-04-28 19:43 ` Hannes Frederic Sowa
2017-04-30 1:35 ` Alexei Starovoitov
2017-04-28 10:58 ` Jesper Dangaard Brouer
2017-04-30 1:04 ` Alexei Starovoitov
2017-04-30 22:55 ` John Fastabend
2017-04-20 6:39 ` XDP question: " Jesper Dangaard Brouer
2017-04-20 4:43 ` John Fastabend
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170427104121.32df2178@redhat.com \
--to=brouer@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andy@greyhouse.net \
--cc=ast@fb.com \
--cc=borkmann@iogearbox.net \
--cc=daniel@iogearbox.net \
--cc=john.fastabend@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=xdp-newbies@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).