Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jesper Dangaard Brouer <brouer@redhat.com>
To: Andy Gospodarek <andy@greyhouse.net>
Cc: Alexei Starovoitov <ast@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Daniel Borkmann <borkmann@iogearbox.net>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"xdp-newbies@vger.kernel.org" <xdp-newbies@vger.kernel.org>,
	brouer@redhat.com
Subject: Re: xdp_redirect ifindex vs port. Was: best API for returning/setting egress port?
Date: Thu, 27 Apr 2017 10:41:21 +0200	[thread overview]
Message-ID: <20170427104121.32df2178@redhat.com> (raw)
In-Reply-To: <20170426205544.GA40859@C02RW35GFVH8>

On Wed, 26 Apr 2017 16:55:44 -0400
Andy Gospodarek <andy@greyhouse.net> wrote:

> On Wed, Apr 26, 2017 at 10:58:45AM -0700, Alexei Starovoitov wrote:
> > On 4/26/17 9:35 AM, John Fastabend wrote:  
> > >   
> > > > As Alexei also mentioned before, ifindex vs port makes no real
> > > > difference seen from the bpf program side.  It is userspace's
> > > > responsibility to add ifindex/port's to the bpf-maps, according to how
> > > > the bpf program "policy" want to "connect" these ports.  The
> > > > port-table system add one extra step, of also adding this port to the
> > > > port-table (which lives inside the kernel).
> > > >   
> > > 
> > > I'm not sure I understand the "lives inside the kernel" bit. I assumed
> > > the 'map' should be a bpf map and behave like any other bpf map.
> > > 
> > > I wanted a new map to be defined, something like this from the bpf programmer
> > > side.
> > > 
> > > struct bpf_map_def SEC("maps") port_table =
> > > 	.type = BPF_MAP_TYPE_PORT_CONNECTION,
> > > 	.key_size = sizeof(u32),
> > > 	.value_size = BPF_PORT_CONNECTION_SIZE,
> > > 	.max_entries = 256,
> > > };  
> > 
> > I like the idea.
> > We have prog_array, perf_event_array, cgroup_array map specializations.
> > This one can be new netdev_array with some new bpf_redirect-like helper
> > accessing it.
> >   
> > > > When loading the XDP program, we also need to pass along a port table
> > > > "id" this XDP program is associated with (and if it doesn't exists you
> > > > create it).  And your userspace "control-plane" application also need
> > > > to know this port table "id", when adding a new port.  
> > > 
> > > So the user space application that is loading the program also needs
> > > to handle this map. This seems correct to me. But I don't see the
> > > value in making some new port table when we already have well understood
> > > framework for maps.  
> > 
> > +1
> >   
> > > > 
> > > > The concept of having multiple port tables is key.  As this implies we
> > > > can have several simultaneous "data-planes" that is *isolated* from
> > > > each-other.  Think about how network-namespaces/containers want
> > > > isolation. A subtle thing I'm afraid to mention, is that oppose to the
> > > > ifindex model, a port table with mapping to a net_device pointer, would
> > > > allow (faster) delivery into the container's inner net_device, which
> > > > sort of violates the isolation, but I would argue it is not a problem
> > > > as this net_device pointer could only be added from a process within the
> > > > namespace.  I like this feature, but it could easily be disallowed via
> > > > port insertion-time validation.
> > > >   
> > > 
> > > I think the above optimization should be allowed. And agree multiple port
> > > tables (maps?) is needed. Again all this points to using standard maps
> > > logic in my mind. For permissions and different domains, which I think
> > > you were starting to touch on, it looks like we could extend the pinning API.
> > > At the moment it does an inode_permission(inode, MAY_WRITE) check but I
> > > presume this could be extended. None of this would be needed in v1 and
> > > could be added subsequently. read-only maps seems doable.  
> > 
> > this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated
> > the user space can make it readonly to prevent further changes.
> > 
> > From user space it can be done similar to perf_events/cgroups as well.
> > bpf_map_update_elem(&netdev_array, &port_num, &ifindex)
> > should work.
> > For bpf_map_lookup_elem() from such netdev_array we can return
> > ifindex back.
> > The bpf_map_show_fdinfo() can be customized as well to pretty print
> > ifindexes of netdevs stored in there.
> >   
> 
> I agree with both of you on all of these points.  Having the port
> redirection in a new type of map and/or array seems like the way to go.
> 
> I understood Jesper's perspecitive when thinking about a way to pass a
> port-table id down, but I think the idea that the userspace loader code
> defining the maps is going to be the one making this link is the right
> idea and handling things like ifindex changes (rather than identifiers
> that perform lookups in other tables) is going to have to be yet another
> exercise left up to the...user.  :-)
> 

I love this idea. Integrating the port table closer with the bpf-maps
infrastructure makes sense.  This gives me a place to hook the code into,
instead of (re)inventing a new infrastructure for this port table, and the
interface will be more natural from a bpf-API point of view.

When registering/attaching a XDP/bpf program, we would just send the
file-descriptor for this port-map along (like we do with the bpf_prog
FD). Plus, it own ingress-port number this program is in the port-map.

It is not clear to me, in-which-data-structure on the kernel-side we
store this reference to the port-map and ingress-port. As today we only
have the "raw" struct bpf_prog pointer. I see several options:

1. Create a new xdp_prog struct that contains existing bpf_prog,
a port-map pointer and ingress-port. (IMHO easiest solution)

2. Just create a new pointer to port-map and store it in driver rx-ring
struct (like existing bpf_prog), but this create a race-challenge
replacing (cmpxchg) the program (or perhaps it's not a problem as it
runs under rcu and RTNL-lock).

3. Extend bpf_prog to store this port-map and ingress-port, and have a
fast-way to access it.  I assume it will be accessible via
bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

next prev parent reply	other threads:[~2017-04-27  8:41 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-18 19:58 XDP question: best API for returning/setting egress port? Jesper Dangaard Brouer
2017-04-18 20:54 ` John Fastabend
2017-04-19 12:00   ` Jesper Dangaard Brouer
2017-04-19 12:33     ` Daniel Borkmann
2017-04-19 15:24       ` Jesper Dangaard Brouer
2017-04-19 12:25 ` Hannes Frederic Sowa
2017-04-19 20:02 ` Andy Gospodarek
2017-04-19 21:42   ` Daniel Borkmann
2017-04-20 17:12     ` Andy Gospodarek
2017-04-19 22:51   ` Daniel Borkmann
2017-04-20  2:56     ` xdp_redirect ifindex vs port. Was: " Alexei Starovoitov
2017-04-20  4:38       ` John Fastabend
2017-04-20  4:58         ` Alexei Starovoitov
2017-04-20  5:14           ` John Fastabend
2017-04-20  6:10       ` Jesper Dangaard Brouer
2017-04-20 17:10         ` Alexei Starovoitov
2017-04-25  9:34           ` Jesper Dangaard Brouer
2017-04-26  0:26             ` Alexei Starovoitov
2017-04-26  3:07               ` John Fastabend
2017-04-26  9:11                 ` Jesper Dangaard Brouer
2017-04-26 16:35                   ` John Fastabend
2017-04-26 17:58                     ` Alexei Starovoitov
2017-04-26 20:55                       ` Andy Gospodarek
2017-04-27  8:41                         ` Jesper Dangaard Brouer [this message]
2017-04-27 23:31                           ` Alexei Starovoitov
2017-04-28  5:06                             ` John Fastabend
2017-04-28  5:30                               ` Alexei Starovoitov
2017-04-28 19:43                                 ` Hannes Frederic Sowa
2017-04-30  1:35                                   ` Alexei Starovoitov
2017-04-28 10:58                             ` Jesper Dangaard Brouer
2017-04-30  1:04                               ` Alexei Starovoitov
2017-04-30 22:55                                 ` John Fastabend
2017-04-20  6:39     ` XDP question: " Jesper Dangaard Brouer
2017-04-20  4:43   ` John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170427104121.32df2178@redhat.com \
    --to=brouer@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andy@greyhouse.net \
    --cc=ast@fb.com \
    --cc=borkmann@iogearbox.net \
    --cc=daniel@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=xdp-newbies@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).