From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joe Stringer
Subject: Re: [PATCH bpf-next 07/11] bpf: Add helper to retrieve socket in BPF
Date: Thu, 13 Sep 2018 14:24:03 -0700
Message-ID:
References: <20180913210158.b5r53sk6vy6vcj52@ast-mbp> <20180913212205.ght2mompuoyuhd4g@ast-mbp>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Cc: Joe Stringer, daniel@iogearbox.net, netdev, ast@kernel.org, john fastabend, tgraf@suug.ch, Martin KaFai Lau, Nitin Hande, mauricio.vasquez@polito.it
To: Alexei Starovoitov
Return-path:
Received: from mail-qt0-f194.google.com ([209.85.216.194]:36466 "EHLO mail-qt0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726824AbeINCfb (ORCPT ); Thu, 13 Sep 2018 22:35:31 -0400
Received: by mail-qt0-f194.google.com with SMTP id t5-v6so6874285qtn.3 for ; Thu, 13 Sep 2018 14:24:15 -0700 (PDT)
In-Reply-To: <20180913212205.ght2mompuoyuhd4g@ast-mbp>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Thu, 13 Sep 2018 at 14:22, Alexei Starovoitov wrote:
>
> On Thu, Sep 13, 2018 at 02:17:17PM -0700, Joe Stringer wrote:
> > On Thu, 13 Sep 2018 at 14:02, Alexei Starovoitov
> > wrote:
> > >
> > > On Thu, Sep 13, 2018 at 01:55:01PM -0700, Joe Stringer wrote:
> > > > On Thu, 13 Sep 2018 at 12:06, Alexei Starovoitov
> > > > wrote:
> > > > >
> > > > > On Wed, Sep 12, 2018 at 5:06 PM, Alexei Starovoitov
> > > > > wrote:
> > > > > > On Tue, Sep 11, 2018 at 05:36:36PM -0700, Joe Stringer wrote:
> > > > > >> This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and
> > > > > >> bpf_sk_lookup_udp() which allows BPF programs to find out if there is a
> > > > > >> socket listening on this host, and returns a socket pointer which the
> > > > > >> BPF program can then access to determine, for instance, whether to
> > > > > >> forward or drop traffic.
> > > > > >> bpf_sk_lookup_xxx() may take a reference on the
> > > > > >> socket, so when a BPF program makes use of this function, it must
> > > > > >> subsequently pass the returned pointer into the newly added sk_release()
> > > > > >> to return the reference.
> > > > > >>
> > > > > >> By way of example, the following pseudocode would filter inbound
> > > > > >> connections at XDP if there is no corresponding service listening for
> > > > > >> the traffic:
> > > > > >>
> > > > > >>   struct bpf_sock_tuple tuple;
> > > > > >>   struct bpf_sock_ops *sk;
> > > > > >>
> > > > > >>   populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet
> > > > > >>   sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0);
> > > > > > ...
> > > > > >> +struct bpf_sock_tuple {
> > > > > >> +       union {
> > > > > >> +               __be32 ipv6[4];
> > > > > >> +               __be32 ipv4;
> > > > > >> +       } saddr;
> > > > > >> +       union {
> > > > > >> +               __be32 ipv6[4];
> > > > > >> +               __be32 ipv4;
> > > > > >> +       } daddr;
> > > > > >> +       __be16 sport;
> > > > > >> +       __be16 dport;
> > > > > >> +       __u8 family;
> > > > > >> +};
> > > > > >
> > > > > > since we can pass ptr_to_packet into map lookup and other helpers now,
> > > > > > can you move 'family' out of bpf_sock_tuple and combine with netns_id arg?
> > > > > > then progs wouldn't need to copy bytes from the packet into tuple
> > > > > > to do a lookup.
> > > >
> > > > If I follow, you're proposing that users should be able to pass a
> > > > pointer to the source address field of the L3 header, and assuming
> > > > that the L3 header ends with saddr+daddr (no options/extheaders), and
> > > > is immediately followed by the sport/dport then a packet pointer
> > > > should work for performing socket lookup. Then it is up to the BPF
> > > > program writer to ensure that this is the case, or otherwise fall back
> > > > to populating a copy of the sock tuple on the stack.
> > >
> > > yep.
> > >
> > > > > have been thinking more about it.
> > > > > since only ipv4 and ipv6 supported may be use size of bpf_sock_tuple
> > > > > to infer family inside the helper, so it doesn't need to be passed explicitly?
> > > >
> > > > Let me make sure I understand the proposal here.
> > > >
> > > > The current structure and function prototypes are:
> > > >
> > > > struct bpf_sock_tuple {
> > > >         union {
> > > >                 __be32 ipv6[4];
> > > >                 __be32 ipv4;
> > > >         } saddr;
> > > >         union {
> > > >                 __be32 ipv6[4];
> > > >                 __be32 ipv4;
> > > >         } daddr;
> > > >         __be16 sport;
> > > >         __be16 dport;
> > > >         __u8 family;
> > > > };
> > > ...
> > > > You're proposing something like:
> > > >
> > > > struct bpf_sock_tuple4 {
> > > >         __be32 saddr;
> > > >         __be32 daddr;
> > > >         __be16 sport;
> > > >         __be16 dport;
> > > >         __u8 family;
> > > > };
> > > >
> > > > struct bpf_sock_tuple6 {
> > > >         __be32 saddr[4];
> > > >         __be32 daddr[4];
> > > >         __be16 sport;
> > > >         __be16 dport;
> > > >         __u8 family;
> > > > };
> > >
> > > I think the split is unnecessary.
> > > I'm proposing:
> > > struct bpf_sock_tuple {
> > >         union {
> > >                 __be32 ipv6[4];
> > >                 __be32 ipv4;
> > >         } saddr;
> > >         union {
> > >                 __be32 ipv6[4];
> > >                 __be32 ipv4;
> > >         } daddr;
> > >         __be16 sport;
> > >         __be16 dport;
> > > };
> > >
> > > that points directly into the packet (when ipv4 options are not there)
> > > and bpf_sk_lookup_tcp() uses 'size' argument to figure out ipv4/ipv6 family.
> >
> > Needs to be subtly different, the 'sport'/'dport' offset would be
> > wrong in the IPv4 case otherwise:
>
> ahh. right.
> >
> > We could take my definitions above and do the following if we want to
> > try to type the helper definition:
> >
> > union bpf_sock_tuple {
> >         struct bpf_sock_tuple4 t4;
> >         struct bpf_sock_tuple6 t6;
> > };
>
> yes. sounds great to me. Much better than 'void *' in the helper.

Could even do something like this:

$ cat foo.c
#include

struct bpf_sock_tuple {
        union {
                struct {
                        __be32 saddr;
                        __be32 daddr;
                        __be16 sport;
                        __be16 dport;
                } ipv4;
                struct {
                        __be32 saddr[4];
                        __be32 daddr[4];
                        __be16 sport;
                        __be16 dport;
                } ipv6;
        };
};

int main(int argc, char *argv[])
{
        struct bpf_sock_tuple tuple;

        return 0;
}
$ gcc -g ./foo.c -o foo.o
$ pahole foo.o
struct bpf_sock_tuple {
        union {
                struct {
                        __be32     saddr;                /*     0     4 */
                        __be32     daddr;                /*     4     4 */
                        __be16     sport;                /*     8     2 */
                        __be16     dport;                /*    10     2 */
                } ipv4;                                  /*          12 */
                struct {
                        __be32     saddr[4];             /*     0    16 */
                        __be32     daddr[4];             /*    16    16 */
                        __be16     sport;                /*    32     2 */
                        __be16     dport;                /*    34     2 */
                } ipv6;                                  /*          36 */
        };                                               /*     0    36 */

        /* size: 36, cachelines: 1, members: 1 */
        /* last cacheline: 36 bytes */
};
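
With that layout, the 'size' argument alone could pick the family. Below is a rough userspace sketch of the idea, not the helper itself: struct sock_tuple_sketch and family_from_tuple_size() are made-up names for illustration, and uint32_t/uint16_t stand in for __be32/__be16 so it builds without kernel headers. The static asserts just restate the offsets and sizes from the pahole output above.

```c
#include <assert.h>      /* static_assert (C11), assert */
#include <stddef.h>      /* size_t, offsetof */
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical userspace mirror of the proposed tuple layout. */
struct sock_tuple_sketch {
	union {
		struct {
			uint32_t saddr;
			uint32_t daddr;
			uint16_t sport;
			uint16_t dport;
		} ipv4;
		struct {
			uint32_t saddr[4];
			uint32_t daddr[4];
			uint16_t sport;
			uint16_t dport;
		} ipv6;
	};
};

#define TUPLE4_SIZE sizeof(((struct sock_tuple_sketch *)0)->ipv4)
#define TUPLE6_SIZE sizeof(((struct sock_tuple_sketch *)0)->ipv6)

/* Same numbers pahole reports: ports directly follow the addresses. */
static_assert(TUPLE4_SIZE == 12, "ipv4 tuple is 12 bytes");
static_assert(TUPLE6_SIZE == 36, "ipv6 tuple is 36 bytes");
static_assert(offsetof(struct sock_tuple_sketch, ipv4.sport) == 8,
	      "ipv4 sport follows saddr+daddr");
static_assert(offsetof(struct sock_tuple_sketch, ipv6.sport) == 32,
	      "ipv6 sport follows saddr+daddr");

/*
 * Hypothetical: map the caller-supplied tuple size to an address family.
 * Returns -1 for any size that matches neither variant.
 */
static int family_from_tuple_size(size_t len)
{
	if (len == TUPLE4_SIZE)
		return AF_INET;
	if (len == TUPLE6_SIZE)
		return AF_INET6;
	return -1;
}
```

A caller pointing at an IPv4 packet would pass sizeof(tuple->ipv4), and sizeof(tuple->ipv6) for IPv6. Note that sizeof(struct sock_tuple_sketch) is 36, so passing the size of the whole struct would be treated as IPv6.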