From mboxrd@z Thu Jan  1 00:00:00 1970
From: Daniel Borkmann
Subject: Re: [PATCH net-next v2 02/15] bpf: offload: add infrastructure for
 loading programs for a specific netdev
Date: Mon, 06 Nov 2017 18:32:45 +0100
Message-ID: <5A009CBD.1080800@iogearbox.net>
References: <20171103205630.1083-1-jakub.kicinski@netronome.com>
 <20171103205630.1083-3-jakub.kicinski@netronome.com>
To: Jakub Kicinski, netdev@vger.kernel.org
Cc: oss-drivers@netronome.com, alexei.starovoitov@gmail.com
In-Reply-To: <20171103205630.1083-3-jakub.kicinski@netronome.com>

On 11/03/2017 09:56 PM, Jakub Kicinski wrote:
> The fact that we don't know which device the program is going
> to be used on is quite limiting in the current eBPF infrastructure.
> We have to reverse or limit the changes which the kernel makes to
> the loaded bytecode if we want it to be offloaded to a networking
> device. We also have to invent new APIs for debugging and
> troubleshooting support.
>
> Make it possible to load programs for a specific netdev. This
> helps us to bring the debug information closer to the core
> eBPF infrastructure (e.g. we will be able to reuse the verifier
> log in device JIT). It allows device JITs to perform translation
> on the original bytecode.
>
> __bpf_prog_get(), when called to get a reference for an attachment
> point, will now refuse to give it if the program has a device
> assigned. Following patches will add a version of that function
> which passes the expected netdev in. The @type argument in
> __bpf_prog_get() is renamed to attach_type to make it clearer that
> it's only set on attachment.
>
> All calls to ndo_bpf are protected by rtnl, only verifier callbacks
> are not. We need a wait queue to make sure the netdev doesn't get
> destroyed while the verifier is still running and calling its driver.
>
> Signed-off-by: Jakub Kicinski
> Reviewed-by: Simon Horman
> Reviewed-by: Quentin Monnet

First of all, great work, I went over the series and I really like the
outcome. It's already applied anyway, but two minor comments further
below.

[...]
> @@ -1549,6 +1555,8 @@ static void bpf_prog_free_deferred(struct work_struct *work)
>  	struct bpf_prog_aux *aux;
>
>  	aux = container_of(work, struct bpf_prog_aux, work);
> +	if (bpf_prog_is_dev_bound(aux))
> +		bpf_prog_offload_destroy(aux->prog);
>  	bpf_jit_free(aux->prog);
>  }
[...]
> +static int bpf_offload_notification(struct notifier_block *notifier,
> +				    ulong event, void *ptr)
> +{
> +	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
> +	struct bpf_dev_offload *offload, *tmp;
> +
> +	ASSERT_RTNL();
> +
> +	switch (event) {
> +	case NETDEV_UNREGISTER:
> +		list_for_each_entry_safe(offload, tmp, &bpf_prog_offload_devs,
> +					 offloads) {
> +			if (offload->netdev == netdev)
> +				__bpf_prog_offload_destroy(offload->prog);

We would be calling this twice, right? Once here and then on prog
destruction again. __bpf_prog_offload_destroy() looks like it will
handle this just fine, but we should probably add a comment to
__bpf_prog_offload_destroy() so that when changes are made to it, it's
obvious that we need to be extra careful.

[...]
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 323be2473c4b..1574b9f0f24e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -824,7 +824,10 @@ static int find_prog_type(enum bpf_prog_type type, struct bpf_prog *prog)
>  	if (type >= ARRAY_SIZE(bpf_prog_types) || !bpf_prog_types[type])
>  		return -EINVAL;
>
> -	prog->aux->ops = bpf_prog_types[type];
> +	if (!bpf_prog_is_dev_bound(prog->aux))
> +		prog->aux->ops = bpf_prog_types[type];
> +	else
> +		prog->aux->ops = &bpf_offload_prog_ops;
>  	prog->type = type;
>  	return 0;
>  }
> @@ -1054,7 +1057,7 @@ struct bpf_prog *bpf_prog_inc_not_zero(struct bpf_prog *prog)
>  }
>  EXPORT_SYMBOL_GPL(bpf_prog_inc_not_zero);
>
> -static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
> +static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type)
>  {
>  	struct fd f = fdget(ufd);
>  	struct bpf_prog *prog;
> @@ -1062,7 +1065,7 @@ static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *type)
>  	prog = ____bpf_prog_get(f);
>  	if (IS_ERR(prog))
>  		return prog;
> -	if (type && prog->type != *type) {
> +	if (attach_type && (prog->type != *attach_type || prog->aux->offload)) {
>  		prog = ERR_PTR(-EINVAL);
>  		goto out;
>  	}
> @@ -1089,7 +1092,7 @@ struct bpf_prog *bpf_prog_get_type(u32 ufd, enum bpf_prog_type type)
>  EXPORT_SYMBOL_GPL(bpf_prog_get_type);
>
>  /* last field in 'union bpf_attr' used by this command */
> -#define BPF_PROG_LOAD_LAST_FIELD prog_name
> +#define BPF_PROG_LOAD_LAST_FIELD prog_target_ifindex

For program types that are neither XDP nor cls_bpf, we should reject
the request if something calls bpf(2) with a non-zero
prog_target_ifindex. That way, i) we don't burn the whole field and
could perhaps reuse/union it for other prog types like tracing in the
future. Probably makes sense to do anyway since ii) for types like
tracing, we would want to reject this upfront here and not later when
the attach happens.
I probably missed something when reading the code, but if I spotted
that correctly, we might otherwise even go and nfp-jit simple progs for
non-networking types (we would bail out later on in __bpf_prog_get()
... but we shouldn't let the syscall succeed in the first place)?

> static int bpf_prog_load(union bpf_attr *attr)
> {
> @@ -1152,6 +1155,12 @@ static int bpf_prog_load(union bpf_attr *attr)
>  	atomic_set(&prog->aux->refcnt, 1);
>  	prog->gpl_compatible = is_gpl ? 1 : 0;
>
> +	if (attr->prog_target_ifindex) {
> +		err = bpf_prog_offload_init(prog, attr);
> +		if (err)
> +			goto free_prog;
> +	}
> +
>  	/* find program type: socket_filter vs tracing_filter */
>  	err = find_prog_type(type, prog);
[...]