* [RFC 0/9 v2] netfilter: bpf base hook program generator
@ 2022-10-05 14:13 Florian Westphal
From: Florian Westphal @ 2022-10-05 14:13 UTC
  To: bpf; +Cc: Florian Westphal

I'm sending this as another RFC, even though the patches are unchanged vs. the
last iteration, to provide background/context ahead of the bpf office hours on
Oct 6th. For the same reason, netdev@ and nf-devel@ are deliberately omitted.

This series adds a bpf program generator for netfilter base hooks.
'Netfilter base hooks' are the C functions called from the NF_HOOK()
stubs found in a myriad of locations in the network stack.

Examples from ipv4 (ip_input.c):
        return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
                       net, NULL, skb, skb->dev, NULL,
                       ip_local_deliver_finish);
[..]
        return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING,
                       net, NULL, skb, dev, NULL,
                       ip_rcv_finish);

Well-known users of this facility are iptables and nftables, but also
connection tracking and SELinux. Conntrack is also a greedy module, with
four hooks (prerouting, input, output, postrouting) plus another two via
the nf_defrag(_ipv4) module dependency.

Eliding the static-key handling, NF_HOOK() expands to:

-----
struct nf_hook_entries *hooks = rcu_dereference(net->nf.hooks_ipv4[hook]);
/* where [hook] is any one of prerouting, input, and so on */
ret = nf_hook_slow(skb, &state, hooks, 0);

if (ret == 1) /* packet is allowed to pass */
	okfn(net, sk, skb);
------

'hooks' is an array of function-address/void * arg pairs that is
iterated in nf_hook_slow():

for (i = 0; i < hooks->num_hook_entries; i++) {
	verdict = hooks->hooks[i].hook(hooks->hooks[i].priv, skb, state);
	switch (verdict & NF_VERDICT_MASK) { ....

Each hook can choose to toss the packet (NF_DROP), pass it on to the next
hook (NF_ACCEPT), assume ownership of the skb (NF_STOLEN), and so on.
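To make the iteration and verdict handling concrete, here is a minimal
userspace model of the nf_hook_slow() loop. It is only a sketch: struct
sk_buff and struct nf_hook_state are reduced, hypothetical stand-ins,
nf_hook_slow_model() and the two sample hooks are made up for illustration,
and NF_QUEUE handling and errno propagation are left out. The verdict
constants use the same values as the kernel's NF_DROP, NF_ACCEPT and NF_STOLEN.

```c
#include <assert.h>

/* Same numeric values as the kernel verdicts; everything else is a model. */
enum { NF_DROP = 0, NF_ACCEPT = 1, NF_STOLEN = 2 };

#define MODEL_MAX_HOOKS 8

/* Hypothetical, heavily reduced stand-ins for the kernel structures. */
struct sk_buff { int len; };
struct nf_hook_state { int hook; int pf; };

typedef unsigned int (*nf_hookfn)(void *priv, struct sk_buff *skb,
				  const struct nf_hook_state *state);

/* One function-address / void *arg pair. */
struct nf_hook_entry {
	nf_hookfn hook;
	void *priv;
};

struct nf_hook_entries {
	unsigned int num_hook_entries;
	struct nf_hook_entry hooks[MODEL_MAX_HOOKS];
};

/* Sample hook: counts packets in the int that priv points to, then accepts. */
static unsigned int count_and_accept(void *priv, struct sk_buff *skb,
				     const struct nf_hook_state *state)
{
	(void)skb;
	(void)state;
	(*(int *)priv)++;
	return NF_ACCEPT;
}

/* Sample hook: drops packets shorter than the minimum stored in priv. */
static unsigned int drop_short(void *priv, struct sk_buff *skb,
			       const struct nf_hook_state *state)
{
	(void)state;
	return skb->len < *(const int *)priv ? NF_DROP : NF_ACCEPT;
}

/*
 * Model of the nf_hook_slow() loop: one indirect call per entry.
 * Returns 1 if the caller should continue with okfn(), 0 if a hook dropped
 * or took over the packet (the kernel returns a negative errno for drops).
 */
static int nf_hook_slow_model(struct sk_buff *skb,
			      const struct nf_hook_state *state,
			      const struct nf_hook_entries *e)
{
	assert(e->num_hook_entries <= MODEL_MAX_HOOKS);

	for (unsigned int i = 0; i < e->num_hook_entries; i++) {
		unsigned int verdict = e->hooks[i].hook(e->hooks[i].priv, skb, state);

		switch (verdict) {
		case NF_ACCEPT:
			break;		/* move on to the next hook */
		case NF_DROP:
			return 0;	/* the kernel also frees the skb here */
		default:
			return 0;	/* NF_STOLEN etc.: the hook owns the skb now */
		}
	}
	return 1;
}
```

With the entries {count_and_accept, drop_short} and a minimum length of 64,
a 100-byte packet makes the model return 1 (okfn would run) and a 20-byte
packet returns 0. The counting hook runs for both packets, i.e. every
registered hook costs one indirect call per packet.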

All hooks have access to the skb, to the private void *arg (used by
nf_tables and ip_tables -- the start of the user-defined ruleset to
evaluate) and a context structure that wraps extra data: incoming and
outgoing network interfaces, the net namespace the hook is registered in,
the protocol family, hook location (input, prerouting, forward, ...) ...

Even for a simple iptables-filter + nat setup this results in multiple
indirect calls per packet, one per registered hook function.

The proposed autogenerator unrolls nf_hook_slow() and builds a bpf program
that performs those function calls sequentially, i.e.:

state->priv = hooks->hooks[0].priv;
v = firstfunction(state);
if (v != NF_ACCEPT) goto out;
state->priv = hooks->hooks[1].priv;
v = secondfunction(state);
if (v != NF_ACCEPT) goto out;

... and so on. As the function arguments are still taken from struct net at
runtime rather than embedded as constants, these programs can be shared across
net namespaces that have the exact same set of registered hooks. (Example: 10
netns with the iptables-filter table and active conntrack will all share the
same 5 programs, one each for prerouting, input, forward, output and
postrouting, rather than 50 bpf programs.)
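The unrolled form and the sharing argument can be modelled the same way. The
sketch below assumes, following the pseudocode above and the title of patch
4/9, that hooks take only the state argument and read their private data from
state->priv. The struct layouts, the sample hooks and prog_min_then_max(),
which stands in for one generated program, are all hypothetical. The function
addresses are fixed in the "program", but the priv pointers are loaded from
the per-netns entries on every run, so namespaces with the same hook list but
different configuration can run the same code.

```c
#include <assert.h>

enum { NF_DROP = 0, NF_ACCEPT = 1 };

/* Hypothetical single-argument hook ABI: private data travels in the state. */
struct nf_hook_state {
	void *priv;
	int pkt_len;
};

typedef unsigned int (*nf_hookfn1)(struct nf_hook_state *state);

struct nf_hook_entry {
	nf_hookfn1 hook;
	void *priv;
};

struct nf_hook_entries {
	unsigned int num;
	struct nf_hook_entry hooks[2];
};

/* Sample hooks; each reads its per-netns limit from state->priv. */
static unsigned int min_len_hook(struct nf_hook_state *s)
{
	return s->pkt_len >= *(int *)s->priv ? NF_ACCEPT : NF_DROP;
}

static unsigned int max_len_hook(struct nf_hook_state *s)
{
	return s->pkt_len <= *(int *)s->priv ? NF_ACCEPT : NF_DROP;
}

/*
 * Stand-in for a program generated for the hook list
 * {min_len_hook, max_len_hook}: direct calls in a fixed order, no loop.
 * Only the priv arguments come from the entries at run time.
 */
static unsigned int prog_min_then_max(const struct nf_hook_entries *e,
				      struct nf_hook_state *state)
{
	unsigned int v;

	/* a generator would only attach this program to a matching hook list */
	assert(e->num == 2 && e->hooks[0].hook == min_len_hook &&
	       e->hooks[1].hook == max_len_hook);

	state->priv = e->hooks[0].priv;
	v = min_len_hook(state);
	if (v != NF_ACCEPT)
		goto out;
	state->priv = e->hooks[1].priv;
	v = max_len_hook(state);
out:
	return v;
}
```

For an 80-byte packet, a namespace configured with limits 10..100 gets
NF_ACCEPT and one configured with 50..60 gets NF_DROP, both from the same
function.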

The autogenerated programs are invoked from nf_hook() via a bpf dispatcher;
instead of

ret = nf_hook_slow( ... )

this is now:
------------------
struct bpf_prog *prog = READ_ONCE(e->hook_prog);

state.priv = (void *)e;
state.skb = skb;

migrate_disable();
ret = __bpf_prog_run(prog, &state, BPF_DISPATCHER_FUNC(nf_hook_base));
migrate_enable();
------------------

As long as NF_QUEUE is not used (which should be rare), the data path no
longer calls the nf_hook_slow() "interpreter".
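The bpf dispatcher removes the remaining indirect call by turning it into a
compare-and-direct-call sequence over a small set of known programs. In the
kernel that sequence is generated machine code, not C; the sketch below only
illustrates the idea in plain C, in the same spirit as the kernel's
INDIRECT_CALL_*() wrappers. prog_a(), prog_b(), prog_c() and dispatch() are
hypothetical, and the simplified program signature omits the instruction
pointer that the real bpf_func also receives.

```c
#include <assert.h>

/* Simplified program signature (assumption: ctx only). */
typedef unsigned int (*prog_fn)(const void *ctx);

static unsigned int prog_a(const void *ctx) { return (unsigned int)*(const int *)ctx + 1; }
static unsigned int prog_b(const void *ctx) { return (unsigned int)*(const int *)ctx * 2; }
static unsigned int prog_c(const void *ctx) { (void)ctx; return 42; }

/*
 * Dispatcher idea: programs known to the dispatcher (prog_a, prog_b) are
 * reached through a pointer compare plus a direct call; any other target
 * falls back to an ordinary indirect call through the pointer.
 */
static unsigned int dispatch(const void *ctx, prog_fn fn)
{
	if (fn == prog_a)
		return prog_a(ctx);	/* direct call */
	if (fn == prog_b)
		return prog_b(ctx);	/* direct call */
	return fn(ctx);			/* unknown target: indirect call */
}
```

Here the set of direct-call targets is fixed at compile time; in the kernel
it is maintained at run time as programs are added and removed.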

There are no changes to the BPF core and no UAPI additions, although I suppose
it would make sense to add an 'enable/disable' sysctl for this.

I think it makes little sense to consider any form of nf_tables (or iptables)
JIT without first avoiding the indirect calls, unless such a 'jit' targets the
XDP hook.

For that case, though, I would propose an 'xdptables' tool (or an 'xdp'
family for nftables) that requires no kernel changes.

Comments welcome.

Florian Westphal (9):
  netfilter: nf_queue: carry index in hook state
  netfilter: nat: split nat hook iteration into a helper
  netfilter: remove hook index from nf_hook_slow arguments
  netfilter: make hook functions accept only one argument
  netfilter: reduce allowed hook count to 32
  netfilter: add bpf base hook program generator
  netfilter: core: do not rebuild bpf program on dying netns
  netfilter: netdev: switch to invocation via bpf
  netfilter: hook_jit: add prog cache

 drivers/net/ipvlan/ipvlan_l3s.c            |   4 +-
 include/linux/netfilter.h                  |  82 ++-
 include/linux/netfilter_arp/arp_tables.h   |   3 +-
 include/linux/netfilter_bridge/ebtables.h  |   3 +-
 include/linux/netfilter_ipv4/ip_tables.h   |   4 +-
 include/linux/netfilter_ipv6/ip6_tables.h  |   3 +-
 include/linux/netfilter_netdev.h           |  33 +-
 include/net/netfilter/br_netfilter.h       |   7 +-
 include/net/netfilter/nf_flow_table.h      |   6 +-
 include/net/netfilter/nf_hook_bpf.h        |  21 +
 include/net/netfilter/nf_queue.h           |   3 +-
 include/net/netfilter/nf_synproxy.h        |   6 +-
 net/bridge/br_input.c                      |   3 +-
 net/bridge/br_netfilter_hooks.c            |  30 +-
 net/bridge/br_netfilter_ipv6.c             |   5 +-
 net/bridge/netfilter/ebtable_broute.c      |   9 +-
 net/bridge/netfilter/ebtables.c            |   6 +-
 net/bridge/netfilter/nf_conntrack_bridge.c |   8 +-
 net/ipv4/netfilter/arp_tables.c            |   7 +-
 net/ipv4/netfilter/ip_tables.c             |   7 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c         |   6 +-
 net/ipv4/netfilter/iptable_mangle.c        |  15 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c        |   5 +-
 net/ipv6/ila/ila_xlat.c                    |   6 +-
 net/ipv6/netfilter/ip6_tables.c            |   6 +-
 net/ipv6/netfilter/ip6table_mangle.c       |  13 +-
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |   5 +-
 net/netfilter/Kconfig                      |  10 +
 net/netfilter/Makefile                     |   1 +
 net/netfilter/core.c                       | 121 ++++-
 net/netfilter/ipvs/ip_vs_core.c            |  13 +-
 net/netfilter/nf_conntrack_proto.c         |  34 +-
 net/netfilter/nf_flow_table_inet.c         |   8 +-
 net/netfilter/nf_flow_table_ip.c           |  12 +-
 net/netfilter/nf_hook_bpf.c                | 574 +++++++++++++++++++++
 net/netfilter/nf_nat_core.c                |  50 +-
 net/netfilter/nf_nat_proto.c               |  56 +-
 net/netfilter/nf_queue.c                   |  12 +-
 net/netfilter/nf_synproxy_core.c           |   8 +-
 net/netfilter/nft_chain_filter.c           |  48 +-
 net/netfilter/nft_chain_nat.c              |   7 +-
 net/netfilter/nft_chain_route.c            |  22 +-
 security/apparmor/lsm.c                    |   5 +-
 security/selinux/hooks.c                   |  22 +-
 security/smack/smack_netfilter.c           |   8 +-
 45 files changed, 1044 insertions(+), 273 deletions(-)
 create mode 100644 include/net/netfilter/nf_hook_bpf.h
 create mode 100644 net/netfilter/nf_hook_bpf.c

-- 
2.35.1



Thread overview: 15+ messages
2022-10-05 14:13 [RFC 0/9 v2] netfilter: bpf base hook program generator Florian Westphal
2022-10-05 14:13 ` [RFC v2 1/9] netfilter: nf_queue: carry index in hook state Florian Westphal
2022-10-05 14:13 ` [RFC v2 2/9] netfilter: nat: split nat hook iteration into a helper Florian Westphal
2022-10-05 14:13 ` [RFC v2 3/9] netfilter: remove hook index from nf_hook_slow arguments Florian Westphal
2022-10-05 14:13 ` [RFC v2 4/9] netfilter: make hook functions accept only one argument Florian Westphal
2022-10-05 14:13 ` [RFC v2 5/9] netfilter: reduce allowed hook count to 32 Florian Westphal
2022-10-05 14:13 ` [RFC v2 6/9] netfilter: add bpf base hook program generator Florian Westphal
2022-10-06  2:52   ` Alexei Starovoitov
2022-10-06 13:51     ` Florian Westphal
2022-10-07 11:45     ` Florian Westphal
2022-10-07 19:08       ` Alexei Starovoitov
2022-10-07 19:35         ` Florian Westphal
2022-10-05 14:13 ` [RFC v2 7/9] netfilter: core: do not rebuild bpf program on dying netns Florian Westphal
2022-10-05 14:13 ` [RFC v2 8/9] netfilter: netdev: switch to invocation via bpf Florian Westphal
2022-10-05 14:13 ` [RFC v2 9/9] netfilter: hook_jit: add prog cache Florian Westphal
