* [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches @ 2015-09-16 15:42 Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo Daniel Mack ` (3 more replies) 0 siblings, 4 replies; 9+ messages in thread From: Daniel Mack @ 2015-09-16 15:42 UTC (permalink / raw) To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack I'm re-addressing the issue of matching socket meta information for non-established sockets that has been discussed a while ago: http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 Being able to reliably match on net_cls cgroup ids is crucial in order to build a per-application or per-container firewall rules which don't leak ingress packets. Such a feature would be very useful to have. A previous attempt to fix the currently existing issues was to call out to the early demuxing helper functions from the meta matching callbacks, but that doesn't suffice because it doesn't address the case of multicast UDP and other, more complex lookup methods implemented in various protocol handlers. This patch set outlines a different approach by adding a flag to 'struct sk_buff' called 'nf_postponed'. This flag is set by nft_meta_get_eval() in case a decision cannot be made due to a missing skb->sk. skbs flagged that way will then be ran through the netfilter chain processor again after the protocol handlers did the real socket lookup. A small addition to 'struct nft_pktinfo' is needed so that the matching callbacks can access the socket that was passed into nf_hook(). Note that the new flag does not actually bloat 'struct skb_buff', because it still fits into the 'flags1' bitfield. Also, the extra netfilter chain iteration will not be done by any subsequent packet in the same stream, as for those, the early demux code will set skb->sk. The patch set is obviously not yet finished, because a lot more protocol handlers need to be patched. Right now, I only addressed tcp_ipv4. Before I do that, I want to get some feedback on the approach, so please let me know what you think. Thanks, Daniel Daniel Mack (3): netfilter: add socket to struct nft_pktinfo netfilter: nft_meta: mark skbs for postponed filter processing net: tcp_ipv4: re-run netfilter chains for marked skbs include/linux/skbuff.h | 3 ++- include/net/netfilter/nf_tables.h | 2 ++ net/ipv4/tcp_ipv4.c | 10 ++++++++++ net/netfilter/nft_meta.c | 9 ++++++--- 4 files changed, 20 insertions(+), 4 deletions(-) -- 2.5.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo 2015-09-16 15:42 [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Daniel Mack @ 2015-09-16 15:42 ` Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 2/3] netfilter: nft_meta: mark skbs for postponed filter processing Daniel Mack ` (2 subsequent siblings) 3 siblings, 0 replies; 9+ messages in thread From: Daniel Mack @ 2015-09-16 15:42 UTC (permalink / raw) To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack The high-level netfilter hook API already enables users to pass a socket, but that information is lost when the chains are walked. In order to let internal eval callbacks use the passed filter rather than skb->sk, add a pointer of type 'struct sock' to 'struct nft_pktinfo' and set that field via nft_set_pktinfo(). This allows us to run filter chains from situations where skb->sk is unset. Fall back to skb->sk in case state->sk is NULL, so filter callbacks can be written in a generic way. Signed-off-by: Daniel Mack <daniel@zonque.org> --- include/net/netfilter/nf_tables.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index aa8bee7..05e97ed 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -13,6 +13,7 @@ #define NFT_JUMP_STACK_SIZE 16 struct nft_pktinfo { + struct sock *sk; struct sk_buff *skb; const struct net_device *in; const struct net_device *out; @@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb, const struct nf_hook_state *state) { + pkt->sk = state->sk ?: skb->sk; pkt->skb = skb; pkt->in = pkt->xt.in = state->in; pkt->out = pkt->xt.out = state->out; -- 2.5.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH RFC 2/3] netfilter: nft_meta: mark skbs for postponed filter processing 2015-09-16 15:42 [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo Daniel Mack @ 2015-09-16 15:42 ` Daniel Mack 2015-09-16 15:43 ` [PATCH RFC 3/3] net: tcp_ipv4: re-run netfilter chains for marked skbs Daniel Mack 2015-09-16 21:21 ` [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Florian Westphal 3 siblings, 0 replies; 9+ messages in thread From: Daniel Mack @ 2015-09-16 15:42 UTC (permalink / raw) To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack When the cgroup matching code in nft_meta is called without a socket to look at, it currently bails out and lets the packet pass. This is bad, because the reason for skb->sk being NULL is simply that the packet was directed to a socket that hasn't been looked up yet by early demux. This patch does two things: a) it uses the newly introduced pkt->sk pointer rather than skb->sk to check for the net class ID. This allows us to look at the socket the user passed into nf_hook(). b) in case the socket can't be accessed, it marks the skb as 'nf_postponed', so that later dispatchers have a chance to re-iterate the chain for such packets, after a full demux was conducted. Note that the added flag in 'struct skb' does not increase the size of the struct, as it fits in the 'flags1' bitfield. Signed-off-by: Daniel Mack <daniel@zonque.org> --- include/linux/skbuff.h | 3 ++- net/netfilter/nft_meta.c | 9 ++++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2738d35..3590101 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -584,7 +584,8 @@ struct sk_buff { fclone:2, peeked:1, head_frag:1, - xmit_more:1; + xmit_more:1, + nf_postponed:1; /* one bit hole */ kmemcheck_bitfield_end(flags1); diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index cb2f13e..33b8d23 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr, const struct nft_pktinfo *pkt) { const struct nft_meta *priv = nft_expr_priv(expr); - const struct sk_buff *skb = pkt->skb; const struct net_device *in = pkt->in, *out = pkt->out; + struct sk_buff *skb = pkt->skb; + struct sock *sk = pkt->sk; u32 *dest = ®s->data[priv->dreg]; switch (priv->key) { @@ -168,9 +169,11 @@ void nft_meta_get_eval(const struct nft_expr *expr, break; #ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: - if (skb->sk == NULL || !sk_fullsock(skb->sk)) + if (sk == NULL || !sk_fullsock(sk)) { + skb->nf_postponed = 1; goto err; - *dest = skb->sk->sk_classid; + } + *dest = sk->sk_classid; break; #endif default: -- 2.5.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH RFC 3/3] net: tcp_ipv4: re-run netfilter chains for marked skbs 2015-09-16 15:42 [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 2/3] netfilter: nft_meta: mark skbs for postponed filter processing Daniel Mack @ 2015-09-16 15:43 ` Daniel Mack 2015-09-16 21:21 ` [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Florian Westphal 3 siblings, 0 replies; 9+ messages in thread From: Daniel Mack @ 2015-09-16 15:43 UTC (permalink / raw) To: pablo; +Cc: daniel, netfilter-devel, netdev, fw, balazs.scheidler, Daniel Mack When an skb has been marked for later re-iteration through netfilter, do that after __inet_lookup_skb() has been called. This allows packets sent to unconnected sockets to be filtered reliably. Note that this will never happen for subsequent packets in the same stream, as skb->sk will be set due to early demux, and hence skb->nf_postponed will remain 0. Signed-off-by: Daniel Mack <daniel@zonque.org> --- net/ipv4/tcp_ipv4.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 93898e0..61e0cb4 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -78,6 +78,7 @@ #include <linux/inet.h> #include <linux/ipv6.h> +#include <linux/netfilter.h> #include <linux/stddef.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> @@ -1594,6 +1595,15 @@ int tcp_v4_rcv(struct sk_buff *skb) if (!sk) goto no_tcp_socket; + if (unlikely(skb->nf_postponed)) { + ret = nf_hook(NFPROTO_IPV4, NF_INET_LOCAL_IN, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + } + process: if (sk->sk_state == TCP_TIME_WAIT) goto do_time_wait; -- 2.5.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches 2015-09-16 15:42 [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Daniel Mack ` (2 preceding siblings ...) 2015-09-16 15:43 ` [PATCH RFC 3/3] net: tcp_ipv4: re-run netfilter chains for marked skbs Daniel Mack @ 2015-09-16 21:21 ` Florian Westphal 2015-09-17 10:04 ` Daniel Mack 3 siblings, 1 reply; 9+ messages in thread From: Florian Westphal @ 2015-09-16 21:21 UTC (permalink / raw) To: Daniel Mack; +Cc: pablo, daniel, netfilter-devel, netdev, fw, balazs.scheidler Daniel Mack <daniel@zonque.org> wrote: > I'm re-addressing the issue of matching socket meta information for > non-established sockets that has been discussed a while ago: > > http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 > > Being able to reliably match on net_cls cgroup ids is crucial in > order to build a per-application or per-container firewall rules > which don't leak ingress packets. Such a feature would be very > useful to have. Could you clarify what 'which don't leak ingress packets' means? > A previous attempt to fix the currently existing issues was to call > out to the early demuxing helper functions from the meta matching > callbacks, but that doesn't suffice because it doesn't address the > case of multicast UDP and other, more complex lookup methods > implemented in various protocol handlers. Yes, but see below. > This patch set outlines a different approach by adding a flag to > 'struct sk_buff' called 'nf_postponed'. This flag is set by > nft_meta_get_eval() in case a decision cannot be made due to a missing > skb->sk. skbs flagged that way will then be ran through the netfilter > chain processor again after the protocol handlers did the real socket > lookup. A small addition to 'struct nft_pktinfo' is needed so that the > matching callbacks can access the socket that was passed into > nf_hook(). > > Note that the new flag does not actually bloat 'struct skb_buff', > because it still fits into the 'flags1' bitfield. Also, the extra > netfilter chain iteration will not be done by any subsequent packet in > the same stream, as for those, the early demux code will set skb->sk. > > The patch set is obviously not yet finished, because a lot more > protocol handlers need to be patched. Right now, I only addressed > tcp_ipv4. Before I do that, I want to get some feedback on the > approach, so please let me know what you think. I think there are several issues. implementation problems: - i'm not sure its legal to call the hook input with skb->sk locked, some matches might want to aquire it. - what makes NFT_META_CGROUP special? (or was that just an example?) design issues: The assumption seems to be that a given skb can always be mapped to a particular socket, and hence a cgroup. Thats not necessarily the case, e.g. with broad-/multicasting or when the socket is e.g. in timewait state. Some skbs will now travel INPUT hooks twice. And once you'd extend this so that we re-invoke nf hooks for mcast packets, for each socket they've been received on, you change netfilter behaviour again (one skb, one traversal -> n traversals of ruleset, one for each sk). I think that this makes it a non-starter, sorry. I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves early-demux-esque code into the nft ruleset. Then you could do something like nft add rule ip filter input meta l4proto tcp demux meta cgroup 42 The caveat being that even in this case we cannot guarantee that skb->sk is set afterwards, or that a cgroup can be derived from it. Iff you absolutely need this, I'd seriously entertain the idea of adding NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset as a socket filter. But really, at that point, a much better question would be wheter net cgroups are the answer to whatever the question was, or what problem we are attempting to address here... ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches 2015-09-16 21:21 ` [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Florian Westphal @ 2015-09-17 10:04 ` Daniel Mack 2015-09-17 16:00 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Daniel Mack @ 2015-09-17 10:04 UTC (permalink / raw) To: Florian Westphal; +Cc: pablo, daniel, netfilter-devel, netdev, balazs.scheidler Hi Florian, On 09/16/2015 11:21 PM, Florian Westphal wrote: > Daniel Mack <daniel@zonque.org> wrote: >> I'm re-addressing the issue of matching socket meta information for >> non-established sockets that has been discussed a while ago: >> >> http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 >> >> Being able to reliably match on net_cls cgroup ids is crucial in >> order to build a per-application or per-container firewall rules >> which don't leak ingress packets. Such a feature would be very >> useful to have. > > Could you clarify what 'which don't leak ingress packets' means? Well, currently, the existing cgroups matches only filter packets that are sent to an established socket. All other packets are ignored. So when users install such matches as advertised by the documented examples, and the chain policy is permissive, the firewall 'leaks' packets, which is unexpected. >> The patch set is obviously not yet finished, because a lot more >> protocol handlers need to be patched. Right now, I only addressed >> tcp_ipv4. Before I do that, I want to get some feedback on the >> approach, so please let me know what you think. > > I think there are several issues. > > implementation problems: > - i'm not sure its legal to call the hook input with skb->sk locked, > some matches might want to aquire it. In the code as it stands after my patch set, I don't see where skb->sk is locked? After all, skb->sk is NULL, even on the 2nd iteration, which is why I patched the newly looked up socket to be available in the nf hook. > - what makes NFT_META_CGROUP special? (or was that just an example?) It's what I want to get working, but other 'meta' hooks can be made working in a similar fashion. > design issues: > The assumption seems to be that a given skb can always be mapped to a > particular socket, and hence a cgroup. > > Thats not necessarily the case, e.g. with broad-/multicasting or when > the socket is e.g. in timewait state. Yes, that's true. The idea for multicast would be to just drop the cloned skb instead of delivering it to the final socket. > Some skbs will now travel INPUT hooks twice. > > And once you'd extend this so that we re-invoke nf hooks for mcast > packets, for each socket they've been received on, you change netfilter > behaviour again (one skb, one traversal -> n traversals of ruleset, one > for each sk). > > I think that this makes it a non-starter, sorry. Hmm, I see your point. > I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves > early-demux-esque code into the nft ruleset. > > Then you could do something like > > nft add rule ip filter input meta l4proto tcp demux meta cgroup 42 Ok, but how would that be different from the unconditional demuxing patches we've kicked around earlier, especially when it comes to multicast sockets? Could you explain what you have in mind here? > The caveat being that even in this case we cannot guarantee > that skb->sk is set afterwards, or that a cgroup can be derived from it. > > Iff you absolutely need this, I'd seriously entertain the idea of adding > NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset > as a socket filter. That would be a new netfilter hook then, something that is called after LOCAL_IN, for ingress only? In a sense, it would be called from the protocol handlers, just as my patches do right now, but instead of conditionally re-iterating the same rules again, we would walk a different chain? > But really, at that point, a much better question would be wheter net > cgroups are the answer to whatever the question was, or what problem we > are attempting to address here... The idea is simply to have a packet filter which is based on information derived from the task that sends or will eventually handle the packet. IOW: We want to be able to install netfilter rules that apply to all packets received or sent by tasks that match a certain criteria, without modifying the sources of those tasks. As we already have net_cls hooked up in netfilter rules, it seems easiest to just get this working. But with the multiple approaches we already had, it appears the real fix needs more thinking. Thanks, Daniel ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches 2015-09-17 10:04 ` Daniel Mack @ 2015-09-17 16:00 ` Florian Westphal 2015-09-21 16:52 ` Daniel Mack 0 siblings, 1 reply; 9+ messages in thread From: Florian Westphal @ 2015-09-17 16:00 UTC (permalink / raw) To: Daniel Mack Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev, balazs.scheidler Daniel Mack <daniel@zonque.org> wrote: > Hi Florian, > > On 09/16/2015 11:21 PM, Florian Westphal wrote: > > Daniel Mack <daniel@zonque.org> wrote: > >> I'm re-addressing the issue of matching socket meta information for > >> non-established sockets that has been discussed a while ago: > >> > >> http://article.gmane.org/gmane.comp.security.firewalls.netfilter.devel/56877 > >> > >> Being able to reliably match on net_cls cgroup ids is crucial in > >> order to build a per-application or per-container firewall rules > >> which don't leak ingress packets. Such a feature would be very > >> useful to have. > > > > Could you clarify what 'which don't leak ingress packets' means? > > Well, currently, the existing cgroups matches only filter packets that > are sent to an established socket. All other packets are ignored. So > when users install such matches as advertised by the documented > examples, and the chain policy is permissive, the firewall 'leaks' > packets, which is unexpected. Then 'the documentation' needs fixing. cgroup (and anything related to sk data, including uid, etc.) is not guaranteed to work. We can only match what is available in the packet payload, and some extra info that the stack can make available to us (e.g. VLAN id, or skb->sk in some cases on output) and conntrack state plus whatever extra data conntrack allows to attach. > >> The patch set is obviously not yet finished, because a lot more > >> protocol handlers need to be patched. Right now, I only addressed > >> tcp_ipv4. Before I do that, I want to get some feedback on the > >> approach, so please let me know what you think. > > > > I think there are several issues. > > > > implementation problems: > > - i'm not sure its legal to call the hook input with skb->sk locked, > > some matches might want to aquire it. > > In the code as it stands after my patch set, I don't see where skb->sk > is locked? True. [..] > > design issues: > > The assumption seems to be that a given skb can always be mapped to a > > particular socket, and hence a cgroup. > > > > Thats not necessarily the case, e.g. with broad-/multicasting or when > > the socket is e.g. in timewait state. > > Yes, that's true. The idea for multicast would be to just drop the > cloned skb instead of delivering it to the final socket. -v please. > > I would much rather see nft_demux_{udp,tcp,sctp,dccp,...}.c which moves > > early-demux-esque code into the nft ruleset. > > > > Then you could do something like > > > > nft add rule ip filter input meta l4proto tcp demux meta cgroup 42 > > Ok, but how would that be different from the unconditional demuxing > patches we've kicked around earlier, especially when it comes to > multicast sockets? Could you explain what you have in mind here? Two things: - keep it out of core network stack - make it explicit so we can document that 'demux' keyword is fishy and will not work reliably. F.e. I don't see how mcast could ever be made to work except by adding an entirely new filtering mechanism/new hooks in core stack. > > The caveat being that even in this case we cannot guarantee > > that skb->sk is set afterwards, or that a cgroup can be derived from it. > > > > Iff you absolutely need this, I'd seriously entertain the idea of adding > > NFPROTO_L4_TCP, etc, ... or, maybe better, allow to attach nft ruleset > > as a socket filter. > > That would be a new netfilter hook then, something that is called after > LOCAL_IN, for ingress only? In a sense, it would be called from the > protocol handlers, just as my patches do right now, but instead of > conditionally re-iterating the same rules again, we would walk a > different chain? Yes, something like that. Obviously, you'll need to dru^W brib^W convince a LOT of people before that could ever fly. I think we should not do this and that this 'match on ingress sk properties' is just bad[tm]. f.e. you'd also have to move all of the stuff you want into sock_common ... 8-( > > But really, at that point, a much better question would be wheter net > > cgroups are the answer to whatever the question was, or what problem we > > are attempting to address here... > > The idea is simply to have a packet filter which is based on information > derived from the task that sends or will eventually handle the packet. Starts to smell like snet (https://lwn.net/Articles/441587/) > IOW: We want to be able to install netfilter rules that apply to all > packets received or sent by tasks that match a certain criteria, without > modifying the sources of those tasks. Sorry, I think netfilter is wrong tool for this, at least for ingress. You could use conntrack to stash net_cls id in the connmark, though (for inbound reply packets). ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches 2015-09-17 16:00 ` Florian Westphal @ 2015-09-21 16:52 ` Daniel Mack 2015-09-21 19:05 ` Florian Westphal 0 siblings, 1 reply; 9+ messages in thread From: Daniel Mack @ 2015-09-21 16:52 UTC (permalink / raw) To: Florian Westphal; +Cc: pablo, daniel, netfilter-devel, netdev, balazs.scheidler [-- Attachment #1: Type: text/plain, Size: 1967 bytes --] Hi, Thanks for your feedback, Florian! On 09/17/2015 06:00 PM, Florian Westphal wrote: > Daniel Mack <daniel@zonque.org> wrote: >> That would be a new netfilter hook then, something that is called after >> LOCAL_IN, for ingress only? In a sense, it would be called from the >> protocol handlers, just as my patches do right now, but instead of >> conditionally re-iterating the same rules again, we would walk a >> different chain? > > Yes, something like that. Obviously, you'll need to dru^W brib^W > convince a LOT of people before that could ever fly. > > I think we should not do this and that this 'match on ingress sk > properties' is just bad[tm]. > > f.e. you'd also have to move all of the stuff you want into > sock_common ... 8-( Hmm, I'm not sure whether I understand which problems you see, or which corner cases I am missing in my assessment. I did a quick test with the attached 4 patches that 1) Allow hook callbacks to look at the socket passed to nf_hook(), so skb->sk does not have to be set 2) Make nft_meta look at pkt->sk rather that skb->sk (only for cgroups as proof of concept) 3) Introduce a new POST_DEMUX netfilter chain (the name is not perfect, admittedly) 4) Iterate POST_DEMUX chains for v4 TCP and UDP unicast+multicast sockets. With some really trivial modifications to libnftnl/nftables (which just map strings to the new enum value), this works fine in my tests. Multicast receivers that match a netclass ID in the ruleset won't see any packets, while others do. Some more considerations: if we cannot determine a socket for a packet and hence don't deliver it, it's IMO perfectly fine not to run the netfilter rules for them. All we need to achieve with this chain is that for packets that _are_ delivered to a socket, all the necessary rules have been processed, at a time when we know who the final receiver of the skb is. I'm happy to discuss the side effects of such an approach. Thanks, Daniel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-netfilter-add-socket-to-struct-nft_pktinfo.patch --] [-- Type: text/x-diff; name="0001-netfilter-add-socket-to-struct-nft_pktinfo.patch", Size: 1553 bytes --] From 808dacb17308c7a1e00a2e6504cbe6468a25f0d1 Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel@zonque.org> Date: Wed, 16 Sep 2015 14:40:26 +0200 Subject: [PATCH RFC 1/4] netfilter: add socket to struct nft_pktinfo The high-level netfilter hook API already enables users to pass a socket, but that information is lost when the chains are walked. In order to let internal eval callbacks use the passed filter rather than skb->sk, add a pointer of type 'struct sock' to 'struct nft_pktinfo' and set that field via nft_set_pktinfo(). This allows us to run filter chains from situations where skb->sk is unset. Fall back to skb->sk in case state->sk is NULL, so filter callbacks can be written in a generic way. Signed-off-by: Daniel Mack <daniel@zonque.org> --- include/net/netfilter/nf_tables.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index aa8bee7..05e97ed 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -13,6 +13,7 @@ #define NFT_JUMP_STACK_SIZE 16 struct nft_pktinfo { + struct sock *sk; struct sk_buff *skb; const struct net_device *in; const struct net_device *out; @@ -29,6 +30,7 @@ static inline void nft_set_pktinfo(struct nft_pktinfo *pkt, struct sk_buff *skb, const struct nf_hook_state *state) { + pkt->sk = state->sk ?: skb->sk; pkt->skb = skb; pkt->in = pkt->xt.in = state->in; pkt->out = pkt->xt.out = state->out; -- 2.5.0 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #3: 0002-netfilter-nft_meta-look-at-pkt-sk-rather-than-skb-sk.patch --] [-- Type: text/x-diff; name="0002-netfilter-nft_meta-look-at-pkt-sk-rather-than-skb-sk.patch", Size: 1489 bytes --] From 1ae058eaedc9d20abcfe7c84496bbf1acb979026 Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel@zonque.org> Date: Fri, 18 Sep 2015 16:55:05 +0200 Subject: [PATCH RFC 2/4] netfilter: nft_meta: look at pkt->sk rather than skb->sk pkt->sk is set to whatever was passed to nh_hook() by the caller, and for post demux chains, this is the one that should be looked at, as skb->sk is still NULL at this point in time. Signed-off-by: Daniel Mack <daniel@zonque.org> --- net/netfilter/nft_meta.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index cb2f13e..f195bee 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -29,8 +29,9 @@ void nft_meta_get_eval(const struct nft_expr *expr, const struct nft_pktinfo *pkt) { const struct nft_meta *priv = nft_expr_priv(expr); - const struct sk_buff *skb = pkt->skb; const struct net_device *in = pkt->in, *out = pkt->out; + struct sk_buff *skb = pkt->skb; + struct sock *sk = pkt->sk; u32 *dest = ®s->data[priv->dreg]; switch (priv->key) { @@ -168,9 +169,9 @@ void nft_meta_get_eval(const struct nft_expr *expr, break; #ifdef CONFIG_CGROUP_NET_CLASSID case NFT_META_CGROUP: - if (skb->sk == NULL || !sk_fullsock(skb->sk)) + if (sk == NULL || !sk_fullsock(sk)) goto err; - *dest = skb->sk->sk_classid; + *dest = sk->sk_classid; break; #endif default: -- 2.5.0 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #4: 0003-netfilter-add-NF_INET_POST_DEMUX-chain-type.patch --] [-- Type: text/x-diff; name="0003-netfilter-add-NF_INET_POST_DEMUX-chain-type.patch", Size: 3018 bytes --] From 661e8f5c6e7f48bb03dbf8fcd7cbccb9ffb5cc5d Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel@zonque.org> Date: Fri, 18 Sep 2015 16:39:16 +0200 Subject: [PATCH RFC 3/4] netfilter: add NF_INET_POST_DEMUX chain type Add a new chain type NF_INET_POST_DEMUX which is ran after the input demux is complete and the final destination socket (if any) has been determined. This helps filtering packets based on information stored in the destination socket, such as cgroup controller supplied net class IDs. Signed-off-by: Daniel Mack <daniel@zonque.org> --- include/uapi/linux/netfilter.h | 1 + net/ipv4/netfilter/iptable_filter.c | 1 + net/ipv4/netfilter/nf_tables_ipv4.c | 4 +++- net/netfilter/nf_tables_inet.c | 3 ++- 4 files changed, 7 insertions(+), 2 deletions(-) diff --git a/include/uapi/linux/netfilter.h b/include/uapi/linux/netfilter.h index d93f949..d402679 100644 --- a/include/uapi/linux/netfilter.h +++ b/include/uapi/linux/netfilter.h @@ -49,6 +49,7 @@ enum nf_inet_hooks { NF_INET_FORWARD, NF_INET_LOCAL_OUT, NF_INET_POST_ROUTING, + NF_INET_POST_DEMUX, NF_INET_NUMHOOKS }; diff --git a/net/ipv4/netfilter/iptable_filter.c b/net/ipv4/netfilter/iptable_filter.c index a0f3bec..55b4795 100644 --- a/net/ipv4/netfilter/iptable_filter.c +++ b/net/ipv4/netfilter/iptable_filter.c @@ -21,6 +21,7 @@ MODULE_AUTHOR("Netfilter Core Team <coreteam@netfilter.org>"); MODULE_DESCRIPTION("iptables filter table"); #define FILTER_VALID_HOOKS ((1 << NF_INET_LOCAL_IN) | \ + (1 << NF_INET_POST_DEMUX) | \ (1 << NF_INET_FORWARD) | \ (1 << NF_INET_LOCAL_OUT)) diff --git a/net/ipv4/netfilter/nf_tables_ipv4.c b/net/ipv4/netfilter/nf_tables_ipv4.c index aa180d3..993f302 100644 --- a/net/ipv4/netfilter/nf_tables_ipv4.c +++ b/net/ipv4/netfilter/nf_tables_ipv4.c @@ -55,6 +55,7 @@ struct nft_af_info nft_af_ipv4 __read_mostly = { [NF_INET_FORWARD] = nft_do_chain_ipv4, [NF_INET_PRE_ROUTING] = nft_do_chain_ipv4, [NF_INET_POST_ROUTING] = nft_do_chain_ipv4, + [NF_INET_POST_DEMUX] = nft_do_chain_ipv4, }, }; EXPORT_SYMBOL_GPL(nft_af_ipv4); @@ -96,7 +97,8 @@ static const struct nf_chain_type filter_ipv4 = { (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) | (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_POST_ROUTING), + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_POST_DEMUX), }; static int __init nf_tables_ipv4_init(void) diff --git a/net/netfilter/nf_tables_inet.c b/net/netfilter/nf_tables_inet.c index 9dd2d21..c01db78 100644 --- a/net/netfilter/nf_tables_inet.c +++ b/net/netfilter/nf_tables_inet.c @@ -75,7 +75,8 @@ static const struct nf_chain_type filter_inet = { (1 << NF_INET_LOCAL_OUT) | (1 << NF_INET_FORWARD) | (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_POST_ROUTING), + (1 << NF_INET_POST_ROUTING) | + (1 << NF_INET_POST_DEMUX), }; static int __init nf_tables_inet_init(void) -- 2.5.0 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #5: 0004-net-tcp_ipv4-udp_ipv4-hook-up-post-demux-netfilter-c.patch --] [-- Type: text/x-diff; name="0004-net-tcp_ipv4-udp_ipv4-hook-up-post-demux-netfilter-c.patch", Size: 2345 bytes --] From 1898df7d6a35967972bae412994623a8d7c262cd Mon Sep 17 00:00:00 2001 From: Daniel Mack <daniel@zonque.org> Date: Wed, 16 Sep 2015 14:58:08 +0200 Subject: [PATCH RFC 4/4] net: tcp_ipv4, udp_ipv4: hook up post demux netfilter chains Run the POST_DEMUX netfilter chain rules after the destination socket has been looked up. Signed-off-by: Daniel Mack <daniel@zonque.org> --- net/ipv4/tcp_ipv4.c | 8 ++++++++ net/ipv4/udp.c | 15 +++++++++++++++ 2 files changed, 23 insertions(+) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 93898e0..33f968e 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -78,6 +78,7 @@ #include <linux/inet.h> #include <linux/ipv6.h> +#include <linux/netfilter.h> #include <linux/stddef.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> @@ -1594,6 +1595,13 @@ int tcp_v4_rcv(struct sk_buff *skb) if (!sk) goto no_tcp_socket; + ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + process: if (sk->sk_state == TCP_TIME_WAIT) goto do_time_wait; diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c0a15e7..0056c20 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -97,6 +97,7 @@ #include <linux/mm.h> #include <linux/inet.h> #include <linux/netdevice.h> +#include <linux/netfilter.h> #include <linux/slab.h> #include <net/tcp_states.h> #include <linux/skbuff.h> @@ -1632,7 +1633,14 @@ static void flush_stack(struct sock **stack, unsigned int count, struct sock *sk; for (i = 0; i < count; i++) { + int ret; sk = stack[i]; + + ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) + continue; + if (likely(!skb1)) skb1 = (i == final) ? skb : skb_clone(skb, GFP_ATOMIC); @@ -1819,6 +1827,13 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable, if (sk) { int ret; + ret = nf_hook(NFPROTO_IPV4, NF_INET_POST_DEMUX, sk, + skb, skb->dev, NULL, NULL); + if (ret != 1) { + sock_put(sk); + return 0; + } + if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk)) skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check, inet_compute_pseudo); -- 2.5.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches 2015-09-21 16:52 ` Daniel Mack @ 2015-09-21 19:05 ` Florian Westphal 0 siblings, 0 replies; 9+ messages in thread From: Florian Westphal @ 2015-09-21 19:05 UTC (permalink / raw) To: Daniel Mack Cc: Florian Westphal, pablo, daniel, netfilter-devel, netdev, balazs.scheidler Daniel Mack <daniel@zonque.org> wrote: > >> LOCAL_IN, for ingress only? In a sense, it would be called from the > >> protocol handlers, just as my patches do right now, but instead of > >> conditionally re-iterating the same rules again, we would walk a > >> different chain? > > > > Yes, something like that. Obviously, you'll need to dru^W brib^W > > convince a LOT of people before that could ever fly. > > > > I think we should not do this and that this 'match on ingress sk > > properties' is just bad[tm]. > > > > f.e. you'd also have to move all of the stuff you want into > > sock_common ... 8-( > > Hmm, I'm not sure whether I understand which problems you see, or which > corner cases I am missing in my assessment. I did a quick test with the > attached 4 patches that > > 1) Allow hook callbacks to look at the socket passed to nf_hook(), so > skb->sk does not have to be set > > 2) Make nft_meta look at pkt->sk rather that skb->sk (only for cgroups > as proof of concept) > > 3) Introduce a new POST_DEMUX netfilter chain (the name is not > perfect, admittedly) > > 4) Iterate POST_DEMUX chains for v4 TCP and UDP unicast+multicast > sockets. > > With some really trivial modifications to libnftnl/nftables (which just > map strings to the new enum value), this works fine in my tests. > Multicast receivers that match a netclass ID in the ruleset won't see > any packets, while others do. > > Some more considerations: if we cannot determine a socket for a packet > and hence don't deliver it, it's IMO perfectly fine not to run the > netfilter rules for them. All we need to achieve with this chain is that > for packets that _are_ delivered to a socket, all the necessary rules > have been processed, at a time when we know who the final receiver of > the skb is. Not sure if thats true. What about Timewait sockets? Its easy to imagine someone using this feature and then complaining that it doesn't match some packets, at which point we'd have to grow sock_common to accomondate all sk member we support matching for :-/ If we'd have kernel releases where we drop features this wouldn't be much of an issue since we could back out in case it causes issues later. But once we add your proposed feature we cannot go back... I'm not sure; I dislike this feature proposal but I can't think of any alternative [other than "don't do this"] :-( ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2015-09-21 19:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2015-09-16 15:42 [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 1/3] netfilter: add socket to struct nft_pktinfo Daniel Mack 2015-09-16 15:42 ` [PATCH RFC 2/3] netfilter: nft_meta: mark skbs for postponed filter processing Daniel Mack 2015-09-16 15:43 ` [PATCH RFC 3/3] net: tcp_ipv4: re-run netfilter chains for marked skbs Daniel Mack 2015-09-16 21:21 ` [PATCH RFC 0/3] Allow postponed netfilter handling for socket matches Florian Westphal 2015-09-17 10:04 ` Daniel Mack 2015-09-17 16:00 ` Florian Westphal 2015-09-21 16:52 ` Daniel Mack 2015-09-21 19:05 ` Florian Westphal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).