* [PATCH nf-next 0/2] xt_cgroups fix
@ 2015-03-24 15:30 Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-24 15:30 UTC (permalink / raw)
To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann
Hi Pablo,
here's a possible fix for xt_cgroups that was previously reported
by Daniel Mack.
The first patch refactors common helpers, which is later on being
used by the actual fix. Please see individual patches for more
details.
I have based the changes on nf-next as they're rather big, they
are, however, on top of Eric's a94070000388 ("netfilter: xt_socket:
prepare for TCP_NEW_SYN_RECV support") from net-next to avoid ugly
merge conflicts in xt_socket.
If you nevertheless think it's more suited for nf, or I should
ignore the above conflicting commit, I'd be happy to rebase.
Thanks a lot!
Daniel Borkmann (2):
netfilter: x_tables: refactor lookup helpers from xt_socket
netfilter: x_tables: fix NF_INET_LOCAL_IN sk lookups
net/netfilter/Kconfig | 5 +
net/netfilter/xt_cgroup.c | 86 ++++++++++---
net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
net/netfilter/xt_socket.c | 293 +++----------------------------------------
4 files changed, 373 insertions(+), 293 deletions(-)
create mode 100644 net/netfilter/xt_sk_helper.h
--
1.9.3
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH nf-next 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket
2015-03-24 15:30 [PATCH nf-next 0/2] xt_cgroups fix Daniel Borkmann
@ 2015-03-24 15:30 ` Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
2015-03-24 15:42 ` [PATCH nf-next 0/2] xt_cgroups fix Florian Westphal
2 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-24 15:30 UTC (permalink / raw)
To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann
The socket lookup helpers are also needed for fixing xt_cgroups,
therefore refactor them into shareable helper functions.
This simplifies and optimizes the xt_socket code itself a bit
as well, i.e. time to verdict for early demux sockets should be
much faster than previously:
We've unnecessarily extracted proto, {s,d}addr and {s,d}ports
from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup
via xt_socket_get_sock_v4() due to skb->sk hit.
After this patch, we only proceed the slow-path when we have a
skb->sk miss.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Florian Westphal <fw@strlen.de>
---
net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
net/netfilter/xt_socket.c | 293 +++----------------------------------------
2 files changed, 300 insertions(+), 275 deletions(-)
create mode 100644 net/netfilter/xt_sk_helper.h
diff --git a/net/netfilter/xt_sk_helper.h b/net/netfilter/xt_sk_helper.h
new file mode 100644
index 0000000..604b7ac
--- /dev/null
+++ b/net/netfilter/xt_sk_helper.h
@@ -0,0 +1,282 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (C) 2007-2008 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/skbuff.h>
+
+#include <net/tcp.h>
+#include <net/udp.h>
+#include <net/icmp.h>
+#include <net/sock.h>
+
+#include <net/netfilter/ipv4/nf_defrag_ipv4.h>
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#define XT_HAVE_IPV6 1
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <net/inet6_hashtables.h>
+#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+#endif
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#define XT_HAVE_CONNTRACK 1
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
+static int extract_icmp4_fields(const struct sk_buff *skb, u8 *protocol,
+ __be32 *raddr, __be32 *laddr, __be16 *rport,
+ __be16 *lport)
+{
+ unsigned int outside_hdrlen = ip_hdrlen(skb);
+ struct iphdr *inside_iph, _inside_iph;
+ struct icmphdr *icmph, _icmph;
+ __be16 *ports, _ports[2];
+
+ icmph = skb_header_pointer(skb, outside_hdrlen,
+ sizeof(_icmph), &_icmph);
+ if (icmph == NULL)
+ return 1;
+
+ switch (icmph->type) {
+ case ICMP_DEST_UNREACH:
+ case ICMP_SOURCE_QUENCH:
+ case ICMP_REDIRECT:
+ case ICMP_TIME_EXCEEDED:
+ case ICMP_PARAMETERPROB:
+ break;
+ default:
+ return 1;
+ }
+
+ inside_iph = skb_header_pointer(skb, outside_hdrlen +
+ sizeof(struct icmphdr),
+ sizeof(_inside_iph), &_inside_iph);
+ if (inside_iph == NULL)
+ return 1;
+
+ if (inside_iph->protocol != IPPROTO_TCP &&
+ inside_iph->protocol != IPPROTO_UDP)
+ return 1;
+
+ ports = skb_header_pointer(skb, outside_hdrlen +
+ sizeof(struct icmphdr) +
+ (inside_iph->ihl << 2),
+ sizeof(_ports), &_ports);
+ if (ports == NULL)
+ return 1;
+
+ /* The inside IP packet is the one quoted from our side, thus
+ * its saddr is the local address.
+ */
+
+ *protocol = inside_iph->protocol;
+ *laddr = inside_iph->saddr;
+ *lport = ports[0];
+ *raddr = inside_iph->daddr;
+ *rport = ports[1];
+
+ return 0;
+}
+
+static struct sock *xt_get_sk_v4(struct net *net, const u8 protocol,
+ const __be32 saddr, const __be32 daddr,
+ const __be16 sport, const __be16 dport,
+ const struct net_device *in)
+{
+ switch (protocol) {
+ case IPPROTO_TCP:
+ return __inet_lookup(net, &tcp_hashinfo,
+ saddr, sport, daddr, dport,
+ in->ifindex);
+ case IPPROTO_UDP:
+ return udp4_lib_lookup(net, saddr, sport, daddr, dport,
+ in->ifindex);
+ }
+
+ return NULL;
+}
+
+static struct sock *xt_sk_lookup(const struct sk_buff *skb,
+ const struct net_device *indev)
+{
+ const struct iphdr *iph = ip_hdr(skb);
+ __be32 uninitialized_var(daddr), uninitialized_var(saddr);
+ __be16 uninitialized_var(dport), uninitialized_var(sport);
+ u8 uninitialized_var(protocol);
+#ifdef XT_HAVE_CONNTRACK
+ struct nf_conn const *ct;
+ enum ip_conntrack_info ctinfo;
+#endif
+
+ if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
+ struct udphdr _hdr, *hp = NULL;
+
+ hp = skb_header_pointer(skb, ip_hdrlen(skb),
+ sizeof(_hdr), &_hdr);
+ if (hp == NULL)
+ return NULL;
+
+ protocol = iph->protocol;
+
+ saddr = iph->saddr;
+ daddr = iph->daddr;
+
+ sport = hp->source;
+ dport = hp->dest;
+
+ } else if (iph->protocol == IPPROTO_ICMP) {
+ if (extract_icmp4_fields(skb, &protocol, &saddr, &daddr,
+ &sport, &dport))
+ return NULL;
+ } else {
+ return NULL;
+ }
+
+#ifdef XT_HAVE_CONNTRACK
+ /* Do the lookup with the original socket address in
+ * case this is a reply packet of an established
+ * SNAT-ted connection.
+ */
+ ct = nf_ct_get(skb, &ctinfo);
+ if (ct && !nf_ct_is_untracked(ct) &&
+ ((iph->protocol != IPPROTO_ICMP &&
+ ctinfo == IP_CT_ESTABLISHED_REPLY) ||
+ (iph->protocol == IPPROTO_ICMP &&
+ ctinfo == IP_CT_RELATED_REPLY)) &&
+ (ct->status & IPS_SRC_NAT_DONE)) {
+
+ daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+ dport = (iph->protocol == IPPROTO_TCP) ?
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.udp.port;
+ }
+#endif
+ return xt_get_sk_v4(dev_net(skb->dev), protocol, saddr, daddr,
+ sport, dport, indev);
+}
+
+#ifdef XT_HAVE_IPV6
+static int extract_icmp6_fields(const struct sk_buff *skb,
+ unsigned int outside_hdrlen,
+ int *protocol,
+ const struct in6_addr **raddr,
+ const struct in6_addr **laddr,
+ __be16 *rport, __be16 *lport,
+ struct ipv6hdr *ipv6_var)
+{
+ const struct ipv6hdr *inside_iph;
+ struct icmp6hdr *icmph, _icmph;
+ __be16 *ports, _ports[2];
+ u8 inside_nexthdr;
+ __be16 inside_fragoff;
+ int inside_hdrlen;
+
+ icmph = skb_header_pointer(skb, outside_hdrlen,
+ sizeof(_icmph), &_icmph);
+ if (icmph == NULL)
+ return 1;
+
+ if (icmph->icmp6_type & ICMPV6_INFOMSG_MASK)
+ return 1;
+
+ inside_iph = skb_header_pointer(skb, outside_hdrlen + sizeof(_icmph),
+ sizeof(*ipv6_var), ipv6_var);
+ if (inside_iph == NULL)
+ return 1;
+
+ inside_nexthdr = inside_iph->nexthdr;
+ inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) +
+ sizeof(*ipv6_var),
+ &inside_nexthdr, &inside_fragoff);
+ if (inside_hdrlen < 0)
+ return 1; /* hjm: Packet has no/incomplete transport layer headers. */
+
+ if (inside_nexthdr != IPPROTO_TCP &&
+ inside_nexthdr != IPPROTO_UDP)
+ return 1;
+
+ ports = skb_header_pointer(skb, inside_hdrlen,
+ sizeof(_ports), &_ports);
+ if (ports == NULL)
+ return 1;
+
+ /* The inside IP packet is the one quoted from our side, thus
+ * its saddr is the local address.
+ */
+
+ *protocol = inside_nexthdr;
+ *laddr = &inside_iph->saddr;
+ *lport = ports[0];
+ *raddr = &inside_iph->daddr;
+ *rport = ports[1];
+
+ return 0;
+}
+
+static struct sock *xt_get_sk_v6(struct net *net, const u8 protocol,
+ const struct in6_addr *saddr,
+ const struct in6_addr *daddr,
+ const __be16 sport, const __be16 dport,
+ const struct net_device *in)
+{
+ switch (protocol) {
+ case IPPROTO_TCP:
+ return inet6_lookup(net, &tcp_hashinfo,
+ saddr, sport, daddr, dport,
+ in->ifindex);
+ case IPPROTO_UDP:
+ return udp6_lib_lookup(net, saddr, sport, daddr, dport,
+ in->ifindex);
+ }
+
+ return NULL;
+}
+
+static struct sock *xt_sk_lookup6(const struct sk_buff *skb,
+ const struct net_device *indev)
+{
+ __be16 uninitialized_var(dport), uninitialized_var(sport);
+ const struct in6_addr *daddr = NULL, *saddr = NULL;
+ struct ipv6hdr *iph = ipv6_hdr(skb);
+ int thoff = 0, tproto;
+
+ tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
+ if (tproto < 0) {
+ pr_debug("unable to find transport header in IPv6 packet, dropping\n");
+ return NULL;
+ }
+
+ if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
+ struct udphdr _hdr, *hp;
+
+ hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
+ if (hp == NULL)
+ return NULL;
+
+ saddr = &iph->saddr;
+ daddr = &iph->daddr;
+
+ sport = hp->source;
+ dport = hp->dest;
+
+ } else if (tproto == IPPROTO_ICMPV6) {
+ struct ipv6hdr ipv6_var;
+
+ if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
+ &sport, &dport, &ipv6_var))
+ return NULL;
+ } else {
+ return NULL;
+ }
+
+ return xt_get_sk_v6(dev_net(skb->dev), tproto, saddr, daddr,
+ sport, dport, indev);
+}
+#endif /* XT_HAVE_IPV6 */
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 895534e..570ae6e 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -7,90 +7,18 @@
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
- *
*/
+
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
#include <linux/module.h>
#include <linux/skbuff.h>
-#include <linux/netfilter/x_tables.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <net/tcp.h>
-#include <net/udp.h>
-#include <net/icmp.h>
-#include <net/sock.h>
-#include <net/inet_sock.h>
-#include <net/netfilter/ipv4/nf_defrag_ipv4.h>
-
-#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
-#define XT_SOCKET_HAVE_IPV6 1
-#include <linux/netfilter_ipv6/ip6_tables.h>
-#include <net/inet6_hashtables.h>
-#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
-#endif
+#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/xt_socket.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-#define XT_SOCKET_HAVE_CONNTRACK 1
-#include <net/netfilter/nf_conntrack.h>
-#endif
-
-static int
-extract_icmp4_fields(const struct sk_buff *skb,
- u8 *protocol,
- __be32 *raddr,
- __be32 *laddr,
- __be16 *rport,
- __be16 *lport)
-{
- unsigned int outside_hdrlen = ip_hdrlen(skb);
- struct iphdr *inside_iph, _inside_iph;
- struct icmphdr *icmph, _icmph;
- __be16 *ports, _ports[2];
-
- icmph = skb_header_pointer(skb, outside_hdrlen,
- sizeof(_icmph), &_icmph);
- if (icmph == NULL)
- return 1;
-
- switch (icmph->type) {
- case ICMP_DEST_UNREACH:
- case ICMP_SOURCE_QUENCH:
- case ICMP_REDIRECT:
- case ICMP_TIME_EXCEEDED:
- case ICMP_PARAMETERPROB:
- break;
- default:
- return 1;
- }
-
- inside_iph = skb_header_pointer(skb, outside_hdrlen +
- sizeof(struct icmphdr),
- sizeof(_inside_iph), &_inside_iph);
- if (inside_iph == NULL)
- return 1;
-
- if (inside_iph->protocol != IPPROTO_TCP &&
- inside_iph->protocol != IPPROTO_UDP)
- return 1;
-
- ports = skb_header_pointer(skb, outside_hdrlen +
- sizeof(struct icmphdr) +
- (inside_iph->ihl << 2),
- sizeof(_ports), &_ports);
- if (ports == NULL)
- return 1;
-
- /* the inside IP packet is the one quoted from our side, thus
- * its saddr is the local address */
- *protocol = inside_iph->protocol;
- *laddr = inside_iph->saddr;
- *lport = ports[0];
- *raddr = inside_iph->daddr;
- *rport = ports[1];
-
- return 0;
-}
+#include "xt_sk_helper.h"
/* "socket" match based redirection (no specific rule)
* ===================================================
@@ -111,23 +39,6 @@ extract_icmp4_fields(const struct sk_buff *skb,
* then local services could intercept traffic going through the
* box.
*/
-static struct sock *
-xt_socket_get_sock_v4(struct net *net, const u8 protocol,
- const __be32 saddr, const __be32 daddr,
- const __be16 sport, const __be16 dport,
- const struct net_device *in)
-{
- switch (protocol) {
- case IPPROTO_TCP:
- return __inet_lookup(net, &tcp_hashinfo,
- saddr, sport, daddr, dport,
- in->ifindex);
- case IPPROTO_UDP:
- return udp4_lib_lookup(net, saddr, sport, daddr, dport,
- in->ifindex);
- }
- return NULL;
-}
static bool xt_socket_sk_is_transparent(struct sock *sk)
{
@@ -147,63 +58,13 @@ static bool
socket_match(const struct sk_buff *skb, struct xt_action_param *par,
const struct xt_socket_mtinfo1 *info)
{
- const struct iphdr *iph = ip_hdr(skb);
- struct udphdr _hdr, *hp = NULL;
struct sock *sk = skb->sk;
- __be32 uninitialized_var(daddr), uninitialized_var(saddr);
- __be16 uninitialized_var(dport), uninitialized_var(sport);
- u8 uninitialized_var(protocol);
-#ifdef XT_SOCKET_HAVE_CONNTRACK
- struct nf_conn const *ct;
- enum ip_conntrack_info ctinfo;
-#endif
-
- if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
- hp = skb_header_pointer(skb, ip_hdrlen(skb),
- sizeof(_hdr), &_hdr);
- if (hp == NULL)
- return false;
-
- protocol = iph->protocol;
- saddr = iph->saddr;
- sport = hp->source;
- daddr = iph->daddr;
- dport = hp->dest;
-
- } else if (iph->protocol == IPPROTO_ICMP) {
- if (extract_icmp4_fields(skb, &protocol, &saddr, &daddr,
- &sport, &dport))
- return false;
- } else {
- return false;
- }
-
-#ifdef XT_SOCKET_HAVE_CONNTRACK
- /* Do the lookup with the original socket address in case this is a
- * reply packet of an established SNAT-ted connection. */
-
- ct = nf_ct_get(skb, &ctinfo);
- if (ct && !nf_ct_is_untracked(ct) &&
- ((iph->protocol != IPPROTO_ICMP &&
- ctinfo == IP_CT_ESTABLISHED_REPLY) ||
- (iph->protocol == IPPROTO_ICMP &&
- ctinfo == IP_CT_RELATED_REPLY)) &&
- (ct->status & IPS_SRC_NAT_DONE)) {
-
- daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
- dport = (iph->protocol == IPPROTO_TCP) ?
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
- ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.udp.port;
- }
-#endif
if (!sk)
- sk = xt_socket_get_sock_v4(dev_net(skb->dev), protocol,
- saddr, daddr, sport, dport,
- par->in);
+ sk = xt_sk_lookup(skb, par->in);
if (sk) {
- bool wildcard;
bool transparent = true;
+ bool wildcard;
/* Ignore sockets listening on INADDR_ANY,
* unless XT_SOCKET_NOWILDCARD is set
@@ -225,12 +86,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
sk = NULL;
}
- pr_debug("proto %hhu %pI4:%hu -> %pI4:%hu (orig %pI4:%hu) sock %p\n",
- protocol, &saddr, ntohs(sport),
- &daddr, ntohs(dport),
- &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
- return (sk != NULL);
+ return sk != NULL;
}
static bool
@@ -249,127 +105,18 @@ socket_mt4_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
return socket_match(skb, par, par->matchinfo);
}
-#ifdef XT_SOCKET_HAVE_IPV6
-
-static int
-extract_icmp6_fields(const struct sk_buff *skb,
- unsigned int outside_hdrlen,
- int *protocol,
- const struct in6_addr **raddr,
- const struct in6_addr **laddr,
- __be16 *rport,
- __be16 *lport,
- struct ipv6hdr *ipv6_var)
-{
- const struct ipv6hdr *inside_iph;
- struct icmp6hdr *icmph, _icmph;
- __be16 *ports, _ports[2];
- u8 inside_nexthdr;
- __be16 inside_fragoff;
- int inside_hdrlen;
-
- icmph = skb_header_pointer(skb, outside_hdrlen,
- sizeof(_icmph), &_icmph);
- if (icmph == NULL)
- return 1;
-
- if (icmph->icmp6_type & ICMPV6_INFOMSG_MASK)
- return 1;
-
- inside_iph = skb_header_pointer(skb, outside_hdrlen + sizeof(_icmph),
- sizeof(*ipv6_var), ipv6_var);
- if (inside_iph == NULL)
- return 1;
- inside_nexthdr = inside_iph->nexthdr;
-
- inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) +
- sizeof(*ipv6_var),
- &inside_nexthdr, &inside_fragoff);
- if (inside_hdrlen < 0)
- return 1; /* hjm: Packet has no/incomplete transport layer headers. */
-
- if (inside_nexthdr != IPPROTO_TCP &&
- inside_nexthdr != IPPROTO_UDP)
- return 1;
-
- ports = skb_header_pointer(skb, inside_hdrlen,
- sizeof(_ports), &_ports);
- if (ports == NULL)
- return 1;
-
- /* the inside IP packet is the one quoted from our side, thus
- * its saddr is the local address */
- *protocol = inside_nexthdr;
- *laddr = &inside_iph->saddr;
- *lport = ports[0];
- *raddr = &inside_iph->daddr;
- *rport = ports[1];
-
- return 0;
-}
-
-static struct sock *
-xt_socket_get_sock_v6(struct net *net, const u8 protocol,
- const struct in6_addr *saddr, const struct in6_addr *daddr,
- const __be16 sport, const __be16 dport,
- const struct net_device *in)
-{
- switch (protocol) {
- case IPPROTO_TCP:
- return inet6_lookup(net, &tcp_hashinfo,
- saddr, sport, daddr, dport,
- in->ifindex);
- case IPPROTO_UDP:
- return udp6_lib_lookup(net, saddr, sport, daddr, dport,
- in->ifindex);
- }
-
- return NULL;
-}
-
+#ifdef XT_HAVE_IPV6
static bool
socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
{
- struct ipv6hdr ipv6_var, *iph = ipv6_hdr(skb);
- struct udphdr _hdr, *hp = NULL;
- struct sock *sk = skb->sk;
- const struct in6_addr *daddr = NULL, *saddr = NULL;
- __be16 uninitialized_var(dport), uninitialized_var(sport);
- int thoff = 0, uninitialized_var(tproto);
const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
-
- tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
- if (tproto < 0) {
- pr_debug("unable to find transport header in IPv6 packet, dropping\n");
- return NF_DROP;
- }
-
- if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
- hp = skb_header_pointer(skb, thoff,
- sizeof(_hdr), &_hdr);
- if (hp == NULL)
- return false;
-
- saddr = &iph->saddr;
- sport = hp->source;
- daddr = &iph->daddr;
- dport = hp->dest;
-
- } else if (tproto == IPPROTO_ICMPV6) {
- if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
- &sport, &dport, &ipv6_var))
- return false;
- } else {
- return false;
- }
+ struct sock *sk = skb->sk;
if (!sk)
- sk = xt_socket_get_sock_v6(dev_net(skb->dev), tproto,
- saddr, daddr, sport, dport,
- par->in);
+ sk = xt_sk_lookup6(skb, par->in);
if (sk) {
- bool wildcard;
bool transparent = true;
+ bool wildcard;
/* Ignore sockets listening on INADDR_ANY
* unless XT_SOCKET_NOWILDCARD is set
@@ -391,13 +138,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
sk = NULL;
}
- pr_debug("proto %hhd %pI6:%hu -> %pI6:%hu "
- "(orig %pI6:%hu) sock %p\n",
- tproto, saddr, ntohs(sport),
- daddr, ntohs(dport),
- &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
- return (sk != NULL);
+ return sk != NULL;
}
#endif
@@ -409,6 +150,7 @@ static int socket_mt_v1_check(const struct xt_mtchk_param *par)
pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V1);
return -EINVAL;
}
+
return 0;
}
@@ -420,6 +162,7 @@ static int socket_mt_v2_check(const struct xt_mtchk_param *par)
pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V2);
return -EINVAL;
}
+
return 0;
}
@@ -444,7 +187,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
(1 << NF_INET_LOCAL_IN),
.me = THIS_MODULE,
},
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
{
.name = "socket",
.revision = 1,
@@ -468,7 +211,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
(1 << NF_INET_LOCAL_IN),
.me = THIS_MODULE,
},
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
{
.name = "socket",
.revision = 2,
@@ -486,7 +229,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
static int __init socket_mt_init(void)
{
nf_defrag_ipv4_enable();
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
nf_defrag_ipv6_enable();
#endif
--
1.9.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-24 15:30 [PATCH nf-next 0/2] xt_cgroups fix Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
@ 2015-03-24 15:30 ` Daniel Borkmann
2015-03-25 16:03 ` Pablo Neira Ayuso
2015-03-25 20:26 ` Pablo Neira Ayuso
2015-03-24 15:42 ` [PATCH nf-next 0/2] xt_cgroups fix Florian Westphal
2 siblings, 2 replies; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-24 15:30 UTC (permalink / raw)
To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann
While originally only being intended for outgoing traffic, commit
a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for
LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
as well, in order to allow for nfacct accounting.
This basically was under the assumption that socket early demux will
resolve it. It's correct that demux happens after PRE_ROUTING, but
before LOCAL_IN.
However, that as-is only partially works, i.e. it works for the case
of established TCP and connected UDP sockets when early demux is
enabled, but not for various other ingress scenarios e.g. unconnected
UDP, request sockets, etc.
Instead of reverting commit a00e76349f35, I think it's worth to fix
it up as there are applications requiring xt_cgroup to match on
ingress and egress side. In order to do so, we need to perform a
full lookup on skb->sk (ingress) miss, similarly as being done in
xt_socket.
Therefore, we need to make use of shared helpers xt_sk_lookup() and
xt_sk_lookup6(). Thanks to Daniel for the report and also additional
testing.
Reported-by: Daniel Mack <daniel@zonque.org>
Fixes: a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks")
Reference: http://thread.gmane.org/gmane.linux.network/355527
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Mack <daniel@zonque.org>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Florian Westphal <fw@strlen.de>
---
net/netfilter/Kconfig | 5 +++
net/netfilter/xt_cgroup.c | 86 +++++++++++++++++++++++++++++++++++++----------
2 files changed, 73 insertions(+), 18 deletions(-)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 971cd75..044bd22 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -960,8 +960,13 @@ config NETFILTER_XT_MATCH_BPF
config NETFILTER_XT_MATCH_CGROUP
tristate '"control group" match support'
+ depends on NETFILTER_XTABLES
depends on NETFILTER_ADVANCED
+ depends on !NF_CONNTRACK || NF_CONNTRACK
+ depends on (IPV6 || IPV6=n)
depends on CGROUPS
+ select NF_DEFRAG_IPV4
+ select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
select CGROUP_NET_CLASSID
---help---
Socket/process control group matching allows you to match locally
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 7198d66..cd2468d 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -16,8 +16,11 @@
#include <linux/module.h>
#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/xt_cgroup.h>
+
#include <net/sock.h>
+#include "xt_sk_helper.h"
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
MODULE_DESCRIPTION("Xtables: process control group matching");
@@ -34,38 +37,85 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
return 0;
}
-static bool
-cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+static bool cgroup_mt(const struct sk_buff *skb,
+ const struct xt_action_param *par,
+ struct sock *(*cgroup_mt_slow)(const struct sk_buff *skb,
+ const struct net_device *indev))
{
const struct xt_cgroup_info *info = par->matchinfo;
+ struct sock *sk = skb->sk;
+ u32 sk_classid;
+
+ if (sk) {
+ sk_classid = sk->sk_classid;
+ } else {
+ if (par->in != NULL)
+ sk = cgroup_mt_slow(skb, par->in);
+ if (sk == NULL)
+ return false;
+
+ sk_classid = sk->sk_classid;
+ sock_gen_put(sk);
+ }
+
+ return (info->id == sk_classid) ^ info->invert;
+}
- if (skb->sk == NULL)
- return false;
+static bool
+cgroup_mt_v4(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ return cgroup_mt(skb, par, xt_sk_lookup);
+}
- return (info->id == skb->sk->sk_classid) ^ info->invert;
+#ifdef XT_HAVE_IPV6
+static bool
+cgroup_mt_v6(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ return cgroup_mt(skb, par, xt_sk_lookup6);
}
+#endif
-static struct xt_match cgroup_mt_reg __read_mostly = {
- .name = "cgroup",
- .revision = 0,
- .family = NFPROTO_UNSPEC,
- .checkentry = cgroup_mt_check,
- .match = cgroup_mt,
- .matchsize = sizeof(struct xt_cgroup_info),
- .me = THIS_MODULE,
- .hooks = (1 << NF_INET_LOCAL_OUT) |
- (1 << NF_INET_POST_ROUTING) |
- (1 << NF_INET_LOCAL_IN),
+static struct xt_match cgroup_mt_reg[] __read_mostly = {
+ {
+ .name = "cgroup",
+ .revision = 0,
+ .family = NFPROTO_IPV4,
+ .checkentry = cgroup_mt_check,
+ .match = cgroup_mt_v4,
+ .matchsize = sizeof(struct xt_cgroup_info),
+ .me = THIS_MODULE,
+ .hooks = (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_IN),
+ },
+#ifdef XT_HAVE_IPV6
+ {
+ .name = "cgroup",
+ .revision = 0,
+ .family = NFPROTO_IPV6,
+ .checkentry = cgroup_mt_check,
+ .match = cgroup_mt_v6,
+ .matchsize = sizeof(struct xt_cgroup_info),
+ .me = THIS_MODULE,
+ .hooks = (1 << NF_INET_LOCAL_OUT) |
+ (1 << NF_INET_POST_ROUTING) |
+ (1 << NF_INET_LOCAL_IN),
+ }
+#endif
};
static int __init cgroup_mt_init(void)
{
- return xt_register_match(&cgroup_mt_reg);
+ nf_defrag_ipv4_enable();
+#ifdef XT_HAVE_IPV6
+ nf_defrag_ipv6_enable();
+#endif
+ return xt_register_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
}
static void __exit cgroup_mt_exit(void)
{
- xt_unregister_match(&cgroup_mt_reg);
+ xt_unregister_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
}
module_init(cgroup_mt_init);
--
1.9.3
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 0/2] xt_cgroups fix
2015-03-24 15:30 [PATCH nf-next 0/2] xt_cgroups fix Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
@ 2015-03-24 15:42 ` Florian Westphal
2015-03-24 15:58 ` Daniel Borkmann
2 siblings, 1 reply; 12+ messages in thread
From: Florian Westphal @ 2015-03-24 15:42 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: pablo, daniel, fw, a.perevalov, netfilter-devel
Daniel Borkmann <daniel@iogearbox.net> wrote:
> here's a possible fix for xt_cgroups that was previously reported
> by Daniel Mack.
>
> The first patch refactors common helpers, which is later on being
> used by the actual fix. Please see individual patches for more
> details.
>
> I have based the changes on nf-next as they're rather big, they
> are, however, on top of Eric's a94070000388 ("netfilter: xt_socket:
> prepare for TCP_NEW_SYN_RECV support") from net-next to avoid ugly
> merge conflicts in xt_socket.
>
> If you nevertheless think it's more suited for nf, or I should
> ignore the above conflicting commit, I'd be happy to rebase.
My main problem with these patches is that we're now paving the way
for skb->sk testing in input, i.e. doing protocol demuxing steps
in iptables modules.
E.g. why not accept similar patch for xt_owner?
What about sctp (or any other protocol) support?
I don't see anything wrong with the implementation per se but I'm afraid
we're starting down a slippery slope here.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 0/2] xt_cgroups fix
2015-03-24 15:42 ` [PATCH nf-next 0/2] xt_cgroups fix Florian Westphal
@ 2015-03-24 15:58 ` Daniel Borkmann
0 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-24 15:58 UTC (permalink / raw)
To: Florian Westphal; +Cc: pablo, daniel, a.perevalov, netfilter-devel
On 03/24/2015 04:42 PM, Florian Westphal wrote:
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>> here's a possible fix for xt_cgroups that was previously reported
>> by Daniel Mack.
>>
>> The first patch refactors common helpers, which is later on being
>> used by the actual fix. Please see individual patches for more
>> details.
>>
>> I have based the changes on nf-next as they're rather big, they
>> are, however, on top of Eric's a94070000388 ("netfilter: xt_socket:
>> prepare for TCP_NEW_SYN_RECV support") from net-next to avoid ugly
>> merge conflicts in xt_socket.
>>
>> If you nevertheless think it's more suited for nf, or I should
>> ignore the above conflicting commit, I'd be happy to rebase.
>
> My main problem with these patches is that we're now paving the way
> for skb->sk testing in input, i.e. doing protocol demuxing steps
> in iptables modules.
Well, we're doing this in xt_socket already, here it's just fixing
a real issue in xt_cgroup. The only alternative I currently see
would be to revert Alexey's commit, but that would limit the scope
of possibilities for xt_cgroup quite a lot. :(
> E.g. why not accept similar patch for xt_owner?
> What about sctp (or any other protocol) support?
Right, I see your concern, but at the same time I think that i.e.
SCTP has much more severe problems, i) being the horrible state of
the stack implementation itself ;), ii) being the lack of more
important features in iptables (i.e. NAT on multi-homing). Anyway,
taken that aside, you could restrict that support, as we currently
do only for available early demuxes, but given that activity on
SCTP I truly doubt that's coming any time soon.
> I don't see anything wrong with the implementation per se but I'm afraid
> we're starting down a slippery slope here.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-24 15:30 ` [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
@ 2015-03-25 16:03 ` Pablo Neira Ayuso
2015-03-25 16:39 ` Daniel Borkmann
2015-03-25 20:26 ` Pablo Neira Ayuso
1 sibling, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-25 16:03 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel
On Tue, Mar 24, 2015 at 04:30:29PM +0100, Daniel Borkmann wrote:
> diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
> index 7198d66..cd2468d 100644
> --- a/net/netfilter/xt_cgroup.c
> +++ b/net/netfilter/xt_cgroup.c
> @@ -16,8 +16,11 @@
> #include <linux/module.h>
> #include <linux/netfilter/x_tables.h>
> #include <linux/netfilter/xt_cgroup.h>
> +
> #include <net/sock.h>
>
> +#include "xt_sk_helper.h"
> +
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
> MODULE_DESCRIPTION("Xtables: process control group matching");
> @@ -34,38 +37,85 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
> return 0;
> }
>
> -static bool
> -cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
> +static bool cgroup_mt(const struct sk_buff *skb,
> + const struct xt_action_param *par,
> + struct sock *(*cgroup_mt_slow)(const struct sk_buff *skb,
> + const struct net_device *indev))
> {
> const struct xt_cgroup_info *info = par->matchinfo;
> + struct sock *sk = skb->sk;
> + u32 sk_classid;
> +
> + if (sk) {
> + sk_classid = sk->sk_classid;
> + } else {
> + if (par->in != NULL)
> + sk = cgroup_mt_slow(skb, par->in);
Is this working with timewait sock?
> + if (sk == NULL)
> + return false;
> +
> + sk_classid = sk->sk_classid;
> + sock_gen_put(sk);
> + }
> +
> + return (info->id == sk_classid) ^ info->invert;
> +}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-25 16:03 ` Pablo Neira Ayuso
@ 2015-03-25 16:39 ` Daniel Borkmann
2015-03-25 17:17 ` Pablo Neira Ayuso
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-25 16:39 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel
On 03/25/2015 05:03 PM, Pablo Neira Ayuso wrote:
> On Tue, Mar 24, 2015 at 04:30:29PM +0100, Daniel Borkmann wrote:
>> diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
>> index 7198d66..cd2468d 100644
>> --- a/net/netfilter/xt_cgroup.c
>> +++ b/net/netfilter/xt_cgroup.c
>> @@ -16,8 +16,11 @@
>> #include <linux/module.h>
>> #include <linux/netfilter/x_tables.h>
>> #include <linux/netfilter/xt_cgroup.h>
>> +
>> #include <net/sock.h>
>>
>> +#include "xt_sk_helper.h"
>> +
>> MODULE_LICENSE("GPL");
>> MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
>> MODULE_DESCRIPTION("Xtables: process control group matching");
>> @@ -34,38 +37,85 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
>> return 0;
>> }
>>
>> -static bool
>> -cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +static bool cgroup_mt(const struct sk_buff *skb,
>> + const struct xt_action_param *par,
>> + struct sock *(*cgroup_mt_slow)(const struct sk_buff *skb,
>> + const struct net_device *indev))
>> {
>> const struct xt_cgroup_info *info = par->matchinfo;
>> + struct sock *sk = skb->sk;
>> + u32 sk_classid;
>> +
>> + if (sk) {
>> + sk_classid = sk->sk_classid;
>> + } else {
>> + if (par->in != NULL)
>> + sk = cgroup_mt_slow(skb, par->in);
>
> Is this working with timewait sock?
Yes, all socket objects that are allocated (sk_alloc()) get a
sk_classid of the current task. Given that both share the same
lookup handler, we don't ignore them here as some xt_socket
flags could after the lookup optionally do.
>> + if (sk == NULL)
>> + return false;
>> +
>> + sk_classid = sk->sk_classid;
>> + sock_gen_put(sk);
>> + }
>> +
>> + return (info->id == sk_classid) ^ info->invert;
>> +}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-25 16:39 ` Daniel Borkmann
@ 2015-03-25 17:17 ` Pablo Neira Ayuso
2015-03-25 17:27 ` Daniel Borkmann
0 siblings, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-25 17:17 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel
On Wed, Mar 25, 2015 at 05:39:04PM +0100, Daniel Borkmann wrote:
> On 03/25/2015 05:03 PM, Pablo Neira Ayuso wrote:
> >On Tue, Mar 24, 2015 at 04:30:29PM +0100, Daniel Borkmann wrote:
> >>diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
> >>index 7198d66..cd2468d 100644
> >>--- a/net/netfilter/xt_cgroup.c
> >>+++ b/net/netfilter/xt_cgroup.c
> >>@@ -16,8 +16,11 @@
> >> #include <linux/module.h>
> >> #include <linux/netfilter/x_tables.h>
> >> #include <linux/netfilter/xt_cgroup.h>
> >>+
> >> #include <net/sock.h>
> >>
> >>+#include "xt_sk_helper.h"
> >>+
> >> MODULE_LICENSE("GPL");
> >> MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
> >> MODULE_DESCRIPTION("Xtables: process control group matching");
> >>@@ -34,38 +37,85 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
> >> return 0;
> >> }
> >>
> >>-static bool
> >>-cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
> >>+static bool cgroup_mt(const struct sk_buff *skb,
> >>+ const struct xt_action_param *par,
> >>+ struct sock *(*cgroup_mt_slow)(const struct sk_buff *skb,
> >>+ const struct net_device *indev))
> >> {
> >> const struct xt_cgroup_info *info = par->matchinfo;
> >>+ struct sock *sk = skb->sk;
> >>+ u32 sk_classid;
> >>+
> >>+ if (sk) {
> >>+ sk_classid = sk->sk_classid;
> >>+ } else {
> >>+ if (par->in != NULL)
> >>+ sk = cgroup_mt_slow(skb, par->in);
> >
> >Is this working with timewait sock?
>
> Yes, all socket objects that are allocated (sk_alloc()) get a
> sk_classid of the current task. Given that both share the same
> lookup handler, we don't ignore them here as some xt_socket
> flags could after the lookup optionally do.
I mean, we may get a packet from the input path while in TIME_WAIT, and
sk will be actually a inet_timewait_sock, which has a different
layout (no sk_classid).
> >>+ if (sk == NULL)
> >>+ return false;
> >>+
> >>+ sk_classid = sk->sk_classid;
> >>+ sock_gen_put(sk);
> >>+ }
> >>+
> >>+ return (info->id == sk_classid) ^ info->invert;
> >>+}
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-25 17:17 ` Pablo Neira Ayuso
@ 2015-03-25 17:27 ` Daniel Borkmann
0 siblings, 0 replies; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-25 17:27 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel
On 03/25/2015 06:17 PM, Pablo Neira Ayuso wrote:
...
> I mean, we may get a packet from the input path while in TIME_WAIT, and
> sk will be actually a inet_timewait_sock, which has a different
> layout (no sk_classid).
Sorry, you are correct, thanks for the hint. We would actually
need to test if we deal with a full socket, iow sk_fullsock(sk).
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-24 15:30 ` [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
2015-03-25 16:03 ` Pablo Neira Ayuso
@ 2015-03-25 20:26 ` Pablo Neira Ayuso
2015-03-25 21:34 ` Daniel Borkmann
1 sibling, 1 reply; 12+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-25 20:26 UTC (permalink / raw)
To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel
On Tue, Mar 24, 2015 at 04:30:29PM +0100, Daniel Borkmann wrote:
> While originally only being intended for outgoing traffic, commit
> a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for
> LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
> as well, in order to allow for nfacct accounting.
>
> This basically was under the assumption that socket early demux will
> resolve it. It's correct that demux happens after PRE_ROUTING, but
> before LOCAL_IN.
>
> However, that as-is only partially works, i.e. it works for the case
> of established TCP and connected UDP sockets when early demux is
> enabled, but not for various other ingress scenarios e.g. unconnected
> UDP, request sockets, etc.
>
> Instead of reverting commit a00e76349f35, I think it's worth to fix
> it up as there are applications requiring xt_cgroup to match on
> ingress and egress side. In order to do so, we need to perform a
> full lookup on skb->sk (ingress) miss, similarly as being done in
> xt_socket.
>
> Therefore, we need to make use of shared helpers xt_sk_lookup() and
> xt_sk_lookup6(). Thanks to Daniel for the report and also additional
> testing.
So this is basically needed when early demux is disabled?
This is a rather large rework, I would like to know what scenarios
we're not currently catching with the existing code.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-25 20:26 ` Pablo Neira Ayuso
@ 2015-03-25 21:34 ` Daniel Borkmann
2015-03-25 21:54 ` Daniel Mack
0 siblings, 1 reply; 12+ messages in thread
From: Daniel Borkmann @ 2015-03-25 21:34 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel
On 03/25/2015 09:26 PM, Pablo Neira Ayuso wrote:
> On Tue, Mar 24, 2015 at 04:30:29PM +0100, Daniel Borkmann wrote:
>> While originally only being intended for outgoing traffic, commit
>> a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for
>> LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
>> as well, in order to allow for nfacct accounting.
>>
>> This basically was under the assumption that socket early demux will
>> resolve it. It's correct that demux happens after PRE_ROUTING, but
>> before LOCAL_IN.
>>
>> However, that as-is only partially works, i.e. it works for the case
>> of established TCP and connected UDP sockets when early demux is
>> enabled, but not for various other ingress scenarios e.g. unconnected
>> UDP, request sockets, etc.
>>
>> Instead of reverting commit a00e76349f35, I think it's worth to fix
>> it up as there are applications requiring xt_cgroup to match on
>> ingress and egress side. In order to do so, we need to perform a
>> full lookup on skb->sk (ingress) miss, similarly as being done in
>> xt_socket.
>>
>> Therefore, we need to make use of shared helpers xt_sk_lookup() and
>> xt_sk_lookup6(). Thanks to Daniel for the report and also additional
>> testing.
>
> So this is basically needed when early demux is disabled?
>
> This is a rather large rework, I would like to know what scenarios
> we're not currently catching with the existing code.
Hm, perhaps Daniel can elaborate better, what I have seen in my
testing when xt_cgroup fails to match the cgroup on ingress traffic
is i) early demux sysctl disabled, ii) udp on unconnected sockets
(which I understand is the majority of udp traffic), iii) tcp and
udp (any kind) on localhost communications. Daniel's original report
can be found here [1].
Thanks,
Daniel
[1] http://thread.gmane.org/gmane.linux.network/355527
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
2015-03-25 21:34 ` Daniel Borkmann
@ 2015-03-25 21:54 ` Daniel Mack
0 siblings, 0 replies; 12+ messages in thread
From: Daniel Mack @ 2015-03-25 21:54 UTC (permalink / raw)
To: Daniel Borkmann, Pablo Neira Ayuso; +Cc: fw, a.perevalov, netfilter-devel
On 03/25/2015 10:34 PM, Daniel Borkmann wrote:
> On 03/25/2015 09:26 PM, Pablo Neira Ayuso wrote:
>> So this is basically needed when early demux is disabled?
>>
>> This is a rather large rework, I would like to know what scenarios
>> we're not currently catching with the existing code.
>
> Hm, perhaps Daniel can elaborate better, what I have seen in my
> testing when xt_cgroup fails to match the cgroup on ingress traffic
> is i) early demux sysctl disabled, ii) udp on unconnected sockets
> (which I understand is the majority of udp traffic), iii) tcp and
> udp (any kind) on localhost communications. Daniel's original report
> can be found here [1].
Currently, ingress matching fails if the xt_cgroup module's match
callback is called with skb->sk == NULL, which is the case in the
scenarios described above. Also, according to Cong, this is as well
always the case if the ingress network device is 'lo'.
We want to use xt_cgroup to realize a per-application firewall for both
filtering and accounting. For this, being able to catch every network
packet that is destined for or originated by a task that is assigned to
a certain net_cls CGroup controller is essential. Also, the match has to
be effective regardless of the network interface in use.
In my tests, Daniel's patches work perfectly fine.
Thanks,
Daniel
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2015-03-25 21:54 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-24 15:30 [PATCH nf-next 0/2] xt_cgroups fix Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
2015-03-24 15:30 ` [PATCH nf-next 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
2015-03-25 16:03 ` Pablo Neira Ayuso
2015-03-25 16:39 ` Daniel Borkmann
2015-03-25 17:17 ` Pablo Neira Ayuso
2015-03-25 17:27 ` Daniel Borkmann
2015-03-25 20:26 ` Pablo Neira Ayuso
2015-03-25 21:34 ` Daniel Borkmann
2015-03-25 21:54 ` Daniel Mack
2015-03-24 15:42 ` [PATCH nf-next 0/2] xt_cgroups fix Florian Westphal
2015-03-24 15:58 ` Daniel Borkmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).