[PATCH nf-next v2 0/2] xt

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH nf-next v2 0/2] xt_cgroups fix
@ 2015-03-26 19:14 Daniel Borkmann
  2015-03-26 19:14 ` [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-26 19:14 UTC (permalink / raw)
  To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann

Hi Pablo,

here's a possible fix for xt_cgroups that was previously reported
by Daniel Mack.

I respinned the set based on your previous feedback wrt tw sockets.

The first patch refactors common helpers, which is later on being
used by the actual fix. Please see individual patches for details.

I have rebased it against nf-next as in the previous version.

Thanks a lot!

v1->v2:
  - patch1 as is
  - patch2 checks for full socket

Daniel Borkmann (2):
  netfilter: x_tables: refactor lookup helpers from xt_socket
  netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups

 net/netfilter/Kconfig        |   5 +
 net/netfilter/xt_cgroup.c    |  92 +++++++++++---
 net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
 net/netfilter/xt_socket.c    | 293 +++----------------------------------------
 4 files changed, 379 insertions(+), 293 deletions(-)
 create mode 100644 net/netfilter/xt_sk_helper.h

-- 
1.9.3


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket
  2015-03-26 19:14 [PATCH nf-next v2 0/2] xt_cgroups fix Daniel Borkmann
@ 2015-03-26 19:14 ` Daniel Borkmann
  2015-03-27  0:06   ` Pablo Neira Ayuso
  2015-03-26 19:14 ` [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
  2015-03-27  0:40 ` [PATCH nf-next v2 0/2] xt_cgroups fix Pablo Neira Ayuso
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-26 19:14 UTC (permalink / raw)
  To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann

The socket lookup helpers are also needed for fixing xt_cgroups,
therefore refactor them into shareable helper functions.

This simplifies and optimizes the xt_socket code itself a bit
as well, i.e. time to verdict for early demux sockets should be
much faster than previously:

We've unnecessarily extracted proto, {s,d}addr and {s,d}ports
from the skb data, accessing possible conntrack information,
etc even though we were not even calling into the socket lookup
via xt_socket_get_sock_v4() due to skb->sk hit.

After this patch, we only proceed the slow-path when we have an
actual skb->sk miss.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Daniel Mack <daniel@zonque.org>
Cc: Florian Westphal <fw@strlen.de>
---
 net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
 net/netfilter/xt_socket.c    | 293 +++----------------------------------------
 2 files changed, 300 insertions(+), 275 deletions(-)
 create mode 100644 net/netfilter/xt_sk_helper.h

diff --git a/net/netfilter/xt_sk_helper.h b/net/netfilter/xt_sk_helper.h
new file mode 100644
index 0000000..604b7ac
--- /dev/null
+++ b/net/netfilter/xt_sk_helper.h
@@ -0,0 +1,282 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (C) 2007-2008 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/skbuff.h>
+
+#include <net/tcp.h>
+#include <net/udp.h>
+#include <net/icmp.h>
+#include <net/sock.h>
+
+#include <net/netfilter/ipv4/nf_defrag_ipv4.h>
+
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
+#define XT_HAVE_IPV6 1
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#include <net/inet6_hashtables.h>
+#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
+#endif
+
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+#define XT_HAVE_CONNTRACK 1
+#include <net/netfilter/nf_conntrack.h>
+#endif
+
+static int extract_icmp4_fields(const struct sk_buff *skb, u8 *protocol,
+				__be32 *raddr, __be32 *laddr, __be16 *rport,
+				__be16 *lport)
+{
+	unsigned int outside_hdrlen = ip_hdrlen(skb);
+	struct iphdr *inside_iph, _inside_iph;
+	struct icmphdr *icmph, _icmph;
+	__be16 *ports, _ports[2];
+
+	icmph = skb_header_pointer(skb, outside_hdrlen,
+				   sizeof(_icmph), &_icmph);
+	if (icmph == NULL)
+		return 1;
+
+	switch (icmph->type) {
+	case ICMP_DEST_UNREACH:
+	case ICMP_SOURCE_QUENCH:
+	case ICMP_REDIRECT:
+	case ICMP_TIME_EXCEEDED:
+	case ICMP_PARAMETERPROB:
+		break;
+	default:
+		return 1;
+	}
+
+	inside_iph = skb_header_pointer(skb, outside_hdrlen +
+					sizeof(struct icmphdr),
+					sizeof(_inside_iph), &_inside_iph);
+	if (inside_iph == NULL)
+		return 1;
+
+	if (inside_iph->protocol != IPPROTO_TCP &&
+	    inside_iph->protocol != IPPROTO_UDP)
+		return 1;
+
+	ports = skb_header_pointer(skb, outside_hdrlen +
+				   sizeof(struct icmphdr) +
+				   (inside_iph->ihl << 2),
+				   sizeof(_ports), &_ports);
+	if (ports == NULL)
+		return 1;
+
+	/* The inside IP packet is the one quoted from our side, thus
+	 * its saddr is the local address.
+	 */
+
+	*protocol = inside_iph->protocol;
+	*laddr = inside_iph->saddr;
+	*lport = ports[0];
+	*raddr = inside_iph->daddr;
+	*rport = ports[1];
+
+	return 0;
+}
+
+static struct sock *xt_get_sk_v4(struct net *net, const u8 protocol,
+				 const __be32 saddr, const __be32 daddr,
+				 const __be16 sport, const __be16 dport,
+				 const struct net_device *in)
+{
+	switch (protocol) {
+	case IPPROTO_TCP:
+		return __inet_lookup(net, &tcp_hashinfo,
+				     saddr, sport, daddr, dport,
+				     in->ifindex);
+	case IPPROTO_UDP:
+		return udp4_lib_lookup(net, saddr, sport, daddr, dport,
+				       in->ifindex);
+	}
+
+	return NULL;
+}
+
+static struct sock *xt_sk_lookup(const struct sk_buff *skb,
+				 const struct net_device *indev)
+{
+	const struct iphdr *iph = ip_hdr(skb);
+	__be32 uninitialized_var(daddr), uninitialized_var(saddr);
+	__be16 uninitialized_var(dport), uninitialized_var(sport);
+	u8 uninitialized_var(protocol);
+#ifdef XT_HAVE_CONNTRACK
+	struct nf_conn const *ct;
+	enum ip_conntrack_info ctinfo;
+#endif
+
+	if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
+		struct udphdr _hdr, *hp = NULL;
+
+		hp = skb_header_pointer(skb, ip_hdrlen(skb),
+					sizeof(_hdr), &_hdr);
+		if (hp == NULL)
+			return NULL;
+
+		protocol = iph->protocol;
+
+		saddr = iph->saddr;
+		daddr = iph->daddr;
+
+		sport = hp->source;
+		dport = hp->dest;
+
+	} else if (iph->protocol == IPPROTO_ICMP) {
+		if (extract_icmp4_fields(skb, &protocol, &saddr, &daddr,
+					 &sport, &dport))
+			return NULL;
+	} else {
+		return NULL;
+	}
+
+#ifdef XT_HAVE_CONNTRACK
+	/* Do the lookup with the original socket address in
+	 * case this is a reply packet of an established
+	 * SNAT-ted connection.
+	 */
+	ct = nf_ct_get(skb, &ctinfo);
+	if (ct && !nf_ct_is_untracked(ct) &&
+	    ((iph->protocol != IPPROTO_ICMP &&
+	      ctinfo == IP_CT_ESTABLISHED_REPLY) ||
+	     (iph->protocol == IPPROTO_ICMP &&
+	      ctinfo == IP_CT_RELATED_REPLY)) &&
+	    (ct->status & IPS_SRC_NAT_DONE)) {
+
+		daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
+		dport = (iph->protocol == IPPROTO_TCP) ?
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.udp.port;
+	}
+#endif
+	return xt_get_sk_v4(dev_net(skb->dev), protocol, saddr, daddr,
+			    sport, dport, indev);
+}
+
+#ifdef XT_HAVE_IPV6
+static int extract_icmp6_fields(const struct sk_buff *skb,
+				unsigned int outside_hdrlen,
+				int *protocol,
+				const struct in6_addr **raddr,
+				const struct in6_addr **laddr,
+				__be16 *rport, __be16 *lport,
+				struct ipv6hdr *ipv6_var)
+{
+	const struct ipv6hdr *inside_iph;
+	struct icmp6hdr *icmph, _icmph;
+	__be16 *ports, _ports[2];
+	u8 inside_nexthdr;
+	__be16 inside_fragoff;
+	int inside_hdrlen;
+
+	icmph = skb_header_pointer(skb, outside_hdrlen,
+				   sizeof(_icmph), &_icmph);
+	if (icmph == NULL)
+		return 1;
+
+	if (icmph->icmp6_type & ICMPV6_INFOMSG_MASK)
+		return 1;
+
+	inside_iph = skb_header_pointer(skb, outside_hdrlen + sizeof(_icmph),
+					sizeof(*ipv6_var), ipv6_var);
+	if (inside_iph == NULL)
+		return 1;
+
+	inside_nexthdr = inside_iph->nexthdr;
+	inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) +
+					      sizeof(*ipv6_var),
+					 &inside_nexthdr, &inside_fragoff);
+	if (inside_hdrlen < 0)
+		return 1; /* hjm: Packet has no/incomplete transport layer headers. */
+
+	if (inside_nexthdr != IPPROTO_TCP &&
+	    inside_nexthdr != IPPROTO_UDP)
+		return 1;
+
+	ports = skb_header_pointer(skb, inside_hdrlen,
+				   sizeof(_ports), &_ports);
+	if (ports == NULL)
+		return 1;
+
+	/* The inside IP packet is the one quoted from our side, thus
+	 * its saddr is the local address.
+	 */
+
+	*protocol = inside_nexthdr;
+	*laddr = &inside_iph->saddr;
+	*lport = ports[0];
+	*raddr = &inside_iph->daddr;
+	*rport = ports[1];
+
+	return 0;
+}
+
+static struct sock *xt_get_sk_v6(struct net *net, const u8 protocol,
+				 const struct in6_addr *saddr,
+				 const struct in6_addr *daddr,
+				 const __be16 sport, const __be16 dport,
+				 const struct net_device *in)
+{
+	switch (protocol) {
+	case IPPROTO_TCP:
+		return inet6_lookup(net, &tcp_hashinfo,
+				    saddr, sport, daddr, dport,
+				    in->ifindex);
+	case IPPROTO_UDP:
+		return udp6_lib_lookup(net, saddr, sport, daddr, dport,
+				       in->ifindex);
+	}
+
+	return NULL;
+}
+
+static struct sock *xt_sk_lookup6(const struct sk_buff *skb,
+				  const struct net_device *indev)
+{
+	__be16 uninitialized_var(dport), uninitialized_var(sport);
+	const struct in6_addr *daddr = NULL, *saddr = NULL;
+	struct ipv6hdr *iph = ipv6_hdr(skb);
+	int thoff = 0, tproto;
+
+	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
+	if (tproto < 0) {
+		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
+		return NULL;
+	}
+
+	if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
+		struct udphdr _hdr, *hp;
+
+		hp = skb_header_pointer(skb, thoff, sizeof(_hdr), &_hdr);
+		if (hp == NULL)
+			return NULL;
+
+		saddr = &iph->saddr;
+		daddr = &iph->daddr;
+
+		sport = hp->source;
+		dport = hp->dest;
+
+	} else if (tproto == IPPROTO_ICMPV6) {
+		struct ipv6hdr ipv6_var;
+
+		if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
+					 &sport, &dport, &ipv6_var))
+			return NULL;
+	} else {
+		return NULL;
+	}
+
+	return xt_get_sk_v6(dev_net(skb->dev), tproto, saddr, daddr,
+			    sport, dport, indev);
+}
+#endif /* XT_HAVE_IPV6 */
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 895534e..570ae6e 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -7,90 +7,18 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
- *
  */
+
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include <linux/module.h>
 #include <linux/skbuff.h>
-#include <linux/netfilter/x_tables.h>
-#include <linux/netfilter_ipv4/ip_tables.h>
-#include <net/tcp.h>
-#include <net/udp.h>
-#include <net/icmp.h>
-#include <net/sock.h>
-#include <net/inet_sock.h>
-#include <net/netfilter/ipv4/nf_defrag_ipv4.h>
-
-#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
-#define XT_SOCKET_HAVE_IPV6 1
-#include <linux/netfilter_ipv6/ip6_tables.h>
-#include <net/inet6_hashtables.h>
-#include <net/netfilter/ipv6/nf_defrag_ipv6.h>
-#endif
 
+#include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_socket.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
 
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-#define XT_SOCKET_HAVE_CONNTRACK 1
-#include <net/netfilter/nf_conntrack.h>
-#endif
-
-static int
-extract_icmp4_fields(const struct sk_buff *skb,
-		    u8 *protocol,
-		    __be32 *raddr,
-		    __be32 *laddr,
-		    __be16 *rport,
-		    __be16 *lport)
-{
-	unsigned int outside_hdrlen = ip_hdrlen(skb);
-	struct iphdr *inside_iph, _inside_iph;
-	struct icmphdr *icmph, _icmph;
-	__be16 *ports, _ports[2];
-
-	icmph = skb_header_pointer(skb, outside_hdrlen,
-				   sizeof(_icmph), &_icmph);
-	if (icmph == NULL)
-		return 1;
-
-	switch (icmph->type) {
-	case ICMP_DEST_UNREACH:
-	case ICMP_SOURCE_QUENCH:
-	case ICMP_REDIRECT:
-	case ICMP_TIME_EXCEEDED:
-	case ICMP_PARAMETERPROB:
-		break;
-	default:
-		return 1;
-	}
-
-	inside_iph = skb_header_pointer(skb, outside_hdrlen +
-					sizeof(struct icmphdr),
-					sizeof(_inside_iph), &_inside_iph);
-	if (inside_iph == NULL)
-		return 1;
-
-	if (inside_iph->protocol != IPPROTO_TCP &&
-	    inside_iph->protocol != IPPROTO_UDP)
-		return 1;
-
-	ports = skb_header_pointer(skb, outside_hdrlen +
-				   sizeof(struct icmphdr) +
-				   (inside_iph->ihl << 2),
-				   sizeof(_ports), &_ports);
-	if (ports == NULL)
-		return 1;
-
-	/* the inside IP packet is the one quoted from our side, thus
-	 * its saddr is the local address */
-	*protocol = inside_iph->protocol;
-	*laddr = inside_iph->saddr;
-	*lport = ports[0];
-	*raddr = inside_iph->daddr;
-	*rport = ports[1];
-
-	return 0;
-}
+#include "xt_sk_helper.h"
 
 /* "socket" match based redirection (no specific rule)
  * ===================================================
@@ -111,23 +39,6 @@ extract_icmp4_fields(const struct sk_buff *skb,
  *     then local services could intercept traffic going through the
  *     box.
  */
-static struct sock *
-xt_socket_get_sock_v4(struct net *net, const u8 protocol,
-		      const __be32 saddr, const __be32 daddr,
-		      const __be16 sport, const __be16 dport,
-		      const struct net_device *in)
-{
-	switch (protocol) {
-	case IPPROTO_TCP:
-		return __inet_lookup(net, &tcp_hashinfo,
-				     saddr, sport, daddr, dport,
-				     in->ifindex);
-	case IPPROTO_UDP:
-		return udp4_lib_lookup(net, saddr, sport, daddr, dport,
-				       in->ifindex);
-	}
-	return NULL;
-}
 
 static bool xt_socket_sk_is_transparent(struct sock *sk)
 {
@@ -147,63 +58,13 @@ static bool
 socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 	     const struct xt_socket_mtinfo1 *info)
 {
-	const struct iphdr *iph = ip_hdr(skb);
-	struct udphdr _hdr, *hp = NULL;
 	struct sock *sk = skb->sk;
-	__be32 uninitialized_var(daddr), uninitialized_var(saddr);
-	__be16 uninitialized_var(dport), uninitialized_var(sport);
-	u8 uninitialized_var(protocol);
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-	struct nf_conn const *ct;
-	enum ip_conntrack_info ctinfo;
-#endif
-
-	if (iph->protocol == IPPROTO_UDP || iph->protocol == IPPROTO_TCP) {
-		hp = skb_header_pointer(skb, ip_hdrlen(skb),
-					sizeof(_hdr), &_hdr);
-		if (hp == NULL)
-			return false;
-
-		protocol = iph->protocol;
-		saddr = iph->saddr;
-		sport = hp->source;
-		daddr = iph->daddr;
-		dport = hp->dest;
-
-	} else if (iph->protocol == IPPROTO_ICMP) {
-		if (extract_icmp4_fields(skb, &protocol, &saddr, &daddr,
-					&sport, &dport))
-			return false;
-	} else {
-		return false;
-	}
-
-#ifdef XT_SOCKET_HAVE_CONNTRACK
-	/* Do the lookup with the original socket address in case this is a
-	 * reply packet of an established SNAT-ted connection. */
-
-	ct = nf_ct_get(skb, &ctinfo);
-	if (ct && !nf_ct_is_untracked(ct) &&
-	    ((iph->protocol != IPPROTO_ICMP &&
-	      ctinfo == IP_CT_ESTABLISHED_REPLY) ||
-	     (iph->protocol == IPPROTO_ICMP &&
-	      ctinfo == IP_CT_RELATED_REPLY)) &&
-	    (ct->status & IPS_SRC_NAT_DONE)) {
-
-		daddr = ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.ip;
-		dport = (iph->protocol == IPPROTO_TCP) ?
-			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port :
-			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.udp.port;
-	}
-#endif
 
 	if (!sk)
-		sk = xt_socket_get_sock_v4(dev_net(skb->dev), protocol,
-					   saddr, daddr, sport, dport,
-					   par->in);
+		sk = xt_sk_lookup(skb, par->in);
 	if (sk) {
-		bool wildcard;
 		bool transparent = true;
+		bool wildcard;
 
 		/* Ignore sockets listening on INADDR_ANY,
 		 * unless XT_SOCKET_NOWILDCARD is set
@@ -225,12 +86,7 @@ socket_match(const struct sk_buff *skb, struct xt_action_param *par,
 			sk = NULL;
 	}
 
-	pr_debug("proto %hhu %pI4:%hu -> %pI4:%hu (orig %pI4:%hu) sock %p\n",
-		 protocol, &saddr, ntohs(sport),
-		 &daddr, ntohs(dport),
-		 &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
-	return (sk != NULL);
+	return sk != NULL;
 }
 
 static bool
@@ -249,127 +105,18 @@ socket_mt4_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 	return socket_match(skb, par, par->matchinfo);
 }
 
-#ifdef XT_SOCKET_HAVE_IPV6
-
-static int
-extract_icmp6_fields(const struct sk_buff *skb,
-		     unsigned int outside_hdrlen,
-		     int *protocol,
-		     const struct in6_addr **raddr,
-		     const struct in6_addr **laddr,
-		     __be16 *rport,
-		     __be16 *lport,
-		     struct ipv6hdr *ipv6_var)
-{
-	const struct ipv6hdr *inside_iph;
-	struct icmp6hdr *icmph, _icmph;
-	__be16 *ports, _ports[2];
-	u8 inside_nexthdr;
-	__be16 inside_fragoff;
-	int inside_hdrlen;
-
-	icmph = skb_header_pointer(skb, outside_hdrlen,
-				   sizeof(_icmph), &_icmph);
-	if (icmph == NULL)
-		return 1;
-
-	if (icmph->icmp6_type & ICMPV6_INFOMSG_MASK)
-		return 1;
-
-	inside_iph = skb_header_pointer(skb, outside_hdrlen + sizeof(_icmph),
-					sizeof(*ipv6_var), ipv6_var);
-	if (inside_iph == NULL)
-		return 1;
-	inside_nexthdr = inside_iph->nexthdr;
-
-	inside_hdrlen = ipv6_skip_exthdr(skb, outside_hdrlen + sizeof(_icmph) +
-					      sizeof(*ipv6_var),
-					 &inside_nexthdr, &inside_fragoff);
-	if (inside_hdrlen < 0)
-		return 1; /* hjm: Packet has no/incomplete transport layer headers. */
-
-	if (inside_nexthdr != IPPROTO_TCP &&
-	    inside_nexthdr != IPPROTO_UDP)
-		return 1;
-
-	ports = skb_header_pointer(skb, inside_hdrlen,
-				   sizeof(_ports), &_ports);
-	if (ports == NULL)
-		return 1;
-
-	/* the inside IP packet is the one quoted from our side, thus
-	 * its saddr is the local address */
-	*protocol = inside_nexthdr;
-	*laddr = &inside_iph->saddr;
-	*lport = ports[0];
-	*raddr = &inside_iph->daddr;
-	*rport = ports[1];
-
-	return 0;
-}
-
-static struct sock *
-xt_socket_get_sock_v6(struct net *net, const u8 protocol,
-		      const struct in6_addr *saddr, const struct in6_addr *daddr,
-		      const __be16 sport, const __be16 dport,
-		      const struct net_device *in)
-{
-	switch (protocol) {
-	case IPPROTO_TCP:
-		return inet6_lookup(net, &tcp_hashinfo,
-				    saddr, sport, daddr, dport,
-				    in->ifindex);
-	case IPPROTO_UDP:
-		return udp6_lib_lookup(net, saddr, sport, daddr, dport,
-				       in->ifindex);
-	}
-
-	return NULL;
-}
-
+#ifdef XT_HAVE_IPV6
 static bool
 socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 {
-	struct ipv6hdr ipv6_var, *iph = ipv6_hdr(skb);
-	struct udphdr _hdr, *hp = NULL;
-	struct sock *sk = skb->sk;
-	const struct in6_addr *daddr = NULL, *saddr = NULL;
-	__be16 uninitialized_var(dport), uninitialized_var(sport);
-	int thoff = 0, uninitialized_var(tproto);
 	const struct xt_socket_mtinfo1 *info = (struct xt_socket_mtinfo1 *) par->matchinfo;
-
-	tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
-	if (tproto < 0) {
-		pr_debug("unable to find transport header in IPv6 packet, dropping\n");
-		return NF_DROP;
-	}
-
-	if (tproto == IPPROTO_UDP || tproto == IPPROTO_TCP) {
-		hp = skb_header_pointer(skb, thoff,
-					sizeof(_hdr), &_hdr);
-		if (hp == NULL)
-			return false;
-
-		saddr = &iph->saddr;
-		sport = hp->source;
-		daddr = &iph->daddr;
-		dport = hp->dest;
-
-	} else if (tproto == IPPROTO_ICMPV6) {
-		if (extract_icmp6_fields(skb, thoff, &tproto, &saddr, &daddr,
-					 &sport, &dport, &ipv6_var))
-			return false;
-	} else {
-		return false;
-	}
+	struct sock *sk = skb->sk;
 
 	if (!sk)
-		sk = xt_socket_get_sock_v6(dev_net(skb->dev), tproto,
-					   saddr, daddr, sport, dport,
-					   par->in);
+		sk = xt_sk_lookup6(skb, par->in);
 	if (sk) {
-		bool wildcard;
 		bool transparent = true;
+		bool wildcard;
 
 		/* Ignore sockets listening on INADDR_ANY
 		 * unless XT_SOCKET_NOWILDCARD is set
@@ -391,13 +138,7 @@ socket_mt6_v1_v2(const struct sk_buff *skb, struct xt_action_param *par)
 			sk = NULL;
 	}
 
-	pr_debug("proto %hhd %pI6:%hu -> %pI6:%hu "
-		 "(orig %pI6:%hu) sock %p\n",
-		 tproto, saddr, ntohs(sport),
-		 daddr, ntohs(dport),
-		 &iph->daddr, hp ? ntohs(hp->dest) : 0, sk);
-
-	return (sk != NULL);
+	return sk != NULL;
 }
 #endif
 
@@ -409,6 +150,7 @@ static int socket_mt_v1_check(const struct xt_mtchk_param *par)
 		pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V1);
 		return -EINVAL;
 	}
+
 	return 0;
 }
 
@@ -420,6 +162,7 @@ static int socket_mt_v2_check(const struct xt_mtchk_param *par)
 		pr_info("unknown flags 0x%x\n", info->flags & ~XT_SOCKET_FLAGS_V2);
 		return -EINVAL;
 	}
+
 	return 0;
 }
 
@@ -444,7 +187,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 				  (1 << NF_INET_LOCAL_IN),
 		.me		= THIS_MODULE,
 	},
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
 	{
 		.name		= "socket",
 		.revision	= 1,
@@ -468,7 +211,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 				  (1 << NF_INET_LOCAL_IN),
 		.me		= THIS_MODULE,
 	},
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
 	{
 		.name		= "socket",
 		.revision	= 2,
@@ -486,7 +229,7 @@ static struct xt_match socket_mt_reg[] __read_mostly = {
 static int __init socket_mt_init(void)
 {
 	nf_defrag_ipv4_enable();
-#ifdef XT_SOCKET_HAVE_IPV6
+#ifdef XT_HAVE_IPV6
 	nf_defrag_ipv6_enable();
 #endif
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket
  2015-03-26 19:14 ` [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
@ 2015-03-27  0:06   ` Pablo Neira Ayuso
  2015-03-27  8:18     ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-27  0:06 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel

On Thu, Mar 26, 2015 at 08:14:47PM +0100, Daniel Borkmann wrote:
> The socket lookup helpers are also needed for fixing xt_cgroups,
> therefore refactor them into shareable helper functions.
> 
> This simplifies and optimizes the xt_socket code itself a bit
> as well, i.e. time to verdict for early demux sockets should be
> much faster than previously:
> 
> We've unnecessarily extracted proto, {s,d}addr and {s,d}ports
> from the skb data, accessing possible conntrack information,
> etc even though we were not even calling into the socket lookup
> via xt_socket_get_sock_v4() due to skb->sk hit.
> 
> After this patch, we only proceed the slow-path when we have an
> actual skb->sk miss.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Daniel Mack <daniel@zonque.org>
> Cc: Florian Westphal <fw@strlen.de>
> ---
>  net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
>  net/netfilter/xt_socket.c    | 293 +++----------------------------------------
>  2 files changed, 300 insertions(+), 275 deletions(-)
>  create mode 100644 net/netfilter/xt_sk_helper.h
> 
> diff --git a/net/netfilter/xt_sk_helper.h b/net/netfilter/xt_sk_helper.h
> new file mode 100644
> index 0000000..604b7ac
> --- /dev/null
> +++ b/net/netfilter/xt_sk_helper.h

Please, no code in a header file. Instead split the content of this
file in two:

* net/ipv4/netfilter/nf_sock_ipv4.c
* net/ipv6/netfilter/nf_sock_ipv6.c

You will have the corresponding Kconfig and Makefile trickery too.

Also rename all those functions to the prefix nf_sock_*

The Kconfig for xt_socket should contain:

select NF_SOCK_IPV4
select NF_SOCK_IPV6 if IP6_NF_IPTABLES

This is how we're doing with other extensions to share code between xt
and nft, you will help us if you do it like that.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket
  2015-03-27  0:06   ` Pablo Neira Ayuso
@ 2015-03-27  8:18     ` Daniel Borkmann
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-27  8:18 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel

On 03/27/2015 01:06 AM, Pablo Neira Ayuso wrote:
> On Thu, Mar 26, 2015 at 08:14:47PM +0100, Daniel Borkmann wrote:
>> The socket lookup helpers are also needed for fixing xt_cgroups,
>> therefore refactor them into shareable helper functions.
>>
>> This simplifies and optimizes the xt_socket code itself a bit
>> as well, i.e. time to verdict for early demux sockets should be
>> much faster than previously:
>>
>> We've unnecessarily extracted proto, {s,d}addr and {s,d}ports
>> from the skb data, accessing possible conntrack information,
>> etc even though we were not even calling into the socket lookup
>> via xt_socket_get_sock_v4() due to skb->sk hit.
>>
>> After this patch, we only proceed the slow-path when we have an
>> actual skb->sk miss.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Cc: Daniel Mack <daniel@zonque.org>
>> Cc: Florian Westphal <fw@strlen.de>
>> ---
>>   net/netfilter/xt_sk_helper.h | 282 +++++++++++++++++++++++++++++++++++++++++
>>   net/netfilter/xt_socket.c    | 293 +++----------------------------------------
>>   2 files changed, 300 insertions(+), 275 deletions(-)
>>   create mode 100644 net/netfilter/xt_sk_helper.h
>>
>> diff --git a/net/netfilter/xt_sk_helper.h b/net/netfilter/xt_sk_helper.h
>> new file mode 100644
>> index 0000000..604b7ac
>> --- /dev/null
>> +++ b/net/netfilter/xt_sk_helper.h
>
> Please, no code in a header file. Instead split the content of this
> file in two:
>
> * net/ipv4/netfilter/nf_sock_ipv4.c
> * net/ipv6/netfilter/nf_sock_ipv6.c
>
> You will have the corresponding Kconfig and Makefile trickery too.
>
> Also rename all those functions to the prefix nf_sock_*
>
> The Kconfig for xt_socket should contain:
>
> select NF_SOCK_IPV4
> select NF_SOCK_IPV6 if IP6_NF_IPTABLES
>
> This is how we're doing with other extensions to share code between xt
> and nft, you will help us if you do it like that.

Okay, sure. I wasn't aware of that, but it sounds like a better
way to go and would also ease the migration of xt_socket into
nft, which is even better. Will do.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-26 19:14 [PATCH nf-next v2 0/2] xt_cgroups fix Daniel Borkmann
  2015-03-26 19:14 ` [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
@ 2015-03-26 19:14 ` Daniel Borkmann
  2015-03-27  0:14   ` Pablo Neira Ayuso
  2015-03-27  0:40 ` [PATCH nf-next v2 0/2] xt_cgroups fix Pablo Neira Ayuso
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-26 19:14 UTC (permalink / raw)
  To: pablo; +Cc: daniel, fw, a.perevalov, netfilter-devel, Daniel Borkmann

While originally only being intended for outgoing traffic, commit
a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for
LOCAL_IN nf hooks") enabled xt_cgroups for the NF_INET_LOCAL_IN hook
as well, in order to allow for nfacct accounting.

This basically was under the assumption that socket early demux will
resolve it. It's correct that demux happens after PRE_ROUTING, but
before LOCAL_IN.

However, that as-is only partially works, i.e. it works for the case
of established TCP and connected UDP sockets when early demux is
enabled, but not for various other ingress scenarios: i) early demux
disabled (sysctl), ii) udp on unconnected sockets, iii) tcp and udp
(any kind) on localhost communications.

Instead of reverting commit a00e76349f35, I think it's worth to fix
it up as there are applications requiring xt_cgroup to work/match on
ingress and egress side. In order to do so, we need to perform a
lookup on skb->sk ingress miss, similarly as being done in xt_socket.

Therefore, we need to make use of shared helpers xt_sk_lookup() and
xt_sk_lookup6(). Thanks to Daniel for the report and also additional
testing.

Reported-by: Daniel Mack <daniel@zonque.org>
Fixes: a00e76349f35 ("netfilter: x_tables: allow to use cgroup match for LOCAL_IN nf hooks")
Reference: http://thread.gmane.org/gmane.linux.network/355527
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Daniel Mack <daniel@zonque.org>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Florian Westphal <fw@strlen.de>
---
 net/netfilter/Kconfig     |  5 +++
 net/netfilter/xt_cgroup.c | 92 +++++++++++++++++++++++++++++++++++++----------
 2 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 971cd75..044bd22 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -960,8 +960,13 @@ config NETFILTER_XT_MATCH_BPF
 
 config NETFILTER_XT_MATCH_CGROUP
 	tristate '"control group" match support'
+	depends on NETFILTER_XTABLES
 	depends on NETFILTER_ADVANCED
+	depends on !NF_CONNTRACK || NF_CONNTRACK
+	depends on (IPV6 || IPV6=n)
 	depends on CGROUPS
+	select NF_DEFRAG_IPV4
+	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
 	select CGROUP_NET_CLASSID
 	---help---
 	Socket/process control group matching allows you to match locally
diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
index 7198d66..17f5a98 100644
--- a/net/netfilter/xt_cgroup.c
+++ b/net/netfilter/xt_cgroup.c
@@ -16,14 +16,20 @@
 #include <linux/module.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter/xt_cgroup.h>
+
 #include <net/sock.h>
 
+#include "xt_sk_helper.h"
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
 MODULE_DESCRIPTION("Xtables: process control group matching");
 MODULE_ALIAS("ipt_cgroup");
 MODULE_ALIAS("ip6t_cgroup");
 
+typedef struct sock *(*cgroup_lookup_t)(const struct sk_buff *skb,
+				        const struct net_device *indev);
+
 static int cgroup_mt_check(const struct xt_mtchk_param *par)
 {
 	struct xt_cgroup_info *info = par->matchinfo;
@@ -34,38 +40,88 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
 	return 0;
 }
 
-static bool
-cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
+static bool cgroup_mt(const struct sk_buff *skb,
+		      const struct xt_action_param *par,
+		      cgroup_lookup_t cgroup_mt_slow)
 {
 	const struct xt_cgroup_info *info = par->matchinfo;
+	struct sock *sk = skb->sk;
+	u32 sk_classid;
 
-	if (skb->sk == NULL)
-		return false;
+	if (sk) {
+		sk_classid = sk->sk_classid;
+	} else {
+		if (par->in != NULL)
+			sk = cgroup_mt_slow(skb, par->in);
+		if (sk == NULL)
+			return false;
+		if (!sk_fullsock(sk)) {
+			sock_gen_put(sk);
+			return false;
+		}
+
+		sk_classid = sk->sk_classid;
+		sock_gen_put(sk);
+	}
+
+	return (info->id == sk_classid) ^ info->invert;
+}
 
-	return (info->id == skb->sk->sk_classid) ^ info->invert;
+static bool
+cgroup_mt_v4(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	return cgroup_mt(skb, par, xt_sk_lookup);
+}
+
+#ifdef XT_HAVE_IPV6
+static bool
+cgroup_mt_v6(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	return cgroup_mt(skb, par, xt_sk_lookup6);
 }
+#endif
 
-static struct xt_match cgroup_mt_reg __read_mostly = {
-	.name       = "cgroup",
-	.revision   = 0,
-	.family     = NFPROTO_UNSPEC,
-	.checkentry = cgroup_mt_check,
-	.match      = cgroup_mt,
-	.matchsize  = sizeof(struct xt_cgroup_info),
-	.me         = THIS_MODULE,
-	.hooks      = (1 << NF_INET_LOCAL_OUT) |
-		      (1 << NF_INET_POST_ROUTING) |
-		      (1 << NF_INET_LOCAL_IN),
+static struct xt_match cgroup_mt_reg[] __read_mostly = {
+	{
+		.name       = "cgroup",
+		.revision   = 0,
+		.family     = NFPROTO_IPV4,
+		.checkentry = cgroup_mt_check,
+		.match      = cgroup_mt_v4,
+		.matchsize  = sizeof(struct xt_cgroup_info),
+		.me         = THIS_MODULE,
+		.hooks      = (1 << NF_INET_LOCAL_OUT) |
+			      (1 << NF_INET_POST_ROUTING) |
+			      (1 << NF_INET_LOCAL_IN),
+	},
+#ifdef XT_HAVE_IPV6
+	{
+		.name       = "cgroup",
+		.revision   = 0,
+		.family     = NFPROTO_IPV6,
+		.checkentry = cgroup_mt_check,
+		.match      = cgroup_mt_v6,
+		.matchsize  = sizeof(struct xt_cgroup_info),
+		.me         = THIS_MODULE,
+		.hooks      = (1 << NF_INET_LOCAL_OUT) |
+			      (1 << NF_INET_POST_ROUTING) |
+			      (1 << NF_INET_LOCAL_IN),
+	}
+#endif
 };
 
 static int __init cgroup_mt_init(void)
 {
-	return xt_register_match(&cgroup_mt_reg);
+	nf_defrag_ipv4_enable();
+#ifdef XT_HAVE_IPV6
+	nf_defrag_ipv6_enable();
+#endif
+	return xt_register_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
 }
 
 static void __exit cgroup_mt_exit(void)
 {
-	xt_unregister_match(&cgroup_mt_reg);
+	xt_unregister_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
 }
 
 module_init(cgroup_mt_init);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-26 19:14 ` [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
@ 2015-03-27  0:14   ` Pablo Neira Ayuso
  2015-03-27  2:10     ` Pablo Neira Ayuso
  2015-03-27  8:40     ` Daniel Borkmann
  0 siblings, 2 replies; 13+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-27  0:14 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel

On Thu, Mar 26, 2015 at 08:14:48PM +0100, Daniel Borkmann wrote:
[...]
> However, that as-is only partially works, i.e. it works for the case
> of established TCP and connected UDP sockets when early demux is
> enabled, but not for various other ingress scenarios: i) early demux
> disabled (sysctl), ii) udp on unconnected sockets, iii) tcp and udp
> (any kind) on localhost communications.

This extension has been around since Dec 2013, I'd rather see a new
revision that includes an option --lookup-sock.

More comments below.

>  net/netfilter/Kconfig     |  5 +++
>  net/netfilter/xt_cgroup.c | 92 +++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 79 insertions(+), 18 deletions(-)
> 
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 971cd75..044bd22 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -960,8 +960,13 @@ config NETFILTER_XT_MATCH_BPF
>  
>  config NETFILTER_XT_MATCH_CGROUP
>  	tristate '"control group" match support'
> +	depends on NETFILTER_XTABLES

why this? I think NETFILTER_ADVANCED is sufficient.

>  	depends on NETFILTER_ADVANCED
> +	depends on !NF_CONNTRACK || NF_CONNTRACK

why conntrack?

> +	depends on (IPV6 || IPV6=n)

Do we depend on any ipv6 symbol?

>  	depends on CGROUPS
> +	select NF_DEFRAG_IPV4
> +	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES

No need for defrag either.

Please, revisit the Kconfig trickery.

>  	select CGROUP_NET_CLASSID
>  	---help---
>  	Socket/process control group matching allows you to match locally
> diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
> index 7198d66..17f5a98 100644
> --- a/net/netfilter/xt_cgroup.c
> +++ b/net/netfilter/xt_cgroup.c
> @@ -16,14 +16,20 @@
>  #include <linux/module.h>
>  #include <linux/netfilter/x_tables.h>
>  #include <linux/netfilter/xt_cgroup.h>
> +
>  #include <net/sock.h>
>  
> +#include "xt_sk_helper.h"
> +
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Daniel Borkmann <dborkman@redhat.com>");
>  MODULE_DESCRIPTION("Xtables: process control group matching");
>  MODULE_ALIAS("ipt_cgroup");
>  MODULE_ALIAS("ip6t_cgroup");
>  
> +typedef struct sock *(*cgroup_lookup_t)(const struct sk_buff *skb,
> +				        const struct net_device *indev);
> +
>  static int cgroup_mt_check(const struct xt_mtchk_param *par)
>  {
>  	struct xt_cgroup_info *info = par->matchinfo;
> @@ -34,38 +40,88 @@ static int cgroup_mt_check(const struct xt_mtchk_param *par)
>  	return 0;
>  }
>  
> -static bool
> -cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
> +static bool cgroup_mt(const struct sk_buff *skb,
> +		      const struct xt_action_param *par,
> +		      cgroup_lookup_t cgroup_mt_slow)
>  {
>  	const struct xt_cgroup_info *info = par->matchinfo;
> +	struct sock *sk = skb->sk;
> +	u32 sk_classid;
>  
> -	if (skb->sk == NULL)
> -		return false;
> +	if (sk) {
> +		sk_classid = sk->sk_classid;
> +	} else {
> +		if (par->in != NULL)
> +			sk = cgroup_mt_slow(skb, par->in);
> +		if (sk == NULL)
> +			return false;
> +		if (!sk_fullsock(sk)) {
> +			sock_gen_put(sk);
> +			return false;
> +		}
> +
> +		sk_classid = sk->sk_classid;
> +		sock_gen_put(sk);
> +	}
> +
> +	return (info->id == sk_classid) ^ info->invert;
> +}
>  
> -	return (info->id == skb->sk->sk_classid) ^ info->invert;
> +static bool
> +cgroup_mt_v4(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> +	return cgroup_mt(skb, par, xt_sk_lookup);
> +}
> +
> +#ifdef XT_HAVE_IPV6

Please, kill this custom XT_HAVE_IPV6 and now use IS_ENABLED(NF_SOCK_IPV6)

> +static bool
> +cgroup_mt_v6(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> +	return cgroup_mt(skb, par, xt_sk_lookup6);
>  }
> +#endif
>  
> -static struct xt_match cgroup_mt_reg __read_mostly = {
> -	.name       = "cgroup",
> -	.revision   = 0,
> -	.family     = NFPROTO_UNSPEC,
> -	.checkentry = cgroup_mt_check,
> -	.match      = cgroup_mt,
> -	.matchsize  = sizeof(struct xt_cgroup_info),
> -	.me         = THIS_MODULE,
> -	.hooks      = (1 << NF_INET_LOCAL_OUT) |
> -		      (1 << NF_INET_POST_ROUTING) |
> -		      (1 << NF_INET_LOCAL_IN),
> +static struct xt_match cgroup_mt_reg[] __read_mostly = {
> +	{
> +		.name       = "cgroup",
> +		.revision   = 0,
> +		.family     = NFPROTO_IPV4,
> +		.checkentry = cgroup_mt_check,
> +		.match      = cgroup_mt_v4,
> +		.matchsize  = sizeof(struct xt_cgroup_info),
> +		.me         = THIS_MODULE,
> +		.hooks      = (1 << NF_INET_LOCAL_OUT) |
> +			      (1 << NF_INET_POST_ROUTING) |
> +			      (1 << NF_INET_LOCAL_IN),
> +	},
> +#ifdef XT_HAVE_IPV6
> +	{
> +		.name       = "cgroup",
> +		.revision   = 0,
> +		.family     = NFPROTO_IPV6,
> +		.checkentry = cgroup_mt_check,
> +		.match      = cgroup_mt_v6,
> +		.matchsize  = sizeof(struct xt_cgroup_info),
> +		.me         = THIS_MODULE,
> +		.hooks      = (1 << NF_INET_LOCAL_OUT) |
> +			      (1 << NF_INET_POST_ROUTING) |
> +			      (1 << NF_INET_LOCAL_IN),
> +	}
> +#endif
>  };
>  
>  static int __init cgroup_mt_init(void)
>  {
> -	return xt_register_match(&cgroup_mt_reg);
> +	nf_defrag_ipv4_enable();

Why did you add this?

> +#ifdef XT_HAVE_IPV6
> +	nf_defrag_ipv6_enable();
> +#endif
> +	return xt_register_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
>  }
>  
>  static void __exit cgroup_mt_exit(void)
>  {
> -	xt_unregister_match(&cgroup_mt_reg);
> +	xt_unregister_matches(cgroup_mt_reg, ARRAY_SIZE(cgroup_mt_reg));
>  }
>  
>  module_init(cgroup_mt_init);
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-27  0:14   ` Pablo Neira Ayuso
@ 2015-03-27  2:10     ` Pablo Neira Ayuso
  2015-03-27  9:48       ` Daniel Borkmann
  2015-03-27  8:40     ` Daniel Borkmann
  1 sibling, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-27  2:10 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel

On Fri, Mar 27, 2015 at 01:14:08AM +0100, Pablo Neira Ayuso wrote:
> > diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> > index 971cd75..044bd22 100644
> > --- a/net/netfilter/Kconfig
> > +++ b/net/netfilter/Kconfig
> > @@ -960,8 +960,13 @@ config NETFILTER_XT_MATCH_BPF
> >  
> >  config NETFILTER_XT_MATCH_CGROUP
> >  	tristate '"control group" match support'
> > +	depends on NETFILTER_XTABLES
> 
> why this? I think NETFILTER_ADVANCED is sufficient.
> 
> >  	depends on NETFILTER_ADVANCED
> > +	depends on !NF_CONNTRACK || NF_CONNTRACK
> 
> why conntrack?
> 
> > +	depends on (IPV6 || IPV6=n)
> 
> Do we depend on any ipv6 symbol?
> 
> >  	depends on CGROUPS
> > +	select NF_DEFRAG_IPV4
> > +	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
> 
> No need for defrag either.

Wait, now I see why you need this.

What started a simple cgroup match extension is turning into a more
complicated thing. And you want to do firewalling with this, which
doesn't work for other socket families than TCP and UDP.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-27  2:10     ` Pablo Neira Ayuso
@ 2015-03-27  9:48       ` Daniel Borkmann
  2015-03-27 10:47         ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-27  9:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel

On 03/27/2015 03:10 AM, Pablo Neira Ayuso wrote:
> On Fri, Mar 27, 2015 at 01:14:08AM +0100, Pablo Neira Ayuso wrote:
>>> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
>>> index 971cd75..044bd22 100644
>>> --- a/net/netfilter/Kconfig
>>> +++ b/net/netfilter/Kconfig
>>> @@ -960,8 +960,13 @@ config NETFILTER_XT_MATCH_BPF
>>>
>>>   config NETFILTER_XT_MATCH_CGROUP
>>>   	tristate '"control group" match support'
>>> +	depends on NETFILTER_XTABLES
>>
>> why this? I think NETFILTER_ADVANCED is sufficient.
>>
>>>   	depends on NETFILTER_ADVANCED
>>> +	depends on !NF_CONNTRACK || NF_CONNTRACK
>>
>> why conntrack?
>>
>>> +	depends on (IPV6 || IPV6=n)
>>
>> Do we depend on any ipv6 symbol?
>>
>>>   	depends on CGROUPS
>>> +	select NF_DEFRAG_IPV4
>>> +	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
>>
>> No need for defrag either.
>
> Wait, now I see why you need this.
>
> What started a simple cgroup match extension is turning into a more
> complicated thing. And you want to do firewalling with this, which
> doesn't work for other socket families than TCP and UDP.

Right, so for me it started out as a simple outgoing match extension
for skb->sk and this should be protocol agnostic, for example, SCTP
sets the skb owner in its output path, so the cgroup id would work
there, too. (That should be the case for every protocol that's doing
proper socket accounting.)

People have since then seen a use case for accounting, so support
was added for local-in (which we try to fix), where it's being used
in Tizen OS apparently, but the idea for realizing a per-application,
per-container, ... firewall for both filtering and accounting sounds
appealing to me.

So, I'd like to get this right for iptables and am also eager to help
out fixing this in nft. I was thinking that if we add --lookup-sock
in a second revision, the man-page would _clearly_ need to describe
that when being used w/o the lookup option, it only works for protocols
making use of early demuxes on ingress, and when being being used with
the lookup option, we would have TCP/UDP covered on ingress. Would
that be fine as a start to have this documented? Or, would nft also
require niche protocols like SCTP/DCCP to be supported for the lookup
up-front?

What I've seen so far is, that besides the basic xt_sctp matching,
the perhaps biggest request SCTP users might have, is that association
tracking currently is missing for the conntracker and ipvs to make
their multi-homed use-cases work, but I guess I'm starting to get
off-topic. :)

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-27  9:48       ` Daniel Borkmann
@ 2015-03-27 10:47         ` Pablo Neira Ayuso
  2015-03-27 12:02           ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-27 10:47 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel

On Fri, Mar 27, 2015 at 10:48:51AM +0100, Daniel Borkmann wrote:
> On 03/27/2015 03:10 AM, Pablo Neira Ayuso wrote:
> >
> > What started a simple cgroup match extension is turning into a more
> > complicated thing. And you want to do firewalling with this, which
> > doesn't work for other socket families than TCP and UDP.
> 
> Right, so for me it started out as a simple outgoing match extension
> for skb->sk and this should be protocol agnostic, for example, SCTP
> sets the skb owner in its output path, so the cgroup id would work
> there, too. (That should be the case for every protocol that's doing
> proper socket accounting.)
> 
> People have since then seen a use case for accounting, so support
> was added for local-in (which we try to fix), where it's being used
> in Tizen OS apparently, but the idea for realizing a per-application,
> per-container, ... firewall for both filtering and accounting sounds
> appealing to me.

Yes, but the more I look into this the more I'm convinced that the nf
socket infrastructure was not designed for generic socket-based
firewalling.

This is basically there for TPROXY and very simple socket filtering
scenarios. This really needs more thinking.

> So, I'd like to get this right for iptables and am also eager to help
> out fixing this in nft.

I'm just going to send two-liner patch for nft to get this working at
least in the limited supported scenarios that we already have.

> I was thinking that if we add --lookup-sock in a second revision,
> the man-page would _clearly_ need to describe that when being used
> w/o the lookup option, it only works for protocols making use of
> early demuxes on ingress, and when being being used with the lookup
> option, we would have TCP/UDP covered on ingress.

Not even that, it seems to me this will not work for UDP multicast
either.

> Would that be fine as a start to have this documented?

I think this is not going to work the way users expect, so I would
either schedule INPUT cgroup filtering for removal (to get this
aligned with the owner match) or document how limited this is.

> Or, would nft also require niche protocols like SCTP/DCCP to be
> supported for the lookup up-front?
>
> What I've seen so far is, that besides the basic xt_sctp matching,
> the perhaps biggest request SCTP users might have, is that association
> tracking currently is missing for the conntracker and ipvs to make
> their multi-homed use-cases work, but I guess I'm starting to get
> off-topic. :)

Yes, that's a different front ;-).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-27 10:47         ` Pablo Neira Ayuso
@ 2015-03-27 12:02           ` Daniel Borkmann
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-27 12:02 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel

On 03/27/2015 11:47 AM, Pablo Neira Ayuso wrote:
> On Fri, Mar 27, 2015 at 10:48:51AM +0100, Daniel Borkmann wrote:
>> On 03/27/2015 03:10 AM, Pablo Neira Ayuso wrote:
>>>
>>> What started a simple cgroup match extension is turning into a more
>>> complicated thing. And you want to do firewalling with this, which
>>> doesn't work for other socket families than TCP and UDP.
>>
>> Right, so for me it started out as a simple outgoing match extension
>> for skb->sk and this should be protocol agnostic, for example, SCTP
>> sets the skb owner in its output path, so the cgroup id would work
>> there, too. (That should be the case for every protocol that's doing
>> proper socket accounting.)
>>
>> People have since then seen a use case for accounting, so support
>> was added for local-in (which we try to fix), where it's being used
>> in Tizen OS apparently, but the idea for realizing a per-application,
>> per-container, ... firewall for both filtering and accounting sounds
>> appealing to me.
>
> Yes, but the more I look into this the more I'm convinced that the nf
> socket infrastructure was not designed for generic socket-based
> firewalling.
>
> This is basically there for TPROXY and very simple socket filtering
> scenarios. This really needs more thinking.

Okay, I understand, if we can come up with a better, more generic solution
to this use-case, I'm all for it.

>> So, I'd like to get this right for iptables and am also eager to help
>> out fixing this in nft.
>
> I'm just going to send two-liner patch for nft to get this working at
> least in the limited supported scenarios that we already have.

Okay.

Thanks again,
Daniel

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups
  2015-03-27  0:14   ` Pablo Neira Ayuso
  2015-03-27  2:10     ` Pablo Neira Ayuso
@ 2015-03-27  8:40     ` Daniel Borkmann
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-27  8:40 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel

On 03/27/2015 01:14 AM, Pablo Neira Ayuso wrote:
> On Thu, Mar 26, 2015 at 08:14:48PM +0100, Daniel Borkmann wrote:
> [...]
>> However, that as-is only partially works, i.e. it works for the case
>> of established TCP and connected UDP sockets when early demux is
>> enabled, but not for various other ingress scenarios: i) early demux
>> disabled (sysctl), ii) udp on unconnected sockets, iii) tcp and udp
>> (any kind) on localhost communications.
>
> This extension has been around since Dec 2013, I'd rather see a new
> revision that includes an option --lookup-sock.

Okay, I'm totally fine with that.

Please note, the commit I'm trying to fix is _not_ the original
xt_cgroup inclusion, but rather a00e76349f35 ("netfilter: x_tables:
allow to use cgroup match for LOCAL_IN nf hooks"), which is March
2014, fwiw.

> More comments below.
...
>> +#ifdef XT_HAVE_IPV6
>
> Please, kill this custom XT_HAVE_IPV6 and now use IS_ENABLED(NF_SOCK_IPV6)

Will do, thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 0/2] xt_cgroups fix
  2015-03-26 19:14 [PATCH nf-next v2 0/2] xt_cgroups fix Daniel Borkmann
  2015-03-26 19:14 ` [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
  2015-03-26 19:14 ` [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
@ 2015-03-27  0:40 ` Pablo Neira Ayuso
  2015-03-27  8:48   ` Daniel Borkmann
  2 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2015-03-27  0:40 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: daniel, fw, a.perevalov, netfilter-devel

On Thu, Mar 26, 2015 at 08:14:46PM +0100, Daniel Borkmann wrote:
> Hi Pablo,
> 
> here's a possible fix for xt_cgroups that was previously reported
> by Daniel Mack.
> 
> I respinned the set based on your previous feedback wrt tw sockets.
> 
> The first patch refactors common helpers, which is later on being
> used by the actual fix. Please see individual patches for details.
> 
> I have rebased it against nf-next as in the previous version.

The existing cgroup support for nf_tables is quite broken (see patch
attached), that needs some care too. Would you also help us to get
that in good shape? It will be half way done after your patches.

This makes me think that you can place something generic to fetch the
sk_classid:

        static bool nf_sock_classid(u32 *sk_classid);

this would go in net/ipv4/netfilter/nf_sock_ipv4.c.

And also the ipv6 version:

static bool nf_sock6_classid(u32 *sk_classid);

so we can use the same function to fetch the sk_classid that can be
shared by xt and nft. Please, give it another spin, you can probably
come up with a better interface.

BTW, we also have two more families: inet and bridge. Inet should be
easy, it's basically a special family for dual-stack setups, 'inet'
chain see both ipv4 and ipv6 traffic, you have to use pkt->ops->pf to
pass this to right ipv4/ipv6 function.

See net/netfilter/nft_reject.c, net/ipv{4,6}/nft_reject_ipv{4,6}.c and
net/netfilter/nft_reject_inet.c for reference.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH nf-next v2 0/2] xt_cgroups fix
  2015-03-27  0:40 ` [PATCH nf-next v2 0/2] xt_cgroups fix Pablo Neira Ayuso
@ 2015-03-27  8:48   ` Daniel Borkmann
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2015-03-27  8:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: daniel, fw, a.perevalov, netfilter-devel

On 03/27/2015 01:40 AM, Pablo Neira Ayuso wrote:
> On Thu, Mar 26, 2015 at 08:14:46PM +0100, Daniel Borkmann wrote:
>> Hi Pablo,
>>
>> here's a possible fix for xt_cgroups that was previously reported
>> by Daniel Mack.
>>
>> I respinned the set based on your previous feedback wrt tw sockets.
>>
>> The first patch refactors common helpers, which is later on being
>> used by the actual fix. Please see individual patches for details.
>>
>> I have rebased it against nf-next as in the previous version.
>
> The existing cgroup support for nf_tables is quite broken (see patch
> attached), that needs some care too. Would you also help us to get
> that in good shape? It will be half way done after your patches.

Sure, I can look into that. If you're ok, I would look into that
after the xt_cgroup bits are carved out first.

> This makes me think that you can place something generic to fetch the
> sk_classid:
>
>          static bool nf_sock_classid(u32 *sk_classid);
>
> this would go in net/ipv4/netfilter/nf_sock_ipv4.c.
>
> And also the ipv6 version:
>
> static bool nf_sock6_classid(u32 *sk_classid);
>
> so we can use the same function to fetch the sk_classid that can be
> shared by xt and nft. Please, give it another spin, you can probably
> come up with a better interface.

Ok, will do. Thanks for the feedback, Pablo!

> BTW, we also have two more families: inet and bridge. Inet should be
> easy, it's basically a special family for dual-stack setups, 'inet'
> chain see both ipv4 and ipv6 traffic, you have to use pkt->ops->pf to
> pass this to right ipv4/ipv6 function.
>
> See net/netfilter/nft_reject.c, net/ipv{4,6}/nft_reject_ipv{4,6}.c and
> net/netfilter/nft_reject_inet.c for reference.
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-03-27 12:02 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-03-26 19:14 [PATCH nf-next v2 0/2] xt_cgroups fix Daniel Borkmann
2015-03-26 19:14 ` [PATCH nf-next v2 1/2] netfilter: x_tables: refactor lookup helpers from xt_socket Daniel Borkmann
2015-03-27  0:06   ` Pablo Neira Ayuso
2015-03-27  8:18     ` Daniel Borkmann
2015-03-26 19:14 ` [PATCH nf-next v2 2/2] netfilter: x_tables: fix cgroup's NF_INET_LOCAL_IN sk lookups Daniel Borkmann
2015-03-27  0:14   ` Pablo Neira Ayuso
2015-03-27  2:10     ` Pablo Neira Ayuso
2015-03-27  9:48       ` Daniel Borkmann
2015-03-27 10:47         ` Pablo Neira Ayuso
2015-03-27 12:02           ` Daniel Borkmann
2015-03-27  8:40     ` Daniel Borkmann
2015-03-27  0:40 ` [PATCH nf-next v2 0/2] xt_cgroups fix Pablo Neira Ayuso
2015-03-27  8:48   ` Daniel Borkmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).