[PATCH/RFC 00/10] Transparent proxying patches version 4

netfilter-devel.vger.kernel.org archive mirror

* [PATCH/RFC 00/10] Transparent proxying patches version 4
@ 2007-01-03 16:33 KOVACS Krisztian
  2007-01-03 16:34 ` [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs KOVACS Krisztian
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:33 UTC (permalink / raw)
  To: netfilter-devel, netdev

The following set of patches implements transparent proxying support
loosely modeled on the Linux 2.2 transparent proxying functionality.

In the last few years we've been maintaining a set of patches
implementing Netfilter NAT to provide similar functionality. However,
as time passed, more and more bugs surfaced, some of which were not
possible to fix using that approach. Also, those patches required
modification of user-space application code and the "API" provided was
neither clean nor easy to use.

So instead of using NAT to dynamically redirect traffic to local
addresses, we now rely on "native" non-locally-bound sockets and do
early socket lookups for inbound IPv4 packets. These lookups are done
in a separate Netfilter/iptables module, so there are only negligible
performance implications of building transparent proxying support as a
module and then not loading it.
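
From the application side, a non-locally-bound socket could look
roughly like the sketch below. It is a minimal user-space example, not
part of the patches; it assumes the IP_FREEBIND socket option that
patch 08 checks for, and the helper names (make_addr,
open_freebind_listener) are made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in an IPv4 socket address; returns 0 on success, -1 on a bad address. */
int make_addr(const char *ip, unsigned short port, struct sockaddr_in *sa)
{
	assert(sa != NULL);
	memset(sa, 0, sizeof(*sa));
	sa->sin_family = AF_INET;
	sa->sin_port = htons(port);
	return inet_pton(AF_INET, ip, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Open a TCP listener on an address that need not be configured locally. */
int open_freebind_listener(const char *ip, unsigned short port)
{
	struct sockaddr_in sa;
	int one = 1;
	int fd;

	if (make_addr(ip, port, &sa) < 0)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* IP_FREEBIND lets bind() succeed for a non-local address. */
	if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
	    listen(fd, 16) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A proxy would call something like open_freebind_listener("192.0.2.10",
8080) with an address that is not assigned to any local interface;
whether packets actually reach the socket then depends on the kernel
side described in this series.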

Small modifications were also necessary in IP/TCP/UDP core code to
support the Netfilter modules. All those have been functionally split
out into stand-alone patches among which there are no direct
dependencies. Among these changes are ones which I think might be
potentially risky, especially the core IPv4 routing code changes.

Also please note that at the moment only IPv4 support is implemented,
but, as opposed to the NAT-based approach taken by older TProxy
versions, IPv6 support is possible this way.

Comments welcome...

-- 
 Regards,
  Krisztian Kovacs


* [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
@ 2007-01-03 16:34 ` KOVACS Krisztian
  2007-01-10  6:46   ` Patrick McHardy
  2007-01-03 16:34 ` [PATCH/RFC 02/10] Port redirection support for TCP KOVACS Krisztian
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:34 UTC (permalink / raw)
  To: netfilter-devel, netdev

The input path for non-local bound sockets requires diverting certain
packets locally, even if their destination IP address is not
considered local. We achieve this by assigning a specially crafted dst
entry to these skbs, and optionally also attaching a socket to the skb
so that the upper layer code does not need to redo the socket lookup.

We also have to be able to differentiate between these fake entries
and "real" entries in the cache: it is perfectly legal that the
diversion is done only for certain TCP or UDP packets and not for all
packets of the flow. Since these special dst entries are used only by
the iptables tproxy code, and that code uses exclusively these
entries, simply flagging these entries as DST_DIVERTED is OK. All
other cache lookup paths skip diverted entries, while our new
ip_divert_local() function uses exclusively diverted dst entries.
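
The flag-based separation can be illustrated with a simplified,
self-contained model. This is only a sketch: struct cache_entry and
cache_lookup() are hypothetical stand-ins for the routing cache and
its lookup paths, and only the DST_DIVERTED value is taken from the
patch.

```c
#include <assert.h>
#include <stddef.h>

#define DST_DIVERTED 0x20	/* flag value taken from the patch */

/* Hypothetical, simplified stand-in for a routing cache entry. */
struct cache_entry {
	unsigned int daddr;
	unsigned int saddr;
	unsigned int flags;
	struct cache_entry *next;
};

/*
 * Walk one hash chain and return the first entry whose key matches and
 * whose DST_DIVERTED bit equals the caller's expectation.
 */
struct cache_entry *cache_lookup(struct cache_entry *chain,
				 unsigned int daddr, unsigned int saddr,
				 int want_diverted)
{
	struct cache_entry *e;

	for (e = chain; e != NULL; e = e->next) {
		int diverted = (e->flags & DST_DIVERTED) != 0;

		if (e->daddr == daddr && e->saddr == saddr &&
		    diverted == (want_diverted != 0))
			return e;
	}
	return NULL;
}
```

With a regular and a diverted entry for the same flow on one chain,
the regular lookup path (want_diverted == 0) never returns the fake
entry, and the diversion path (want_diverted == 1) never returns the
real one.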

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/net/dst.h   |    1 
 include/net/route.h |    2 +
 net/ipv4/route.c    |  106 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 62b7e75..72b712c 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -50,6 +50,7 @@ #define DST_NOXFRM		2
 #define DST_NOPOLICY		4
 #define DST_NOHASH		8
 #define DST_BALANCED            0x10
+#define DST_DIVERTED		0x20
 	unsigned long		lastuse;
 	unsigned long		expires;
 
diff --git a/include/net/route.h b/include/net/route.h
index 486e37a..ee52393 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -126,6 +126,8 @@ extern int		ip_rt_ioctl(unsigned int cmd
 extern void		ip_rt_get_source(u8 *src, struct rtable *rt);
 extern int		ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb);
 
+extern int		ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk);
+
 struct in_ifaddr;
 extern void fib_add_ifaddr(struct in_ifaddr *);
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 2daa0dc..537b976 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -942,9 +942,11 @@ restart:
 	while ((rth = *rthp) != NULL) {
 #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
 		if (!(rth->u.dst.flags & DST_BALANCED) &&
+		    ((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
 		    compare_keys(&rth->fl, &rt->fl)) {
 #else
-		if (compare_keys(&rth->fl, &rt->fl)) {
+		if (((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
+		    compare_keys(&rth->fl, &rt->fl)) {
 #endif
 			/* Put it first */
 			*rthp = rth->u.rt_next;
@@ -1166,6 +1168,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
 				if (rth->fl.fl4_dst != daddr ||
 				    rth->fl.fl4_src != skeys[i] ||
 				    rth->fl.oif != ikeys[k] ||
+				    (rth->u.dst.flags & DST_DIVERTED) ||
 				    rth->fl.iif != 0) {
 					rthp = &rth->u.rt_next;
 					continue;
@@ -1526,6 +1529,105 @@ static int ip_rt_bug(struct sk_buff *skb
 	return 0;
 }
 
+static void ip_divert_free_sock(struct sk_buff *skb)
+{
+	struct sock *sk = skb->sk;
+
+	skb->sk = NULL;
+	skb->destructor = NULL;
+	sock_put(sk);
+}
+
+int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk)
+{
+	struct iphdr *iph = skb->nh.iph;
+	struct rtable *rth, *rtres;
+	unsigned hash;
+	const int iif = in->dev->ifindex;
+	u_int8_t tos;
+	int err;
+
+	/* look up hash first */
+	tos = iph->tos & IPTOS_RT_MASK;
+	hash = rt_hash_code(iph->daddr, iph->saddr ^ (iif << 5));
+
+	rcu_read_lock();
+	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
+	     rth = rcu_dereference(rth->u.rt_next)) {
+		if (rth->fl.fl4_dst == iph->daddr &&
+		    rth->fl.fl4_src == iph->saddr &&
+		    rth->fl.iif == iif &&
+		    rth->fl.oif == 0 &&
+		    rth->fl.mark == skb->mark &&
+		    (rth->u.dst.flags & DST_DIVERTED) &&
+		    rth->fl.fl4_tos == tos) {
+			rth->u.dst.lastuse = jiffies;
+			dst_hold(&rth->u.dst);
+			rth->u.dst.__use++;
+			RT_CACHE_STAT_INC(in_hit);
+			rcu_read_unlock();
+
+			dst_release(skb->dst);
+			skb->dst = (struct dst_entry*)rth;
+
+			if (sk) {
+				sock_hold(sk);
+				skb->sk = sk;
+				skb->destructor = ip_divert_free_sock;
+			}
+
+			return 0;
+		}
+		RT_CACHE_STAT_INC(in_hlist_search);
+	}
+	rcu_read_unlock();
+
+	/* not found in cache, try to allocate a new dst entry */
+	rth = dst_alloc(&ipv4_dst_ops);
+	if (!rth)
+		return -ENOMEM;
+
+	rth->u.dst.output= ip_rt_bug;
+
+	atomic_set(&rth->u.dst.__refcnt, 1);
+	rth->u.dst.flags = DST_HOST | DST_DIVERTED;
+
+	if (in->cnf.no_policy)
+		rth->u.dst.flags |= DST_NOPOLICY;
+
+	rth->fl.fl4_dst = iph->daddr;
+	rth->rt_dst	= iph->daddr;
+	rth->fl.fl4_tos = tos;
+	rth->fl.mark	= skb->mark;
+	rth->fl.fl4_src = iph->saddr;
+	rth->rt_src	= iph->saddr;
+	rth->rt_iif	=
+	rth->fl.iif	= skb->dev->ifindex;
+	rth->u.dst.dev	= &loopback_dev;
+	dev_hold(rth->u.dst.dev);
+	rth->idev	= in_dev_get(rth->u.dst.dev);
+	rth->rt_gateway = iph->daddr;
+	rth->rt_spec_dst= iph->daddr;
+	rth->u.dst.input= ip_local_deliver;
+	rth->rt_flags	= RTCF_LOCAL;
+	rth->rt_type	= RTN_LOCAL;
+
+	err = rt_intern_hash(hash, rth, &rtres);
+	if (err)
+		return err;
+
+	dst_release(skb->dst);
+	skb->dst = (struct dst_entry *) rth;
+
+	if (sk) {
+		sock_hold(sk);
+		skb->sk = sk;
+		skb->destructor = ip_divert_free_sock;
+	}
+
+	return 0;
+}
+
 /*
    We do not cache source address of outgoing interface,
    because it is used only by IP RR, TS and SRR options,
@@ -2104,6 +2206,7 @@ int ip_route_input(struct sk_buff *skb,
 		    rth->fl.fl4_src == saddr &&
 		    rth->fl.iif == iif &&
 		    rth->fl.oif == 0 &&
+		    !(rth->u.dst.flags & DST_DIVERTED) &&
 		    rth->fl.mark == skb->mark &&
 		    rth->fl.fl4_tos == tos) {
 			rth->u.dst.lastuse = jiffies;
@@ -3199,3 +3302,4 @@ #endif
 EXPORT_SYMBOL(__ip_select_ident);
 EXPORT_SYMBOL(ip_route_input);
 EXPORT_SYMBOL(ip_route_output_key);
+EXPORT_SYMBOL_GPL(ip_divert_local);


* [PATCH/RFC 02/10] Port redirection support for TCP
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
  2007-01-03 16:34 ` [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs KOVACS Krisztian
@ 2007-01-03 16:34 ` KOVACS Krisztian
  2007-01-03 16:35 ` [PATCH/RFC 03/10] Don't do the TCP socket lookup if we already have one attached KOVACS Krisztian
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:34 UTC (permalink / raw)
  To: netfilter-devel, netdev

Current TCP code relies on the local port of the listening socket
being the same as the destination port of the incoming
connection. Port redirection used by many transparent proxying
techniques obviously breaks this, so we have to store the original
destination port.

This patch extends struct inet_request_sock and stores the incoming
destination port value there. It also modifies the handshake code to
use that value as the source port when sending reply packets.
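
The idea can be sketched outside the kernel as follows. The types and
function names here (req_sketch, tcp_ports, req_init, synack_ports)
are hypothetical and heavily reduced; only the loc_port/rmt_port
naming follows the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced version of the per-request state. */
struct req_sketch {
	uint16_t loc_port;	/* destination port of the SYN, network order */
	uint16_t rmt_port;	/* source port of the SYN, network order */
};

/* Source and destination port of one TCP segment, network order. */
struct tcp_ports {
	uint16_t source;
	uint16_t dest;
};

/* Record both ports of the incoming SYN when the request is created. */
void req_init(struct req_sketch *req, const struct tcp_ports *syn)
{
	req->rmt_port = syn->source;
	req->loc_port = syn->dest;
}

/* Build the SYN-ACK ports from the request, not from the listener. */
struct tcp_ports synack_ports(const struct req_sketch *req)
{
	struct tcp_ports p = { req->loc_port, req->rmt_port };

	return p;
}
```

If a client connects to port 80 and the connection is redirected to a
listener on some other port, the reply built this way still carries
source port 80, which is what the client expects.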

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/net/inet_sock.h         |    1 +
 include/net/tcp.h               |    1 +
 net/ipv4/inet_connection_sock.c |    2 ++
 net/ipv4/tcp_output.c           |    2 +-
 4 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index ce6da97..0bd167b 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -64,6 +64,7 @@ #if defined(CONFIG_IPV6) || defined(CONF
 #endif
 	__be32			loc_addr;
 	__be32			rmt_addr;
+	__be16			loc_port;
 	__be16			rmt_port;
 	u16			snd_wscale : 4, 
 				rcv_wscale : 4, 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b7d8317..08ea8f3 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -983,6 +983,7 @@ static inline void tcp_openreq_init(stru
 	ireq->acked = 0;
 	ireq->ecn_ok = 0;
 	ireq->rmt_port = skb->h.th->source;
+	ireq->loc_port = skb->h.th->dest;
 }
 
 extern void tcp_enter_memory_pressure(void);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 9d68837..889a487 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -502,6 +502,8 @@ struct sock *inet_csk_clone(struct sock
 		newicsk->icsk_bind_hash = NULL;
 
 		inet_sk(newsk)->dport = inet_rsk(req)->rmt_port;
+		inet_sk(newsk)->num = ntohs(inet_rsk(req)->loc_port);
+		inet_sk(newsk)->sport = inet_rsk(req)->loc_port;
 		newsk->sk_write_space = sk_stream_write_space;
 
 		newicsk->icsk_retransmits = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 32c1a97..bb37048 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2132,7 +2132,7 @@ #endif
 	th->syn = 1;
 	th->ack = 1;
 	TCP_ECN_make_synack(req, th);
-	th->source = inet_sk(sk)->sport;
+	th->source = ireq->loc_port;
 	th->dest = ireq->rmt_port;
 	TCP_SKB_CB(skb)->seq = tcp_rsk(req)->snt_isn;
 	TCP_SKB_CB(skb)->end_seq = TCP_SKB_CB(skb)->seq + 1;


* [PATCH/RFC 03/10] Don't do the TCP socket lookup if we already have one attached
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
  2007-01-03 16:34 ` [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs KOVACS Krisztian
  2007-01-03 16:34 ` [PATCH/RFC 02/10] Port redirection support for TCP KOVACS Krisztian
@ 2007-01-03 16:35 ` KOVACS Krisztian
  2007-01-03 16:35 ` [PATCH/RFC 04/10] Don't do the UDP " KOVACS Krisztian
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:35 UTC (permalink / raw)
  To: netfilter-devel, netdev

The TCP input code path looks up the TCP socket hash tables to find a
socket matching the incoming packet. However, because iptable_tproxy
does socket lookups early, the skb may already have the appropriate
reference attached. In that case we steal that reference instead of
doing the lookup.
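
"Stealing" here means taking over the reference the skb already holds
rather than acquiring a second one. A minimal model of that ownership
transfer, with made-up types (sock_sketch, skb_sketch) and a plain
integer in place of the real reference counter, might look like this:

```c
#include <assert.h>
#include <stddef.h>

struct sock_sketch {
	int refcnt;
};

struct skb_sketch {
	struct sock_sketch *sk;
	void (*destructor)(struct skb_sketch *skb);
};

/* Drop the reference held by the skb; models ip_divert_free_sock(). */
void skb_put_sock(struct skb_sketch *skb)
{
	struct sock_sketch *sk = skb->sk;

	skb->sk = NULL;
	skb->destructor = NULL;
	sk->refcnt--;
}

/* Early lookup: attach the socket and take a reference for the skb. */
void skb_attach_sock(struct skb_sketch *skb, struct sock_sketch *sk)
{
	sk->refcnt++;
	skb->sk = sk;
	skb->destructor = skb_put_sock;
}

/*
 * Receive path: take over the skb's reference. The count is left
 * untouched; the caller now owns the reference the skb used to hold.
 * Returns NULL if no socket was attached.
 */
struct sock_sketch *skb_steal_sock(struct skb_sketch *skb)
{
	struct sock_sketch *sk = skb->sk;

	if (sk != NULL) {
		skb->sk = NULL;
		skb->destructor = NULL;
	}
	return sk;
}
```

Clearing the destructor matters: otherwise freeing the skb later would
drop a reference that the receive path now considers its own.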

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 net/ipv4/tcp_ipv4.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index bf7a224..7828aec 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1647,9 +1647,16 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	TCP_SKB_CB(skb)->flags	 = skb->nh.iph->tos;
 	TCP_SKB_CB(skb)->sacked	 = 0;
 
-	sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
-			   skb->nh.iph->daddr, th->dest,
-			   inet_iif(skb));
+	if (unlikely(skb->sk)) {
+		/* steal reference */
+		sk = skb->sk;
+		skb->destructor = NULL;
+		skb->sk = NULL;
+	} else {
+		sk = __inet_lookup(&tcp_hashinfo, skb->nh.iph->saddr, th->source,
+				   skb->nh.iph->daddr, th->dest,
+				   inet_iif(skb));
+	}
 
 	if (!sk)
 		goto no_tcp_socket;


* [PATCH/RFC 04/10] Don't do the UDP socket lookup if we already have one attached
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (2 preceding siblings ...)
  2007-01-03 16:35 ` [PATCH/RFC 03/10] Don't do the TCP socket lookup if we already have one attached KOVACS Krisztian
@ 2007-01-03 16:35 ` KOVACS Krisztian
  2007-01-03 16:36 ` [PATCH/RFC 05/10] Remove local address check on IP output KOVACS Krisztian
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:35 UTC (permalink / raw)
  To: netfilter-devel, netdev

The UDP input code path looks up the UDP socket hash tables to find a
socket matching the incoming packet. However, because iptable_tproxy
does socket lookups early, the skb may already have the appropriate
reference attached. In that case we steal that reference instead of
doing the lookup.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 net/ipv4/udp.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index cfff930..1b348f5 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1225,8 +1225,15 @@ int __udp4_lib_rcv(struct sk_buff *skb,
 	if(rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 		return __udp4_lib_mcast_deliver(skb, uh, saddr, daddr, udptable);
 
-	sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
-			       skb->dev->ifindex, udptable        );
+	if (skb->sk) {
+		/* steal reference */
+		sk = skb->sk;
+		skb->destructor = NULL;
+		skb->sk = NULL;
+	} else {
+		sk = __udp4_lib_lookup(saddr, uh->source, daddr, uh->dest,
+				       skb->dev->ifindex, udptable        );
+	}
 
 	if (sk != NULL) {
 		int ret = udp_queue_rcv_skb(sk, skb);


* [PATCH/RFC 05/10] Remove local address check on IP output
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (3 preceding siblings ...)
  2007-01-03 16:35 ` [PATCH/RFC 04/10] Don't do the UDP " KOVACS Krisztian
@ 2007-01-03 16:36 ` KOVACS Krisztian
  2007-01-10  6:47   ` Patrick McHardy
  2007-01-03 16:36 ` [PATCH/RFC 06/10] Create a tproxy flag in struct sk_buff KOVACS Krisztian
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:36 UTC (permalink / raw)
  To: netfilter-devel, netdev

ip_route_output() contains a check to make sure that no flows with
non-local source IP addresses are routed. Unfortunately this check
makes it completely impossible to use non-local bound sockets as no
outbound packets will make it through the stack.

This patch moves the interface lookup to the multicast-specific code
path as that is the only real user of the interface data looked up.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 net/ipv4/route.c |   13 +++++--------
 1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 537b976..bb1158a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2498,11 +2498,6 @@ #endif
 		    ZERONET(oldflp->fl4_src))
 			goto out;
 
-		/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
-		dev_out = ip_dev_find(oldflp->fl4_src);
-		if (dev_out == NULL)
-			goto out;
-
 		/* I removed check for oif == dev_out->oif here.
 		   It was wrong for two reasons:
 		   1. ip_dev_find(saddr) can return wrong iface, if saddr is
@@ -2528,12 +2523,14 @@ #endif
 			   Luckily, this hack is good workaround.
 			 */
 
+			/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
+			dev_out = ip_dev_find(oldflp->fl4_src);
+			if (dev_out == NULL)
+				goto out;
+
 			fl.oif = dev_out->ifindex;
 			goto make_route;
 		}
-		if (dev_out)
-			dev_put(dev_out);
-		dev_out = NULL;
 	}
 
 


* [PATCH/RFC 06/10] Create a tproxy flag in struct sk_buff
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (4 preceding siblings ...)
  2007-01-03 16:36 ` [PATCH/RFC 05/10] Remove local address check on IP output KOVACS Krisztian
@ 2007-01-03 16:36 ` KOVACS Krisztian
  2007-01-03 16:37 ` [PATCH/RFC 07/10] Export UDP socket lookup function KOVACS Krisztian
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:36 UTC (permalink / raw)
  To: netfilter-devel, netdev

We would like to be able to match on whether or not a given packet has
been diverted by tproxy. To make this possible we need a flag in
sk_buff.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/linux/skbuff.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..6d7f5c7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -284,7 +284,8 @@ struct sk_buff {
 				nfctinfo:3;
 	__u8			pkt_type:3,
 				fclone:2,
-				ipvs_property:1;
+				ipvs_property:1,
+				ip_tproxy:1;
 	__be16			protocol;
 
 	void			(*destructor)(struct sk_buff *skb);


* [PATCH/RFC 07/10] Export UDP socket lookup function
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (5 preceding siblings ...)
  2007-01-03 16:36 ` [PATCH/RFC 06/10] Create a tproxy flag in struct sk_buff KOVACS Krisztian
@ 2007-01-03 16:37 ` KOVACS Krisztian
  2007-01-03 16:37 ` [PATCH/RFC 08/10] iptables tproxy table KOVACS Krisztian
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:37 UTC (permalink / raw)
  To: netfilter-devel, netdev

The iptables tproxy code has to be able to do UDP socket hash lookups,
so we have to provide an exported lookup function for this purpose.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/net/udp.h |    4 ++++
 net/ipv4/udp.c    |    8 ++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 1b921fa..ea5aa31 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -141,6 +141,10 @@ extern int 	udp_lib_setsockopt(struct so
 				   char __user *optval, int optlen,
 				   int (*push_pending_frames)(struct sock *));
 
+extern struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+				    __be32 daddr, __be16 dport,
+				    int dif);
+
 DECLARE_SNMP_STAT(struct udp_mib, udp_statistics);
 /*
  * 	SNMP statistics for UDP and UDP-Lite
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1b348f5..a44d3d3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -284,6 +284,14 @@ static struct sock *__udp4_lib_lookup(__
 	return result;
 }
 
+struct sock *udp4_lib_lookup(__be32 saddr, __be16 sport,
+			     __be32 daddr, __be16 dport,
+			     int dif)
+{
+	return __udp4_lib_lookup(saddr, sport, daddr, dport, dif, udp_hash);
+}
+EXPORT_SYMBOL_GPL(udp4_lib_lookup);
+
 static inline struct sock *udp_v4_mcast_next(struct sock *sk,
 					     __be16 loc_port, __be32 loc_addr,
 					     __be16 rmt_port, __be32 rmt_addr,


* [PATCH/RFC 08/10] iptables tproxy table
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (6 preceding siblings ...)
  2007-01-03 16:37 ` [PATCH/RFC 07/10] Export UDP socket lookup function KOVACS Krisztian
@ 2007-01-03 16:37 ` KOVACS Krisztian
  2007-01-10 12:40   ` Patrick McHardy
  2007-01-03 16:38 ` [PATCH/RFC 09/10] iptables TPROXY target KOVACS Krisztian
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:37 UTC (permalink / raw)
  To: netfilter-devel, netdev

The iptables tproxy table registers a new hook on PRE_ROUTING and, for
each incoming TCP/UDP packet, does the following:

1. Does a TCP/UDP socket hash lookup to decide whether or not the packet
   is sent to a non-local bound socket. If a matching socket is found
   and the socket has the IP_FREEBIND socket option enabled the skb is
   diverted locally and the socket reference is stored in the skb.

2. If no matching socket was found, the PREROUTING chain of the
   iptables tproxy table is consulted. Matching rules with the TPROXY
   target can do transparent redirection here. (In this case it is not
   necessary to have the IP_FREEBIND socket option enabled for the
   target socket, redirection takes place even for "regular"
   sockets. This way no modification of the application is necessary.)
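
The two steps above boil down to a small decision function. The sketch
below is a simplified model with hypothetical names (tproxy_decide,
sock_sketch, enum action); it leaves out the actual socket lookup and
the rule evaluation and only shows which path a packet takes.

```c
#include <assert.h>
#include <stddef.h>

enum action {
	ACT_PASS,	/* leave the packet alone */
	ACT_DIVERT,	/* divert locally, attach the socket to the skb */
	ACT_RUN_RULES	/* consult the PREROUTING chain of the tproxy table */
};

/* Hypothetical stand-in for the result of the socket hash lookup. */
struct sock_sketch {
	int freebind;	/* non-zero if IP_FREEBIND is enabled */
};

/* sk is the result of the early lookup, or NULL if nothing matched. */
enum action tproxy_decide(const struct sock_sketch *sk)
{
	if (sk == NULL)
		return ACT_RUN_RULES;	/* step 2: no matching socket */
	if (sk->freebind)
		return ACT_DIVERT;	/* step 1: non-local bound socket */
	return ACT_PASS;		/* matching socket without IP_FREEBIND */
}
```

The last case corresponds to the hook calling the divert helper with
require_freebind set: a matching socket without IP_FREEBIND is not
diverted and the packet continues unchanged.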

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/linux/netfilter_ipv4/ip_tproxy.h |   20 ++
 net/ipv4/netfilter/Kconfig               |   10 +
 net/ipv4/netfilter/Makefile              |    1 
 net/ipv4/netfilter/iptable_tproxy.c      |  253 ++++++++++++++++++++++++++++++
 4 files changed, 284 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ip_tproxy.h b/include/linux/netfilter_ipv4/ip_tproxy.h
new file mode 100644
index 0000000..ae890e3
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ip_tproxy.h
@@ -0,0 +1,20 @@
+#ifndef _IP_TPROXY_H
+#define _IP_TPROXY_H
+
+#include <linux/types.h>
+
+/* look up and get a reference to a matching socket */
+extern struct sock *
+ip_tproxy_get_sock(const u8 protocol,
+		   const __be32 saddr, const __be32 daddr,
+		   const __be16 sport, const __be16 dport,
+		   const struct net_device *in);
+
+/* divert skb to a given socket */
+extern int
+ip_tproxy_do_divert(struct sk_buff *skb,
+		    struct sock *sk,
+		    const int require_freebind,
+		    const struct net_device *in);
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index f6026d4..312b0ef 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -652,6 +652,16 @@ config IP_NF_RAW
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/modules.txt>.  If unsure, say `N'.
 
+# tproxy table
+config IP_NF_TPROXY
+	tristate "Transparent proxying"
+	depends on IP_NF_IPTABLES
+	help
+	  Transparent proxying. For more information see
+	  http://www.balabit.com/downloads/tproxy.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
 	tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index 15e741a..aa57ce4 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -71,6 +71,7 @@ obj-$(CONFIG_IP_NF_MANGLE) += iptable_ma
 obj-$(CONFIG_IP_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_NF_NAT) += iptable_nat.o
 obj-$(CONFIG_IP_NF_RAW) += iptable_raw.o
+obj-$(CONFIG_IP_NF_TPROXY) += iptable_tproxy.o
 
 # matches
 obj-$(CONFIG_IP_NF_MATCH_IPRANGE) += ipt_iprange.o
diff --git a/net/ipv4/netfilter/iptable_tproxy.c b/net/ipv4/netfilter/iptable_tproxy.c
new file mode 100644
index 0000000..6049c83
--- /dev/null
+++ b/net/ipv4/netfilter/iptable_tproxy.c
@@ -0,0 +1,253 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/version.h>
+#include <linux/module.h>
+
+#include <linux/sysctl.h>
+#include <linux/vmalloc.h>
+#include <linux/net.h>
+#include <linux/slab.h>
+#include <linux/if.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/netdevice.h>
+#include <linux/inetdevice.h>
+#include <linux/time.h>
+#include <linux/in.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+#include <net/sock.h>
+#include <net/inet_sock.h>
+#include <asm/uaccess.h>
+
+#include <linux/netfilter.h>
+#include <linux/netfilter_ipv4.h>
+#include <linux/netfilter_ipv4/ip_tables.h>
+
+#define TPROXY_VALID_HOOKS (1 << NF_IP_PRE_ROUTING)
+
+#if 0
+#define DEBUGP printk
+#else
+#define DEBUGP(f, args...)
+#endif
+
+static struct
+{
+	struct ipt_replace repl;
+	struct ipt_standard entries[2];
+	struct ipt_error term;
+} initial_table __initdata = {
+	.repl = {
+		.name = "tproxy",
+		.valid_hooks = TPROXY_VALID_HOOKS,
+		.num_entries = 2,
+		.size = sizeof(struct ipt_standard) + sizeof(struct ipt_error),
+		.hook_entry = {
+			[NF_IP_PRE_ROUTING] = 0 },
+		.underflow = {
+			[NF_IP_PRE_ROUTING] = 0 },
+	},
+	.entries = {
+		/* PRE_ROUTING */
+		{
+			.entry = {
+				.target_offset = sizeof(struct ipt_entry),
+				.next_offset = sizeof(struct ipt_standard),
+			},
+			.target = {
+				.target = {
+					.u = {
+						.target_size = IPT_ALIGN(sizeof(struct ipt_standard_target)),
+					},
+				},
+				.verdict = -NF_ACCEPT - 1,
+			},
+		},
+	},
+	/* ERROR */
+	.term = {
+		.entry = {
+			.target_offset = sizeof(struct ipt_entry),
+			.next_offset = sizeof(struct ipt_error),
+		},
+		.target = {
+			.target = {
+				.u = {
+					.user = {
+						.target_size = IPT_ALIGN(sizeof(struct ipt_error_target)),
+						.name = IPT_ERROR_TARGET,
+					},
+				},
+			},
+			.errorname = "ERROR",
+		},
+	}
+};
+
+static struct ipt_table tproxy_table = {
+	.name		= "tproxy",
+	.valid_hooks	= TPROXY_VALID_HOOKS,
+	.lock		= RW_LOCK_UNLOCKED,
+	.me		= THIS_MODULE,
+	.af		= AF_INET,
+};
+
+struct sock *
+ip_tproxy_get_sock(const u8 protocol,
+		   const __be32 saddr, const __be32 daddr,
+		   const __be16 sport, const __be16 dport,
+		   const struct net_device *in)
+{
+	struct sock *sk = NULL;
+
+	/* look up socket */
+	switch (protocol) {
+	case IPPROTO_TCP:
+		sk = __inet_lookup(&tcp_hashinfo,
+				   saddr, sport, daddr, dport,
+				   in->ifindex);
+		break;
+	case IPPROTO_UDP:
+		sk = udp4_lib_lookup(saddr, sport, daddr, dport,
+				     in->ifindex);
+		break;
+	default:
+		WARN_ON(1);
+	}
+
+	return sk;
+}
+EXPORT_SYMBOL_GPL(ip_tproxy_get_sock);
+
+int
+ip_tproxy_do_divert(struct sk_buff *skb, struct sock *sk,
+		    const int require_freebind,
+		    const struct net_device *in)
+{
+	const struct inet_sock *inet = inet_sk(sk);
+	struct in_device *indev;
+
+	if (unlikely(inet == NULL))
+		return -EINVAL;
+
+	if (!require_freebind || inet->freebind) {
+		indev = in_dev_get(in);
+		if (indev == NULL)
+			return -ENODEV;
+
+		skb->ip_tproxy = 1;
+
+		ip_divert_local(skb, indev, sk);
+		in_dev_put(indev);
+
+		DEBUGP(KERN_DEBUG "IP_TPROXY: diverted to socket %p\n", sk);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ip_tproxy_do_divert);
+
+static unsigned int
+ip_tproxy_prerouting(unsigned int hooknum,
+		     struct sk_buff **pskb,
+		     const struct net_device *in,
+		     const struct net_device *out,
+		     int (*okfn)(struct sk_buff *))
+{
+	int verdict = NF_ACCEPT;
+	struct sk_buff *skb = *pskb;
+	u8 protocol = skb->nh.iph->protocol;
+	struct sock *sk = NULL;
+	const struct iphdr *iph = (*pskb)->nh.iph;
+	struct udphdr _hdr, *hp;
+
+	/* TCP and UDP only */
+	if ((protocol != IPPROTO_TCP) && (protocol != IPPROTO_UDP))
+		return NF_ACCEPT;
+
+	if (in == NULL)
+		return NF_ACCEPT;
+
+	if ((skb->dst != NULL) || (skb->ip_tproxy == 1))
+		return NF_ACCEPT;
+
+	hp = skb_header_pointer(skb, skb->nh.iph->ihl * 4, sizeof(_hdr), &_hdr);
+	if (hp == NULL) {
+		DEBUGP(KERN_DEBUG "IP_TPROXY: ip_tproxy_fn(): "
+		       "failed to get protocol header\n");
+		return NF_DROP;
+	}
+
+	sk = ip_tproxy_get_sock(iph->protocol,
+				iph->saddr, iph->daddr,
+				hp->source, hp->dest, in);
+	if (sk) {
+		if (ip_tproxy_do_divert(skb, sk, 1, in) < 0) {
+			DEBUGP(KERN_DEBUG "IP_TPROXY: divert failed, dropping packet\n");
+			verdict = NF_DROP;
+		}
+		sock_put(sk);
+	} else {
+		verdict = ipt_do_table(pskb, hooknum, in, out, &tproxy_table);
+	}
+
+	return verdict;
+}
+
+static struct nf_hook_ops ip_tproxy_pre_ops = {
+	.hook		= ip_tproxy_prerouting,
+	.owner		= THIS_MODULE,
+	.pf		= PF_INET,
+	.hooknum	= NF_IP_PRE_ROUTING,
+	.priority	= -130
+};
+
+static int __init init(void)
+{
+	int ret;
+
+	ret = ipt_register_table(&tproxy_table, &initial_table.repl);
+	if (ret < 0) {
+		printk("IP_TPROXY: can't register tproxy table.\n");
+		return ret;
+	}
+
+	ret = nf_register_hook(&ip_tproxy_pre_ops);
+	if (ret < 0) {
+		printk("IP_TPROXY: can't register prerouting hook.\n");
+		goto clean_table;
+	}
+
+	printk("IP_TPROXY: Transparent proxy support initialized, version 4.0.0\n"
+	       "IP_TPROXY: Copyright (c) 2006-2007 BalaBit IT Ltd.\n");
+
+	return ret;
+
+ clean_table:
+	ipt_unregister_table(&tproxy_table);
+	return ret;
+}
+
+static void __exit fini(void)
+{
+	nf_unregister_hook(&ip_tproxy_pre_ops);
+	ipt_unregister_table(&tproxy_table);
+}
+
+module_init(init);
+module_exit(fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Krisztian Kovacs <hidden@balabit.hu>");
+MODULE_DESCRIPTION("iptables transparent proxy table");


* [PATCH/RFC 09/10] iptables TPROXY target
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (7 preceding siblings ...)
  2007-01-03 16:37 ` [PATCH/RFC 08/10] iptables tproxy table KOVACS Krisztian
@ 2007-01-03 16:38 ` KOVACS Krisztian
  2007-01-10 12:45   ` Patrick McHardy
  2007-01-03 16:38 ` [PATCH/RFC 10/10] iptables tproxy match KOVACS Krisztian
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:38 UTC (permalink / raw)
  To: netfilter-devel, netdev

The TPROXY target implements redirection of non-local TCP/UDP traffic
to local sockets. It is simply a wrapper around functionality exported
from iptable_tproxy.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 include/linux/netfilter_ipv4/ipt_TPROXY.h |    9 +++
 net/ipv4/netfilter/Kconfig                |   11 +++
 net/ipv4/netfilter/Makefile               |    1 
 net/ipv4/netfilter/ipt_TPROXY.c           |  103 +++++++++++++++++++++++++++++
 4 files changed, 124 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter_ipv4/ipt_TPROXY.h b/include/linux/netfilter_ipv4/ipt_TPROXY.h
new file mode 100644
index 0000000..d05c956
--- /dev/null
+++ b/include/linux/netfilter_ipv4/ipt_TPROXY.h
@@ -0,0 +1,9 @@
+#ifndef _IPT_TPROXY_H_target
+#define _IPT_TPROXY_H_target
+
+struct ipt_tproxy_target_info {
+	u_int16_t lport;
+	u_int32_t laddr;
+};
+
+#endif
diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index 312b0ef..7f76ab6 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -662,6 +662,17 @@ config IP_NF_TPROXY
 
 	  To compile it as a module, choose M here.  If unsure, say N.
 
+config IP_NF_TARGET_TPROXY
+	tristate "TPROXY target support"
+	depends on IP_NF_TPROXY
+	help
+	  This option adds a `TPROXY' target, which is somewhat similar to
+	  REDIRECT.  It can only be used in the tproxy table and is useful
+	  to redirect traffic to a transparent proxy.  It does _not_ depend
+	  on Netfilter connection tracking.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 # ARP tables
 config IP_NF_ARPTABLES
 	tristate "ARP tables support"
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index aa57ce4..851da93 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -97,6 +97,7 @@ obj-$(CONFIG_IP_NF_TARGET_ULOG) += ipt_U
 obj-$(CONFIG_IP_NF_TARGET_TCPMSS) += ipt_TCPMSS.o
 obj-$(CONFIG_IP_NF_TARGET_CLUSTERIP) += ipt_CLUSTERIP.o
 obj-$(CONFIG_IP_NF_TARGET_TTL) += ipt_TTL.o
+obj-$(CONFIG_IP_NF_TARGET_TPROXY) += ipt_TPROXY.o
 
 # generic ARP tables
 obj-$(CONFIG_IP_NF_ARPTABLES) += arp_tables.o
diff --git a/net/ipv4/netfilter/ipt_TPROXY.c b/net/ipv4/netfilter/ipt_TPROXY.c
new file mode 100644
index 0000000..6f64717
--- /dev/null
+++ b/net/ipv4/netfilter/ipt_TPROXY.c
@@ -0,0 +1,103 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2006-2007 BalaBit IT Ltd.
+ * Author: Balazs Scheidler, Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/ip.h>
+#include <net/checksum.h>
+#include <net/udp.h>
+#include <net/inet_sock.h>
+
+#include <linux/netfilter_ipv4/ip_tables.h>
+#include <linux/netfilter_ipv4/ip_tproxy.h>
+#include <linux/netfilter_ipv4/ipt_TPROXY.h>
+
+static unsigned int
+target(struct sk_buff **pskb,
+       const struct net_device *in,
+       const struct net_device *out,
+       unsigned int hooknum,
+       const struct xt_target *target,
+       const void *targinfo)
+{
+	const struct iphdr *iph = (*pskb)->nh.iph;
+	unsigned int verdict = NF_ACCEPT;
+	struct sk_buff *skb = *pskb;
+	struct udphdr _hdr, *hp;
+	struct sock *sk;
+
+	/* TCP/UDP only */
+	if ((iph->protocol != IPPROTO_TCP) &&
+	    (iph->protocol != IPPROTO_UDP))
+		return NF_ACCEPT;
+
+	if (in == NULL)
+		return NF_ACCEPT;
+
+	if ((skb->dst != NULL) || (skb->ip_tproxy == 1))
+		return NF_ACCEPT;
+
+	hp = skb_header_pointer(*pskb, iph->ihl * 4, sizeof(_hdr), &_hdr);
+	if (hp == NULL)
+		return NF_DROP;
+
+	sk = ip_tproxy_get_sock(iph->protocol,
+				iph->saddr, iph->daddr,
+				hp->source, hp->dest, in);
+	if (sk != NULL) {
+		if (ip_tproxy_do_divert(skb, sk, 0, in) < 0)
+			verdict = NF_DROP;
+		sock_put(sk);
+	}
+
+	return verdict;
+}
+
+static int
+checkentry(const char *tablename,
+	   const void *e,
+	   const struct xt_target *target,
+	   void *targinfo,
+	   unsigned int hook_mask)
+{
+	/* checks are now done by the x_tables core based on
+	 * information specified in the ipt_target structure */
+	return 1;
+}
+
+static struct ipt_target ipt_tproxy_reg = {
+	.name		= "TPROXY",
+	.target		= target,
+	.targetsize	= sizeof(struct ipt_tproxy_target_info),
+	.table		= "tproxy",
+	.checkentry	= checkentry,
+	.me		= THIS_MODULE,
+};
+
+static int __init init(void)
+{
+	if (ipt_register_target(&ipt_tproxy_reg))
+		return -EINVAL;
+
+	return 0;
+}
+
+static void __exit fini(void)
+{
+	ipt_unregister_target(&ipt_tproxy_reg);
+}
+
+module_init(init);
+module_exit(fini);
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Krisztian Kovacs <hidden@balabit.hu>");
+MODULE_DESCRIPTION("Netfilter transparent proxy TPROXY target module.");

* [PATCH/RFC 10/10] iptables tproxy match
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (8 preceding siblings ...)
  2007-01-03 16:38 ` [PATCH/RFC 09/10] iptables TPROXY target KOVACS Krisztian
@ 2007-01-03 16:38 ` KOVACS Krisztian
  2007-01-03 17:23 ` [PATCH/RFC 00/10] Transparent proxying patches version 4 Evgeniy Polyakov
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-03 16:38 UTC (permalink / raw)
  To: netfilter-devel, netdev

Implements an iptables module which matches packets which have the
tproxy flag set, that is, packets diverted in the tproxy table.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>

---

 net/netfilter/Kconfig     |    9 +++++
 net/netfilter/Makefile    |    1 +
 net/netfilter/xt_tproxy.c |   77 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 0 deletions(-)

diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 1b853c3..76c6f14 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -559,6 +559,15 @@ config NETFILTER_XT_MATCH_QUOTA
 	  If you want to compile it as a module, say M here and read
 	  <file:Documentation/modules.txt>.  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_TPROXY
+	tristate '"tproxy" match support'
+	depends on NETFILTER_XTABLES
+	help
+	  This option adds a `tproxy' match, which allows you to match
+	  packets which have been diverted to local sockets by TProxy.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_REALM
 	tristate  '"realm" match support'
 	depends on NETFILTER_XTABLES
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 5dc5574..4a83585 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MARK) +=
 obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_TPROXY) += xt_tproxy.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_SCTP) += xt_sctp.o
diff --git a/net/netfilter/xt_tproxy.c b/net/netfilter/xt_tproxy.c
new file mode 100644
index 0000000..53f8bee
--- /dev/null
+++ b/net/netfilter/xt_tproxy.c
@@ -0,0 +1,77 @@
+/*
+ * Transparent proxy support for Linux/iptables
+ *
+ * Copyright (c) 2007 BalaBit IT Ltd.
+ * Author: Krisztian Kovacs
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+
+#include <linux/netfilter/x_tables.h>
+
+static int
+match(const struct sk_buff *skb,
+      const struct net_device *in,
+      const struct net_device *out,
+      const struct xt_match *match,
+      const void *matchinfo,
+      int offset,
+      unsigned int protoff,
+      int *hotdrop)
+{
+	return skb->ip_tproxy;
+}
+
+static int
+check(const char *tablename,
+      const void *entry,
+      const struct xt_match *match,
+      void *matchinfo,
+      unsigned int hook_mask)
+{
+	return 1;
+}
+
+static struct xt_match tproxy_matches[] = {
+	{
+		.name		= "tproxy",
+		.match		= match,
+		.matchsize	= 0,
+		.checkentry	= check,
+		.family		= AF_INET,
+		.me		= THIS_MODULE,
+	},
+	{
+		.name		= "tproxy",
+		.match		= match,
+		.matchsize	= 0,
+		.checkentry	= check,
+		.family		= AF_INET6,
+		.me		= THIS_MODULE,
+	},
+};
+
+static int __init xt_tproxy_init(void)
+{
+	return xt_register_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+static void __exit xt_tproxy_fini(void)
+{
+	xt_unregister_matches(tproxy_matches, ARRAY_SIZE(tproxy_matches));
+}
+
+module_init(xt_tproxy_init);
+module_exit(xt_tproxy_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Krisztian Kovacs <hidden@balabit.hu>");
+MODULE_DESCRIPTION("iptables tproxy match module");
+MODULE_ALIAS("ipt_tproxy");
+MODULE_ALIAS("ip6t_tproxy");

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (9 preceding siblings ...)
  2007-01-03 16:38 ` [PATCH/RFC 10/10] iptables tproxy match KOVACS Krisztian
@ 2007-01-03 17:23 ` Evgeniy Polyakov
  2007-01-08 20:30   ` KOVACS Krisztian
  2007-01-03 19:33 ` Lennert Buytenhek
  2007-01-07 14:11 ` Harald Welte
  12 siblings, 1 reply; 35+ messages in thread
From: Evgeniy Polyakov @ 2007-01-03 17:23 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev

On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian (hidden@balabit.hu) wrote:
> The following set of patches implement transparent proxying support
> loosely modeled on the Linux 2.2 transparent proxying functionality.
> 
> In the last few years we've been maintaining a set of patches
> implementing Netfilter NAT to provide similar functionality. However,
> as time passed, more and more bugs surfaced, some of which were not
> possible to fix using that approach. Also, those patches required
> modification of user-space application code and the "API" provided was
> neither clean nor easy to use.
> 
> So instead of using NAT to dynamically redirect traffic to local
> addresses, we now rely on "native" non-locally-bound sockets and do
> early socket lookups for inbound IPv4 packets. These lookups are done
> in a separate Netfilter/iptables module, so there are only negligible
> performance implications of building transparent proxying support as a
> module and then not loading it.

Out of curiosity, would you use netchannels [1] if the implementation
will be much broader? Since what you have created works exactly like
netchannels netfilter NAT target (although it does not change ports, but
it can be trivially extended), but without all existing netfilter
overhead and without hacks in core TCP/UDP/IP/route code.

A short description for the netfilter mailing list, by way of advertisement :)

A network channel is a peer-to-peer, protocol-agnostic communication channel
between hardware and userspace. Netchannels allow working directly with
network hardware from userspace without any kind of filtering or
processing by the kernel.

Network channels are organized into a single multidimensional trie, which
allows route, netfilter and other types of lookups to be performed in one
traversal, since the trie is completely protocol agnostic.

There is no layering model in netchannels - layers are a way to
design protocols, not to implement them. Actual protocol processing
therefore happens at the ends of a netchannel - for example in userspace
(userspace network stack), which improves cache locality and reduces the
overhead of unneeded layer crossings.

Netchannels featureset includes:
* multidimensional wildcards support
* RCU searching
* single multidimensional trie for different kinds of dataflows
* dedicated processing threads with possibility to
  schedule processing on different CPUs for those
  netchannel types which are not acked with processing context
* userspace netchannel backend (allows receiving
  packets in userspace), which can be used for:
	o high-performance sniffers
	o tun/tap device replacement
	o packet socket replacement (note, that netchannels steal
	  packets from main stack)
	o userspace network stack implementation [2]
	o own protocol stack implementation (from VPN tunnels to TOE)
* netfilter netchannel backend (only NAT is supported as the most interesting
  user, NAT caches appropriate route, so essentially routing becomes part
  of the netchannel trie)

1. Netchannels homepage.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=netchannel

2. Userspace network stack.
http://tservice.net.ru/~s0mbre/old/?section=projects&item=unetstack

3. Netchannel vs. socket benchmarks.
http://tservice.net.ru/~s0mbre/blog/2006/10/26#2006_10_26
http://tservice.net.ru/~s0mbre/blog/2006/12/21#2006_12_21

4. Netchannels multidimensional wildcard trie testing.
 userspace test (scales to millions of end nodes)
   http://tservice.net.ru/~s0mbre/blog/2006/12/02#2006_12_02
 kernelspace test (tens of thousands of netchannels)
   http://tservice.net.ru/~s0mbre/blog/2006/12/21#2006_12_21_1

-- 
	Evgeniy Polyakov

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (10 preceding siblings ...)
  2007-01-03 17:23 ` [PATCH/RFC 00/10] Transparent proxying patches version 4 Evgeniy Polyakov
@ 2007-01-03 19:33 ` Lennert Buytenhek
  2007-01-04 12:13   ` KOVACS Krisztian
  2007-01-07 14:11 ` Harald Welte
  12 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2007-01-03 19:33 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev

On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian wrote:

> The following set of patches implement transparent proxying support
> loosely modeled on the Linux 2.2 transparent proxying functionality.

In a transparent http proxy server I wrote a while ago, we used to use
tproxy for making outgoing connections appear to be originating from a
foreign IP address, but moved to inserting an iptables nat rule from
the proxy app every time an outgoing connection needs to be made, due
to the pain of having to patch in the tproxy patches every time we
needed to do a kernel update.

I'd love to see working tproxy functionality merged upstream for that
reason alone.

I'd also love to see the old tproxy API go away entirely.  It was
always a bit of a pain to use.

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-03 19:33 ` Lennert Buytenhek
@ 2007-01-04 12:13   ` KOVACS Krisztian
  2007-01-04 12:16     ` Lennert Buytenhek
  0 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-04 12:13 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netfilter-devel, netdev


  Hi,

On Wednesday 03 January 2007 20:33, Lennert Buytenhek wrote:
> I'd also love to see the old tproxy API go away entirely.  It was
> always a bit of a pain to use.

  It's gone with these patches: all you need is to bind() to foreign 
addresses, like in the Linux 2.2 days.
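
  For illustration, a minimal userspace sketch of such a bind() might look
like the following. It assumes that the existing IP_FREEBIND socket option
is what permits the non-local bind; which option (if any) this patch set
requires is not spelled out here, so treat that as an assumption. The
function name bind_foreign() is made up for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a TCP socket to an address that need not be configured on this host.
 * ASSUMPTION: IP_FREEBIND is the option that permits the non-local bind.
 * Returns the socket fd on success, -1 on failure. */
int bind_foreign(const char *addr, unsigned short port)
{
	struct sockaddr_in sin;
	int one = 1;
	int fd;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
		return -1;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Ask the kernel to skip the "is this address local?" check. */
	if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

  A proxy would call something like bind_foreign("192.0.2.1", 0) and then
connect() or listen() as usual; 192.0.2.1 is only a documentation address
picked for the example.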

-- 
 Regards,
  Krisztian Kovacs

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-04 12:13   ` KOVACS Krisztian
@ 2007-01-04 12:16     ` Lennert Buytenhek
  0 siblings, 0 replies; 35+ messages in thread
From: Lennert Buytenhek @ 2007-01-04 12:16 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev

On Thu, Jan 04, 2007 at 01:13:27PM +0100, KOVACS Krisztian wrote:

> > I'd also love to see the old tproxy API go away entirely.  It was
> > always a bit of a pain to use.
> 
>   It's gone with these patches: all you need is to bind() to foreign 
> addresses, like in the Linux 2.2 days.

That's how I understood it.  Great.

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
                   ` (11 preceding siblings ...)
  2007-01-03 19:33 ` Lennert Buytenhek
@ 2007-01-07 14:11 ` Harald Welte
  2007-01-07 16:11   ` Lennert Buytenhek
  12 siblings, 1 reply; 35+ messages in thread
From: Harald Welte @ 2007-01-07 14:11 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netdev, netfilter-devel

Hi Krisztian!

On Wed, Jan 03, 2007 at 05:33:57PM +0100, KOVACS Krisztian wrote:
> So instead of using NAT to dynamically redirect traffic to local
> addresses, we now rely on "native" non-locally-bound sockets and do
> early socket lookups for inbound IPv4 packets. 

It's good to see a solid implementation of this 'old idea'.  

Just as a quick historical note to netdev:  This is how the
netfilter project advised the balabit guys to implement fully
transparent proxy support, after having seen the complexity of the old
nat-based TPROXY patches.

So I personally support this patchset and vote for it to be included
(with whatever modifications netdev deems appropriate)

It might be that there now is the experimental netchannels system which
might provide an even better way for transparent proxy support.

However, ever since ip_tables was merged in the 2.3.x days, we have
lacked good support for transparent proxies.  Given that the first
incarnation of the NAT based TPROXY patch for 2.4.x had to be developed
and maintained out-of-tree for many years, I definitely think it's
better to merge the new, far less intrusive, patchset.

Some interested party can work on a netchannels implementation later on,
but that's the next generation...

Cheers,
-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-07 14:11 ` Harald Welte
@ 2007-01-07 16:11   ` Lennert Buytenhek
  2007-01-07 23:58     ` Harald Welte
  0 siblings, 1 reply; 35+ messages in thread
From: Lennert Buytenhek @ 2007-01-07 16:11 UTC (permalink / raw)
  To: Harald Welte, KOVACS Krisztian, netfilter-devel, netdev

On Sun, Jan 07, 2007 at 03:11:34PM +0100, Harald Welte wrote:

> > So instead of using NAT to dynamically redirect traffic to local
> > addresses, we now rely on "native" non-locally-bound sockets and do
> > early socket lookups for inbound IPv4 packets. 
> 
> It's good to see a solid implementation of this 'old idea'.  
> 
> Just as a quick historical note to netdev:  This is the way how the
> netfilter project  advised the balabit guys to implement fully
> transparent proxy support, after having seen the complexity of the old
> nat-based TPROXY patches.

Didn't rusty tell the balabit guys to use the NAT approach?

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-07 16:11   ` Lennert Buytenhek
@ 2007-01-07 23:58     ` Harald Welte
  0 siblings, 0 replies; 35+ messages in thread
From: Harald Welte @ 2007-01-07 23:58 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: KOVACS Krisztian, netfilter-devel, netdev

On Sun, Jan 07, 2007 at 05:11:06PM +0100, Lennert Buytenhek wrote:
> On Sun, Jan 07, 2007 at 03:11:34PM +0100, Harald Welte wrote:
> 
> > > So instead of using NAT to dynamically redirect traffic to local
> > > addresses, we now rely on "native" non-locally-bound sockets and do
> > > early socket lookups for inbound IPv4 packets. 
> > 
> > It's good to see a solid implementation of this 'old idea'.  
> > 
> > Just as a quick historical note to netdev:  This is the way how the
> > netfilter project  advised the balabit guys to implement fully
> > transparent proxy support, after having seen the complexity of the old
> > nat-based TPROXY patches.
> 
> Didn't rusty tell the balabit guys to use the NAT approach?

That was originally, way back.  It turned out to be a bad idea, after
all... way too complex.  At least that's how I look at it.  Too sad :(

Rusty and I then had the idea about the routing based approach at some
point, if I remember correctly.  We talked about it with Krisztian and
Balazs on at least one occasion.

All that isn't really important.  All I wanted to say was:

"I (and AFAIR the netfilter core team) believe this is the way to
implement good support for transparent proxying.  It's already the
second completely independent implementation, let's merge it after all."

-- 
- Harald Welte <laforge@netfilter.org>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

* Re: [PATCH/RFC 00/10] Transparent proxying patches version 4
  2007-01-03 17:23 ` [PATCH/RFC 00/10] Transparent proxying patches version 4 Evgeniy Polyakov
@ 2007-01-08 20:30   ` KOVACS Krisztian
  0 siblings, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-08 20:30 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Evgeniy Polyakov, netdev

  Hi Evgeniy,

On Wednesday 03 January 2007 18:23, Evgeniy Polyakov wrote:
> Out of curiosity, would you use netchannels [1] if the implementation
> will be much broader? Since what you have created works exactly like
> netchannels netfilter NAT target (although it does not change ports,
> but it can be trivially extended), but without all existing netfilter
> overhead and without hacks in core TCP/UDP/IP/route code.

  Indeed, a netchannels based implementation would be very nice. Combined 
with a userspace network stack I think this could be a very powerful 
tool, especially for people doing dirty tricks -- like transparent 
proxying in our case.

  However, I think that adopting netchannels now would be an enormous amount
of work on our part. Of course, personally I'm really interested in
netchannels and the related projects, but I agree with Harald that we still
have a long way to go before being able to switch to netchannels. And I
definitely _hate_ the previous incarnations of our tproxy patches enough
that even this patchset seems acceptable to me. ;)

-- 
 Regards,
  Krisztian Kovacs

* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-03 16:34 ` [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs KOVACS Krisztian
@ 2007-01-10  6:46   ` Patrick McHardy
  2007-01-10  9:31     ` Balazs Scheidler
  2007-01-10 10:17     ` KOVACS Krisztian
  0 siblings, 2 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10  6:46 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netdev, netfilter-devel

KOVACS Krisztian wrote:
> The input path for non-local bound sockets requires diverting certain
> packets locally, even if their destination IP address is not
> considered local. We achieve this by assigning a specially crafted dst
> entry to these skbs, and optionally also attaching a socket to the skb
> so that the upper layer code does not need to redo the socket lookup.
> 
> We also have to be able to differentiate between these fake entries
> and "real" entries in the cache: it is perfectly legal that the
> diversion is done only for certain TCP or UDP packets and not for all
> packets of the flow. Since these special dst entries are used only by
> the iptables tproxy code, and that code uses exclusively these
> entries, simply flagging these entries as DST_DIVERTED is OK. All
> other cache lookup paths skip diverted entries, while our new
> ip_divert_local() function uses exclusively diverted dst entries.
> 
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> 
> ---
> 
>  include/net/dst.h   |    1 
>  include/net/route.h |    2 +
>  net/ipv4/route.c    |  106 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 108 insertions(+), 1 deletions(-)
> 
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 62b7e75..72b712c 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -50,6 +50,7 @@ #define DST_NOXFRM		2
>  #define DST_NOPOLICY		4
>  #define DST_NOHASH		8
>  #define DST_BALANCED            0x10
> +#define DST_DIVERTED		0x20
>  	unsigned long		lastuse;
>  	unsigned long		expires;
>  
> diff --git a/include/net/route.h b/include/net/route.h
> index 486e37a..ee52393 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -126,6 +126,8 @@ extern int		ip_rt_ioctl(unsigned int cmd
>  extern void		ip_rt_get_source(u8 *src, struct rtable *rt);
>  extern int		ip_rt_dump(struct sk_buff *skb,  struct netlink_callback *cb);
>  
> +extern int		ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk);
> +
>  struct in_ifaddr;
>  extern void fib_add_ifaddr(struct in_ifaddr *);
>  
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2daa0dc..537b976 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -942,9 +942,11 @@ restart:
>  	while ((rth = *rthp) != NULL) {
>  #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
>  		if (!(rth->u.dst.flags & DST_BALANCED) &&
> +		    ((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
>  		    compare_keys(&rth->fl, &rt->fl)) {
>  #else
> -		if (compare_keys(&rth->fl, &rt->fl)) {
> +		if (((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
> +		    compare_keys(&rth->fl, &rt->fl)) {
>  #endif
>  			/* Put it first */
>  			*rthp = rth->u.rt_next;
> @@ -1166,6 +1168,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
>  				if (rth->fl.fl4_dst != daddr ||
>  				    rth->fl.fl4_src != skeys[i] ||
>  				    rth->fl.oif != ikeys[k] ||
> +				    (rth->u.dst.flags & DST_DIVERTED) ||
>  				    rth->fl.iif != 0) {
>  					rthp = &rth->u.rt_next;
>  					continue;
> @@ -1526,6 +1529,105 @@ static int ip_rt_bug(struct sk_buff *skb
>  	return 0;
>  }
>  
> +static void ip_divert_free_sock(struct sk_buff *skb)
> +{
> +	struct sock *sk = skb->sk;
> +
> +	skb->sk = NULL;
> +	skb->destructor = NULL;
> +	sock_put(sk);
> +}
> +
> +int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk)
> +{
> +	struct iphdr *iph = skb->nh.iph;
> +	struct rtable *rth, *rtres;
> +	unsigned hash;
> +	const int iif = in->dev->ifindex;
> +	u_int8_t tos;
> +	int err;
> +
> +	/* look up hash first */
> +	tos = iph->tos & IPTOS_RT_MASK;
> +	hash = rt_hash_code(iph->daddr, iph->saddr ^ (iif << 5));
> +
> +	rcu_read_lock();
> +	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> +	     rth = rcu_dereference(rth->u.rt_next)) {
> +		if (rth->fl.fl4_dst == iph->daddr &&
> +		    rth->fl.fl4_src == iph->saddr &&
> +		    rth->fl.iif == iif &&
> +		    rth->fl.oif == 0 &&
> +		    rth->fl.mark == skb->mark &&
> +		    (rth->u.dst.flags & DST_DIVERTED) &&
> +		    rth->fl.fl4_tos == tos) {

Mark and tos look unnecessary here since they don't affect the further
processing of the packet.

> +			rth->u.dst.lastuse = jiffies;
> +			dst_hold(&rth->u.dst);
> +			rth->u.dst.__use++;
> +			RT_CACHE_STAT_INC(in_hit);
> +			rcu_read_unlock();
> +
> +			dst_release(skb->dst);
> +			skb->dst = (struct dst_entry*)rth;
> +
> +			if (sk) {
> +				sock_hold(sk);
> +				skb->sk = sk;

This looks racy, the socket could be closed between the lookup and
the actual use. Why do you need the socket lookup at all, can't
you just divert all packets selected by iptables?

I'm wondering if it would be possible to use normal input routing
combined with netfilter marks to do the diversion ..

> +				skb->destructor = ip_divert_free_sock;
> +			}
> +
> +			return 0;
> +		}
> +		RT_CACHE_STAT_INC(in_hlist_search);
> +	}
> +	rcu_read_unlock();
> +
> +	/* not found in cache, try to allocate a new dst entry */
> +	rth = dst_alloc(&ipv4_dst_ops);
> +	if (!rth)
> +		return -ENOMEM;
> +
> +	rth->u.dst.output= ip_rt_bug;
> +
> +	atomic_set(&rth->u.dst.__refcnt, 1);
> +	rth->u.dst.flags = DST_HOST | DST_DIVERTED;
> +
> +	if (in->cnf.no_policy)
> +		rth->u.dst.flags |= DST_NOPOLICY;
> +
> +	rth->fl.fl4_dst = iph->daddr;
> +	rth->rt_dst	= iph->daddr;
> +	rth->fl.fl4_tos = iph->tos;
> +	rth->fl.mark	= skb->mark;
> +	rth->fl.fl4_src = iph->saddr;
> +	rth->rt_src	= iph->saddr;
> +	rth->rt_iif	=
> +	rth->fl.iif	= skb->dev->ifindex;
> +	rth->u.dst.dev	= &loopback_dev;
> +	dev_hold(rth->u.dst.dev);
> +	rth->idev	= in_dev_get(rth->u.dst.dev);
> +	rth->rt_gateway = iph->daddr;
> +	rth->rt_spec_dst= iph->daddr;
> +	rth->u.dst.input= ip_local_deliver;
> +	rth->rt_flags	= RTCF_LOCAL;
> +	rth->rt_type	= RTN_LOCAL;
> +
> +	err = rt_intern_hash(hash, rth, &rtres);
> +	if (err)
> +		return err;
> +
> +	dst_release(skb->dst);
> +	skb->dst = (struct dst_entry *) rth;
> +
> +	if (sk) {
> +		sock_hold(sk);
> +		skb->sk = sk;
> +		skb->destructor = ip_divert_free_sock;
> +	}
> +
> +	return 0;
> +}
> +
>  /*
>     We do not cache source address of outgoing interface,
>     because it is used only by IP RR, TS and SRR options,
> @@ -2104,6 +2206,7 @@ int ip_route_input(struct sk_buff *skb,
>  		    rth->fl.fl4_src == saddr &&
>  		    rth->fl.iif == iif &&
>  		    rth->fl.oif == 0 &&
> +		    !(rth->u.dst.flags & DST_DIVERTED) &&
>  		    rth->fl.mark == skb->mark &&
>  		    rth->fl.fl4_tos == tos) {
>  			rth->u.dst.lastuse = jiffies;
> @@ -3199,3 +3302,4 @@ #endif
>  EXPORT_SYMBOL(__ip_select_ident);
>  EXPORT_SYMBOL(ip_route_input);
>  EXPORT_SYMBOL(ip_route_output_key);
> +EXPORT_SYMBOL_GPL(ip_divert_local);
> 
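
As an aside, the split between diverted and ordinary cache entries
described in the quoted commit message can be modeled in userspace roughly
as follows. This is only a sketch: struct toy_route and toy_lookup() are
made-up names, the key is reduced to three fields, and only the
DST_DIVERTED value is taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DST_DIVERTED 0x20	/* flag value taken from the patch */

/* Hypothetical, heavily reduced stand-in for struct rtable. */
struct toy_route {
	uint32_t daddr, saddr;
	int iif;
	unsigned int flags;
	struct toy_route *next;
};

/* Walk one hash chain and return the first entry that matches the key and
 * whose DST_DIVERTED bit equals want_diverted (0 or DST_DIVERTED). */
struct toy_route *toy_lookup(struct toy_route *chain, uint32_t daddr,
			     uint32_t saddr, int iif,
			     unsigned int want_diverted)
{
	struct toy_route *rt;

	for (rt = chain; rt != NULL; rt = rt->next) {
		if (rt->daddr == daddr && rt->saddr == saddr &&
		    rt->iif == iif &&
		    (rt->flags & DST_DIVERTED) == want_diverted)
			return rt;
	}
	return NULL;
}
```

In this model the ordinary input path would call toy_lookup() with
want_diverted == 0 and the diversion path with DST_DIVERTED, so two entries
with the same key can share a chain without either path seeing the other's
entry.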

* Re: [PATCH/RFC 05/10] Remove local address check on IP output
  2007-01-03 16:36 ` [PATCH/RFC 05/10] Remove local address check on IP output KOVACS Krisztian
@ 2007-01-10  6:47   ` Patrick McHardy
  2007-01-10 10:01     ` KOVACS Krisztian
  2007-02-06 14:36     ` IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output) KOVACS Krisztian
  0 siblings, 2 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10  6:47 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netdev, netfilter-devel

KOVACS Krisztian wrote:
> ip_route_output() contains a check to make sure that no flows with
> non-local source IP addresses are routed. Unfortunately this check
> makes it completely impossible to use non-local bound sockets as no
> outbound packets will make it through the stack.
> 
> This patch moves the interface lookup to the multicast-specific code
> path as that is the only real user of the interface data looked up.
> 
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> 
> ---
> 
>  net/ipv4/route.c |   13 +++++--------
>  1 files changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 537b976..bb1158a 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2498,11 +2498,6 @@ #endif
>  		    ZERONET(oldflp->fl4_src))
>  			goto out;
>  
> -		/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
> -		dev_out = ip_dev_find(oldflp->fl4_src);
> -		if (dev_out == NULL)
> -			goto out;
> -

I'm not sure how exactly this is used by applications, but couldn't you
restrict this to sockets without freebind?

* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10  6:46   ` Patrick McHardy
@ 2007-01-10  9:31     ` Balazs Scheidler
  2007-01-10 12:32       ` Patrick McHardy
  2007-01-10 10:17     ` KOVACS Krisztian
  1 sibling, 1 reply; 35+ messages in thread
From: Balazs Scheidler @ 2007-01-10  9:31 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, netfilter-devel, KOVACS Krisztian

On Wed, 2007-01-10 at 07:46 +0100, Patrick McHardy wrote:
> KOVACS Krisztian wrote:

> > +			rth->u.dst.lastuse = jiffies;
> > +			dst_hold(&rth->u.dst);
> > +			rth->u.dst.__use++;
> > +			RT_CACHE_STAT_INC(in_hit);
> > +			rcu_read_unlock();
> > +
> > +			dst_release(skb->dst);
> > +			skb->dst = (struct dst_entry*)rth;
> > +
> > +			if (sk) {
> > +				sock_hold(sk);
> > +				skb->sk = sk;
> 
> This looks racy, the socket could be closed between the lookup and
> the actual use. Why do you need the socket lookup at all, can't
> you just divert all packets selected by iptables?
> 
> I'm wondering if it would be possible to use normal input routing
> combined with netfilter marks to do the diversion ..
> 

The problem is that userspace proxies open ports dynamically (think of
FTP data channels), and you cannot add an iptables rule for every such
redirection. So one rule for every dynamic redirection is a no-go.

If we added a single rule that did some kind of lookup and then marked
packets, that would again introduce state inside tproxy that would need
to be synchronized with the socket table. We explicitly wanted to avoid
such tables.

And additionally, using the mark this way would prevent the admin from
using it the way he/she likes.

-- 
Bazsi


* Re: [PATCH/RFC 05/10] Remove local address check on IP output
  2007-01-10  6:47   ` Patrick McHardy
@ 2007-01-10 10:01     ` KOVACS Krisztian
  2007-02-06 14:36     ` IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output) KOVACS Krisztian
  1 sibling, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-10 10:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev


  Hi,

On Wednesday 10 January 2007 07:47, Patrick McHardy wrote:
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 537b976..bb1158a 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2498,11 +2498,6 @@ #endif
> >  		    ZERONET(oldflp->fl4_src))
> >  			goto out;
> >
> > -		/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
> > -		dev_out = ip_dev_find(oldflp->fl4_src);
> > -		if (dev_out == NULL)
> > -			goto out;
> > -
>
> I'm not sure how exactly this is used by applications, but couldn't you
> restrict this to sockets without freebind?

  I'll try to do so in the next incarnation of the patches. Thanks for the 
comment, it'd indeed be safer to do so.

  BTW, could anyone shed some light on exactly why that check is 
necessary? As far as I can see, it prevents packets with a non-local 
source address from being routed, but I fail to see why we need to 
prevent that.

-- 
 Regards,
  Krisztian Kovacs


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10  6:46   ` Patrick McHardy
  2007-01-10  9:31     ` Balazs Scheidler
@ 2007-01-10 10:17     ` KOVACS Krisztian
  2007-01-10 12:19       ` Patrick McHardy
  1 sibling, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-10 10:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev, Balazs Scheidler


  Hi,

On Wednesday 10 January 2007 07:46, Patrick McHardy wrote:
> > +	rcu_read_lock();
> > +	for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> > +	     rth = rcu_dereference(rth->u.rt_next)) {
> > +		if (rth->fl.fl4_dst == iph->daddr &&
> > +		    rth->fl.fl4_src == iph->saddr &&
> > +		    rth->fl.iif == iif &&
> > +		    rth->fl.oif == 0 &&
> > +		    rth->fl.mark == skb->mark &&
> > +		    (rth->u.dst.flags & DST_DIVERTED) &&
> > +		    rth->fl.fl4_tos == tos) {
>
> Mark and tos look unnecessary here since they don't affect the further
> processing of the packet.

  Indeed, thanks for spotting it.
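
For illustration, the reduced match (with mark and tos dropped from the
key) could be modelled in user space roughly as below. This is only a
sketch under stated assumptions: struct divert_key, divert_key_match()
and the MODEL_DST_DIVERTED value are hypothetical stand-ins, not the
rt_hash_table code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rtable fields used in the match. */
struct divert_key {
	uint32_t dst;    /* models rth->fl.fl4_dst */
	uint32_t src;    /* models rth->fl.fl4_src */
	int      iif;    /* models rth->fl.iif */
	int      oif;    /* models rth->fl.oif, 0 for input routes */
	unsigned flags;  /* models rth->u.dst.flags */
};

#define MODEL_DST_DIVERTED 0x1u /* assumed value, for this model only */

/* Match on addresses and input interface only; mark and tos are not
 * part of the key because they do not affect how a diverted packet
 * is processed further. */
static bool divert_key_match(const struct divert_key *entry,
			     uint32_t daddr, uint32_t saddr, int iif)
{
	assert(entry != NULL);
	return entry->dst == daddr &&
	       entry->src == saddr &&
	       entry->iif == iif &&
	       entry->oif == 0 &&
	       (entry->flags & MODEL_DST_DIVERTED) != 0;
}
```

With this key, two packets of the same flow that differ only in mark
or tos hit the same cache entry instead of creating two.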

> > +			rth->u.dst.lastuse = jiffies;
> > +			dst_hold(&rth->u.dst);
> > +			rth->u.dst.__use++;
> > +			RT_CACHE_STAT_INC(in_hit);
> > +			rcu_read_unlock();
> > +
> > +			dst_release(skb->dst);
> > +			skb->dst = (struct dst_entry*)rth;
> > +
> > +			if (sk) {
> > +				sock_hold(sk);
> > +				skb->sk = sk;
>
> This looks racy, the socket could be closed between the lookup and
> the actual use. Why do you need the socket lookup at all, can't
> you just divert all packets selected by iptables?

  Yes, it's racy, but I think this is true for the "regular" socket lookup, too. 
Take UDP for example: __udp4_lib_rcv() does the socket lookup, gets a 
reference to the socket, and then calls udp_queue_rcv_skb() to queue the 
skb. As far as I can see there's nothing there which prevents the socket 
from being closed between these calls. sk_common_release() even documents 
this behaviour:

	[...]
	if (sk->sk_prot->destroy)
		sk->sk_prot->destroy(sk);

	/*
	 * Observation: when sock_common_release is called, processes have
	 * no access to socket. But net still has.
	 * Step one, detach it from networking:
	 *
	 * A. Remove from hash tables.
	 */

	sk->sk_prot->unhash(sk);

	/*
	 * In this point socket cannot receive new packets, but it is possible
	 * that some packets are in flight because some CPU runs receiver and
	 * did hash table lookup before we unhashed socket. They will achieve
	 * receive queue and will be purged by socket destructor.
	 *
	 * Also we still have packets pending on receive queue and probably,
	 * our own packets waiting in device queues. sock_destroy will drain
	 * receive queue, but transmitted packets will delay socket destruction
	 * until the last reference will be released.
	 */
	[...]

  Of course it's true that doing early lookups and storing that reference 
in the skb widens the window considerably, but I think this race is 
already handled. Or is there anything I don't see?
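
The argument above can be made concrete with a small single-threaded
model of the reference counting involved. It is a sketch, not kernel
code: struct model_sock and the model_*() helpers are hypothetical
names, and real sockets use atomic counters and proper locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a socket: a reference count and two flags. */
struct model_sock {
	int  refcnt;     /* models sk->sk_refcnt */
	bool hashed;     /* still findable by lookups */
	bool destroyed;  /* set when the last reference is dropped */
};

static void model_sock_init(struct model_sock *sk)
{
	sk->refcnt = 1;          /* reference owned by the process */
	sk->hashed = true;
	sk->destroyed = false;
}

/* Models sock_put(): the socket is destroyed on the last put. */
static void model_sock_put(struct model_sock *sk)
{
	assert(sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->destroyed = true;
}

/* Models the early lookup: only hashed sockets are found, and a
 * reference is taken on behalf of the packet (sock_hold()). */
static struct model_sock *model_sock_lookup(struct model_sock *sk)
{
	if (!sk->hashed)
		return NULL;
	sk->refcnt++;
	return sk;
}

/* Models close(): unhash first, then drop the process's reference. */
static void model_sock_close(struct model_sock *sk)
{
	sk->hashed = false;
	model_sock_put(sk);
}
```

If a packet looks the socket up before model_sock_close() runs, the
socket stays alive until the packet's reference is put, however long
the packet is in flight; a lookup after the close finds nothing.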

-- 
 Regards,
  Krisztian Kovacs


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10 10:17     ` KOVACS Krisztian
@ 2007-01-10 12:19       ` Patrick McHardy
  2007-01-16 12:49         ` KOVACS Krisztian
  0 siblings, 1 reply; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10 12:19 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev, Balazs Scheidler

KOVACS Krisztian wrote:
> On Wednesday 10 January 2007 07:46, Patrick McHardy wrote:
> 
>>>+			if (sk) {
>>>+				sock_hold(sk);
>>>+				skb->sk = sk;
>>
>>This looks racy, the socket could be closed between the lookup and
>>the actual use. Why do you need the socket lookup at all, can't
>>you just divert all packets selected by iptables?
> 
> 
>   Yes, it's racy, but I think this is true for the "regular" socket lookup, too. 
> Take UDP for example: __udp4_lib_rcv() does the socket lookup, gets a 
> reference to the socket, and then calls udp_queue_rcv_skb() to queue the 
> skb. As far as I can see there's nothing there which prevents the socket 
> from being closed between these calls. sk_common_release() even documents 
> this behaviour:
> 
> 	[...]
> 	if (sk->sk_prot->destroy)
> 		sk->sk_prot->destroy(sk);
> 
> 	/*
> 	 * Observation: when sock_common_release is called, processes have
> 	 * no access to socket. But net still has.
> 	 * Step one, detach it from networking:
> 	 *
> 	 * A. Remove from hash tables.
> 	 */
> 
> 	sk->sk_prot->unhash(sk);
> 
> 	/*
> 	 * In this point socket cannot receive new packets, but it is possible
> 	 * that some packets are in flight because some CPU runs receiver and
> 	 * did hash table lookup before we unhashed socket. They will achieve
> 	 * receive queue and will be purged by socket destructor.
> 	 *
> 	 * Also we still have packets pending on receive queue and probably,
> 	 * our own packets waiting in device queues. sock_destroy will drain
> 	 * receive queue, but transmitted packets will delay socket destruction
> 	 * until the last reference will be released.
> 	 */
> 	[...]
>
>   Of course it's true that doing early lookups and storing that reference 
> in the skb widens the window considerably, but I think this race is 
> already handled. Or is there anything I don't see?

You're right, it seems to be handled properly (except I think there is
a race between sk_common_release calling xfrm_sk_free_policy and f.e.
udp calling __xfrm_policy_check, will look into that).

It probably shouldn't be cached anyway, with nf_queue for example
the window could be _really_ large.



* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10  9:31     ` Balazs Scheidler
@ 2007-01-10 12:32       ` Patrick McHardy
  2007-01-10 13:27         ` Ingo Oeser
  2007-01-11 14:05         ` KOVACS Krisztian
  0 siblings, 2 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10 12:32 UTC (permalink / raw)
  To: Balazs Scheidler; +Cc: netdev, netfilter-devel, KOVACS Krisztian

Balazs Scheidler wrote:
> On Wed, 2007-01-10 at 07:46 +0100, Patrick McHardy wrote:
> 
>>I'm wondering if it would be possible to use normal input routing
>>combined with netfilter marks to do the diversion ..
>
> 
> The problem is that userspace proxies open ports dynamically (think of
> FTP data channels), you cannot add iptables rule for every such
> redirection. So one rule for every dynamic redirection is a no-go.
> 
> If we'd add a single rule, which would do some kind of lookup and then
> mark packets, would again introduce a state inside tproxy that'd need to
> be synchronized with the socket table. We explicitly wanted to avoid
> such tables.

How exactly are dynamic ports handled? Do you just add a catch-all rule
that filters based on socket lookups?

In that case you could do something like this:

ip route add local default dev lo scope host table 1
ip rule add fwmark 0x1 lookup 1

and still use the socket lookups for marking, which would (without the
socket caching) remove the need for this patch entirely.

> And additionally, using the mark this way would prevent the admin from
> using it the way he/she likes.

We support bitwise use of the mark everywhere in current kernels, so
that shouldn't be a problem anymore.
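
As a rough illustration of what "bitwise use of the mark" means here,
the value/mask semantics can be sketched in plain C. The helper names
and the choice of bit 0x1 for tproxy are assumptions made for this
example only; they are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit reserved for tproxy; the admin keeps the rest. */
#define TPROXY_MARK_BIT 0x1u

/* Compare only the bits selected by mask, ignoring all others. */
static bool mark_matches(uint32_t mark, uint32_t value, uint32_t mask)
{
	return (mark & mask) == (value & mask);
}

/* Set the bits selected by mask to value, leaving other bits alone. */
static uint32_t mark_set(uint32_t mark, uint32_t value, uint32_t mask)
{
	assert((value & ~mask) == 0);  /* value must fit inside mask */
	return (mark & ~mask) | value;
}
```

With such semantics a single bit can drive the diversion while the
remaining 31 bits stay available for the admin's own rules.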


* Re: [PATCH/RFC 08/10] iptables tproxy table
  2007-01-03 16:37 ` [PATCH/RFC 08/10] iptables tproxy table KOVACS Krisztian
@ 2007-01-10 12:40   ` Patrick McHardy
  0 siblings, 0 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10 12:40 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev

KOVACS Krisztian wrote:

> diff --git a/net/ipv4/netfilter/iptable_tproxy.c b/net/ipv4/netfilter/iptable_tproxy.c
> new file mode 100644
> index 0000000..6049c83
> --- /dev/null
> +++ b/net/ipv4/netfilter/iptable_tproxy.c
> @@ -0,0 +1,253 @@
> +/*
> + * Transparent proxy support for Linux/iptables
> + *
> + * Copyright (c) 2006-2007 BalaBit IT Ltd.
> + * Author: Balazs Scheidler, Krisztian Kovacs
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/version.h>
> +#include <linux/module.h>
> +
> +#include <linux/sysctl.h>
> +#include <linux/vmalloc.h>
> +#include <linux/net.h>
> +#include <linux/slab.h>
> +#include <linux/if.h>
> +#include <linux/proc_fs.h>
> +#include <linux/seq_file.h>
> +#include <linux/netdevice.h>
> +#include <linux/inetdevice.h>
> +#include <linux/time.h>
> +#include <linux/in.h>
> +#include <net/tcp.h>
> +#include <net/udp.h>
> +#include <net/sock.h>
> +#include <net/inet_sock.h>
> +#include <asm/uaccess.h>

A few of these look unnecessary (like vmalloc, seq_file, proc_fs,
time, uaccess).

> +#include <linux/netfilter.h>
> +#include <linux/netfilter_ipv4.h>
> +#include <linux/netfilter_ipv4/ip_tables.h>
> +
> +#define TPROXY_VALID_HOOKS (1 << NF_IP_PRE_ROUTING)
> +
> +#if 0
> +#define DEBUGP printk
> +#else
> +#define DEBUGP(f, args...)
> +#endif
> +
> +static struct
> +{
> +	struct ipt_replace repl;
> +	struct ipt_standard entries[2];
> +	struct ipt_error term;
> +} initial_table __initdata = {
> +	.repl = {
> +		.name = "tproxy",
> +		.valid_hooks = TPROXY_VALID_HOOKS,
> +		.num_entries = 2,
> +		.size = sizeof(struct ipt_standard) + sizeof(struct ipt_error),
> +		.hook_entry = {
> +			[NF_IP_PRE_ROUTING] = 0 },
> +		.underflow = {
> +			[NF_IP_PRE_ROUTING] = 0 },
> +	},
> +	.entries = {
> +		/* PRE_ROUTING */
> +		{
> +			.entry = {
> +				.target_offset = sizeof(struct ipt_entry),
> +				.next_offset = sizeof(struct ipt_standard),
> +			},
> +			.target = {
> +				.target = {
> +					.u = {
> +						.target_size = IPT_ALIGN(sizeof(struct ipt_standard_target)),
> +					},
> +				},
> +				.verdict = -NF_ACCEPT - 1,
> +			},
> +		},
> +	},
> +	/* ERROR */
> +	.term = {
> +		.entry = {
> +			.target_offset = sizeof(struct ipt_entry),
> +			.next_offset = sizeof(struct ipt_error),
> +		},
> +		.target = {
> +			.target = {
> +				.u = {
> +					.user = {
> +						.target_size = IPT_ALIGN(sizeof(struct ipt_error_target)),
> +						.name = IPT_ERROR_TARGET,
> +					},
> +				},
> +			},
> +			.errorname = "ERROR",
> +		},
> +	}
> +};
> +
> +static struct ipt_table tproxy_table = {
> +	.name		= "tproxy",
> +	.valid_hooks	= TPROXY_VALID_HOOKS,
> +	.lock		= RW_LOCK_UNLOCKED,
> +	.me		= THIS_MODULE,
> +	.af		= AF_INET,
> +};
> +
> +struct sock *
> +ip_tproxy_get_sock(const u8 protocol,
> +		   const __be32 saddr, const __be32 daddr,
> +		   const __be16 sport, const __be16 dport,
> +		   const struct net_device *in)
> +{
> +	struct sock *sk = NULL;
> +
> +	/* look up socket */
> +	switch (protocol) {
> +	case IPPROTO_TCP:
> +		sk = __inet_lookup(&tcp_hashinfo,
> +				   saddr, sport, daddr, sport,
> +				   in->ifindex);
> +		break;
> +	case IPPROTO_UDP:
> +		sk = udp4_lib_lookup(saddr, sport, daddr, dport,
> +				     in->ifindex);
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}
> +
> +	return sk;
> +}
> +EXPORT_SYMBOL_GPL(ip_tproxy_get_sock);
> +
> +int
> +ip_tproxy_do_divert(struct sk_buff *skb, struct sock *sk,
> +		    const int require_freebind,
> +		    const struct net_device *in)
> +{
> +	const struct inet_sock *inet = inet_sk(sk);
> +	struct in_device *indev;
> +
> +	if (unlikely(inet == NULL))
> +		return -EINVAL;
> +
> +	if (!require_freebind || inet->freebind) {
> +		indev = in_dev_get(in);
> +		if (indev == NULL)
> +			return -ENODEV;
> +
> +		skb->ip_tproxy = 1;
> +
> +		ip_divert_local(skb, indev, sk);
> +		in_dev_put(indev);
> +
> +		DEBUGP(KERN_DEBUG "IP_TPROXY: diverted to socket %p\n", sk);
> +	}
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ip_tproxy_do_divert);
> +
> +static unsigned int
> +ip_tproxy_prerouting(unsigned int hooknum,
> +		     struct sk_buff **pskb,
> +		     const struct net_device *in,
> +		     const struct net_device *out,
> +		     int (*okfn)(struct sk_buff *))
> +{
> +	int verdict = NF_ACCEPT;
> +	struct sk_buff *skb = *pskb;
> +	u8 protocol = skb->nh.iph->protocol;
> +	struct sock *sk = NULL;
> +	const struct iphdr *iph = (*pskb)->nh.iph;
> +	struct udphdr _hdr, *hp;
> +
> +	/* TCP and UDP only */
> +	if ((protocol != IPPROTO_TCP) && (protocol != IPPROTO_UDP))
> +		return NF_ACCEPT;
> +
> +	if (in == NULL)
> +		return NF_ACCEPT;
> +
> +	if ((skb->dst != NULL) || (skb->ip_tproxy == 1))
> +		return NF_ACCEPT;
> +
> +	hp = skb_header_pointer(skb, skb->nh.iph->ihl * 4, sizeof(_hdr), &_hdr);
> +	if (hp == NULL) {
> +		DEBUGP(KERN_DEBUG "IP_TPROXY: ip_tproxy_fn(): "
> +		       "failed to get protocol header\n");
> +		return NF_DROP;
> +	}
> +
> +	sk = ip_tproxy_get_sock(iph->protocol,
> +				iph->saddr, iph->daddr,
> +				hp->source, hp->dest, in);
> +	if (sk) {
> +		if (ip_tproxy_do_divert(skb, sk, 1, in) < 0) {
> +			DEBUGP(KERN_DEBUG "IP_TPROXY: divert failed, dropping packet\n");
> +			verdict = NF_DROP;
> +		}
> +		sock_put(sk);

This doesn't handle time wait sockets (need inet_twsk_put).

> +	} else {
> +		verdict = ipt_do_table(pskb, hooknum, in, out, &tproxy_table);
> +	}
> +
> +	return verdict;
> +}
> +
> +static struct nf_hook_ops ip_tproxy_pre_ops = {
> +	.hook		= ip_tproxy_prerouting,
> +	.owner		= THIS_MODULE,
> +	.pf		= PF_INET,
> +	.hooknum	= NF_IP_PRE_ROUTING,
> +	.priority	= -130

This priority value should be defined in netfilter_ipv4.h, next to the
other hook priorities.

> +};
> +
> +static int __init init(void)
> +{
> +	int ret;
> +
> +	ret = ipt_register_table(&tproxy_table, &initial_table.repl);
> +	if (ret < 0) {
> +		printk("IP_TPROXY: can't register tproxy table.\n");
> +		return ret;
> +	}
> +
> +	ret = nf_register_hook(&ip_tproxy_pre_ops);
> +	if (ret < 0) {
> +		printk("IP_TPROXY: can't register prerouting hook.\n");
> +		goto clean_table;
> +	}
> +
> +	printk("IP_TPROXY: Transparent proxy support initialized, version 4.0.0\n"
> +	       "IP_TPROXY: Copyright (c) 2006-2007 BalaBit IT Ltd.\n");
> +
> +	return ret;
> +
> + clean_table:
> +	ipt_unregister_table(&tproxy_table);
> +	return ret;
> +}
> +
> +static void __exit fini(void)
> +{
> +	nf_unregister_hook(&ip_tproxy_pre_ops);
> +	ipt_unregister_table(&tproxy_table);
> +}
> +
> +module_init(init);
> +module_exit(fini);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Krisztian Kovacs <hidden@balabit.hu>");
> +MODULE_DESCRIPTION("iptables transparent proxy table");
> 



* Re: [PATCH/RFC 09/10] iptables TPROXY target
  2007-01-03 16:38 ` [PATCH/RFC 09/10] iptables TPROXY target KOVACS Krisztian
@ 2007-01-10 12:45   ` Patrick McHardy
  0 siblings, 0 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10 12:45 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev

KOVACS Krisztian wrote:

> diff --git a/net/ipv4/netfilter/ipt_TPROXY.c b/net/ipv4/netfilter/ipt_TPROXY.c
> new file mode 100644
> index 0000000..6f64717
> --- /dev/null
> +++ b/net/ipv4/netfilter/ipt_TPROXY.c

> +static unsigned int
> +target(struct sk_buff **pskb,
> +       const struct net_device *in,
> +       const struct net_device *out,
> +       unsigned int hooknum,
> +       const struct xt_target *target,
> +       const void *targinfo)
> +{
> +	const struct iphdr *iph = (*pskb)->nh.iph;
> +	unsigned int verdict = NF_ACCEPT;
> +	struct sk_buff *skb = *pskb;
> +	struct udphdr _hdr, *hp;
> +	struct sock *sk;
> +
> +	/* TCP/UDP only */
> +	if ((iph->protocol != IPPROTO_TCP) &&
> +	    (iph->protocol != IPPROTO_UDP))
> +		return NF_ACCEPT;
> +
> +	if (in == NULL)
> +		return NF_ACCEPT;
> +
> +	if ((skb->dst != NULL) || (skb->ip_tproxy == 1))
> +		return NF_ACCEPT;
> +
> +	hp = skb_header_pointer(*pskb, iph->ihl * 4, sizeof(_hdr), &_hdr);
> +	if (hp == NULL)
> +		return NF_DROP;
> +
> +	sk = ip_tproxy_get_sock(iph->protocol,
> +				iph->saddr, iph->daddr,
> +				hp->source, hp->dest, in);
> +	if (sk != NULL) {
> +		if (ip_tproxy_do_divert(skb, sk, 0, in) < 0)
> +			verdict = NF_DROP;
> +		sock_put(sk);

Missing time wait socket handling.

> +	}
> +
> +	return verdict;
> +}
> +
> +static int
> +checkentry(const char *tablename,
> +	   const void *e,
> +	   const struct xt_target *target,
> +           void *targinfo,
> +           unsigned int hook_mask)
> +{
> +	/* checks are now done by the x_tables core based on
> +	 * information specified in the ipt_target structure */
> +	return 1;
> +}

The function is optional, you can simply delete it.

> +
> +static struct ipt_target ipt_tproxy_reg = {
> +	.name		= "TPROXY",
> +	.target		= target,
> +	.targetsize	= sizeof(struct ipt_tproxy_target_info),
> +	.table		= "tproxy",
> +	.checkentry	= checkentry,
> +	.me		= THIS_MODULE,
> +};
> +
> +static int __init init(void)
> +{
> +	if (ipt_register_target(&ipt_tproxy_reg))
> +		return -EINVAL;

This should return the result of ipt_register_target.


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10 12:32       ` Patrick McHardy
@ 2007-01-10 13:27         ` Ingo Oeser
  2007-01-10 13:42           ` Patrick McHardy
  2007-01-11 14:05         ` KOVACS Krisztian
  1 sibling, 1 reply; 35+ messages in thread
From: Ingo Oeser @ 2007-01-10 13:27 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Balazs Scheidler, netdev, netfilter-devel, KOVACS Krisztian

Patrick McHardy wrote:
> We support bitwise use of the mark everywhere in current kernels, so
> that shouldn't be a problem anymore.

For firewall mark based policy routing to work, one must still disable 
rp_filter, because this lookup doesn't take the mark into account[1].

So this statement is not quite true, although I believe you are probably right 
for this case.

BTW: This rp_filter=0 requirement isn't even officially documented 
	(e.g. in the LARTC).


Regards

Ingo Oeser

[1] But does take TOS into account for historic (???) reasons.


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10 13:27         ` Ingo Oeser
@ 2007-01-10 13:42           ` Patrick McHardy
  0 siblings, 0 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-10 13:42 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Balazs Scheidler, netdev, netfilter-devel, KOVACS Krisztian

Ingo Oeser wrote:
> Patrick McHardy wrote:
> 
>>We support bitwise use of the mark everywhere in current kernels, so
>>that shouldn't be a problem anymore.
> 
> 
> For firewall mark based policy routing to work, one must still disable 
> rp_filter, because this lookup doesn't take the mark into account[1].

If distributions didn't enable it by default, there would be no need
to disable it again :)

> So this statement is not quite true, although I believe you are probably right 
> for this case.
> 
> BTW: This rp_filter=0 requirement isn't even officially documented 
> 	(e.g. in the LARTC).

The kernel defaults to rp_filter = 0.

> [1] But does take TOS into account for historic (???) reasons.

With TOS the assumption of symmetry is a bit more realistic than
with fwmarks, but it will probably still not work properly if
you actually use routing by TOS value.


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10 12:32       ` Patrick McHardy
  2007-01-10 13:27         ` Ingo Oeser
@ 2007-01-11 14:05         ` KOVACS Krisztian
  1 sibling, 0 replies; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-11 14:05 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Balazs Scheidler, netdev, netfilter-devel


  Hi,

On Wednesday 10 January 2007 13:32, Patrick McHardy wrote:
> How exactly are dynamic ports handled? Do you just add a catch-all rule
> that filters based on socket lookups?
>
> In that case you could do something like this:
>
> ip route add local default dev lo scope host table 1
> ip rule add fwmark 0x1 lookup 1
>
> and still use the socket lookups for marking, which would (without the
> socket caching) remove the need for this patch entirely.

  Ok, I'll try to address all the concerns raised on the list.

  Thanks a lot for the review and comments.

-- 
 Regards,
  Krisztian Kovacs


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-10 12:19       ` Patrick McHardy
@ 2007-01-16 12:49         ` KOVACS Krisztian
  2007-01-16 13:19           ` Patrick McHardy
  0 siblings, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-01-16 12:49 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev, Balazs Scheidler

  Hi,

On Wednesday 10 January 2007 13:19, Patrick McHardy wrote:
> >   Of course it's true that doing early lookups and storing that
> > reference in the skb widens the window considerably, but I think this
> > race is already handled. Or is there anything I don't see?
>
> You're right, it seems to be handled properly (except I think there is
> a race between sk_common_release calling xfrm_sk_free_policy and f.e.
> udp calling __xfrm_policy_check, will look into that).
>
> It probably shouldn't be cached anyway, with nf_queue for example
> the window could be _really_ large.

  Patrick, I seem to be out of ideas how this could be done 
without "caching" the socket lookup. The problem is that it's not only 
caching in some cases. For example we can do something like this:

  iptables -t tproxy -A PREROUTING -s X -d Y -p tcp --dport 80 \
           -j TPROXY --to proxy_ip:proxy_port

  In this case the TPROXY target does a socket lookup for 
proxy_ip:proxy_port and stores that socket reference in skb->sk. 
Obviously if you don't do this then TCP will do a lookup on the packet's 
original destination address/port and it won't work.

  Unfortunately I don't see any way this could be solved without 
storing the result of the lookup... So while I agree that having that 
socket reference in the skb is risky, as previously skb->sk was unused on 
the input path, I simply don't have any other idea. (Unless you load 
iptable_tproxy, skb->sk == NULL on input still holds with these patches, 
so I think there should be absolutely no problems when tproxy is unused.)

  Other possible problems which came to my mind:

- The previous version was missing IPv4 fragment reassembly: we obviously 
need this to be able to do socket lookups, so now I've added this to 
iptable_tproxy.
- IP_FREEBIND does not require the NET_ADMIN capability; combined with the 
relaxed source address check on IP output, this means that we provide a 
way for unprivileged users to forge IPv4 source addresses. As we must not 
break anything, it looks like we need a separate socket option for 
disabling output source address checks (this would obviously require 
NET_ADMIN).

  Thoughts? I'd be especially interested in any ideas wrt. the socket 
reference problems, as the other two seem to be easier to solve.

-- 
 Regards,
  Krisztian Kovacs


* Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
  2007-01-16 12:49         ` KOVACS Krisztian
@ 2007-01-16 13:19           ` Patrick McHardy
  0 siblings, 0 replies; 35+ messages in thread
From: Patrick McHardy @ 2007-01-16 13:19 UTC (permalink / raw)
  To: KOVACS Krisztian; +Cc: netfilter-devel, netdev, Balazs Scheidler

KOVACS Krisztian wrote:
> On Wednesday 10 January 2007 13:19, Patrick McHardy wrote:
> 
>>>  Of course it's true that doing early lookups and storing that
>>>reference in the skb widens the window considerably, but I think this
>>>race is already handled. Or is there anything I don't see?
>>
>>You're right, it seems to be handled properly (except I think there is
>>a race between sk_common_release calling xfrm_sk_free_policy and f.e.
>>udp calling __xfrm_policy_check, will look into that).
>>
>>It probably shouldn't be cached anyway, with nf_queue for example
>>the window could be _really_ large.
> 
> 
>   Patrick, I seem to be out of ideas how this could be done 
> without "caching" the socket lookup. The problem is that it's not only 
> caching in some cases. For example we can do something like this:
> 
>   iptables -t tproxy -A PREROUTING -s X -d Y -p tcp --dport 80 \
>            -j TPROXY --to proxy_ip:proxy_port
> 
>   In this case the TPROXY target does a socket lookup for 
> proxy_ip:proxy_port and stores that socket reference in skb->sk. 
> Obviously if you don't do this then TCP will do a lookup on the packet's 
> original destination address/port and it won't work.
> 
>   Unfortunately I don't see any way how this could be solved without 
> storing the result of the lookup... So while I agree that having that 
> socket reference in the skb is risky, as previously skb->sk was unused on 
> the input path, I simply don't have any other idea. (Unless you load 
> iptable_tproxy, skb->sk == NULL on input still holds with these patches, 
> so I think there should be absolutely no problems when tproxy is unused.)

One (not very pretty) possibility would be to store the address/port
somewhere in the skb and use it for the socket lookup. I think that's
also what the 2.2 code did. Other than that I don't have any ideas
either, but I'm not too familiar with that code; maybe someone else
could explain whether caching the sockets would really be a problem
and why.
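
To make the alternative concrete, here is a small user-space model of
"store the address/port with the packet and look the socket up later".
All types and functions are hypothetical and only illustrate the idea;
none of them exist in the kernel or in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a listening socket table entry. */
struct model_listener {
	uint32_t addr;
	uint16_t port;
};

/* Hypothetical model of the packet fields that matter here. */
struct model_pkt {
	uint32_t daddr;       /* original destination address */
	uint16_t dport;       /* original destination port */
	int      redirected;  /* set by the modelled TPROXY target */
	uint32_t redir_addr;  /* proxy address stored with the packet */
	uint16_t redir_port;  /* proxy port stored with the packet */
};

/* Models "-j TPROXY --to addr:port": only record where to look
 * later, without taking any socket reference. */
static void model_tproxy_target(struct model_pkt *pkt,
				uint32_t addr, uint16_t port)
{
	pkt->redirected = 1;
	pkt->redir_addr = addr;
	pkt->redir_port = port;
}

/* Models the protocol's socket lookup at delivery time: use the
 * stored address/port if present, else the original destination. */
static const struct model_listener *
model_redirect_lookup(const struct model_listener *tbl, size_t n,
		      const struct model_pkt *pkt)
{
	uint32_t addr = pkt->redirected ? pkt->redir_addr : pkt->daddr;
	uint16_t port = pkt->redirected ? pkt->redir_port : pkt->dport;
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (tbl[i].addr == addr && tbl[i].port == port)
			return &tbl[i];
	return NULL;
}
```

The trade-off in this model is that the lookup happens at delivery
time, so no socket reference travels with the packet, at the cost of
extra per-packet fields and a modified lookup path.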

>   Other possible problems which came to my mind:
> 
> - The previous version was missing IPv4 fragment reassembly: we obviously 
> need this to be able to do socket lookups, so now I've added this to 
> iptable_tproxy.

Makes sense.

> - IP_FREEBIND does not require NET_ADMIN capability, combined with the 
> relaxed source address on ip_output() this means that we provide a way to 
> do IPv4 address forging for unprivileged users. As we must not break 
> anything it looks like we need a separate socket option for disabling 
> output source address checks (this would obviously require NET_ADMIN).

Also sounds reasonable.


* IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output)
  2007-01-10  6:47   ` Patrick McHardy
  2007-01-10 10:01     ` KOVACS Krisztian
@ 2007-02-06 14:36     ` KOVACS Krisztian
  2007-02-06 19:46       ` IP_FREEBIND and CAP_NET_ADMIN David Miller
  1 sibling, 1 reply; 35+ messages in thread
From: KOVACS Krisztian @ 2007-02-06 14:36 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel, netdev

On Wednesday 10 January 2007 07:47, Patrick McHardy wrote:
> KOVACS Krisztian wrote:
> > ip_route_output() contains a check to make sure that no flows with
> > non-local source IP addresses are routed. Unfortunately this check
> > makes it completely impossible to use non-local bound sockets as no
> > outbound packets will make through the stack.
> >
> > This patch moves the interface lookup to the multicast-specific code
> > path as that is the only real user of the interface data looked up.
> >
> > Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
> >
> > ---
> >
> >  net/ipv4/route.c |   13 +++++--------
> >  1 files changed, 5 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> > index 537b976..bb1158a 100644
> > --- a/net/ipv4/route.c
> > +++ b/net/ipv4/route.c
> > @@ -2498,11 +2498,6 @@ #endif
> >  		    ZERONET(oldflp->fl4_src))
> >  			goto out;
> >
> > -		/* It is equivalent to inet_addr_type(saddr) == RTN_LOCAL */
> > -		dev_out = ip_dev_find(oldflp->fl4_src);
> > -		if (dev_out == NULL)
> > -			goto out;
> > -
>
> I'm not sure how exactly this is used by applications, but couldn't you
> restrict this to sockets without freebind?

As it turned out after I submitted this patch, simply removing the 
check as in the patch quoted above is not good: it would allow any local 
user to originate connections from a non-local IP address (since setting 
IP_FREEBIND does not require CAP_NET_ADMIN).
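
For readers unfamiliar with the option, the following user-space
sketch shows how a process sets IP_FREEBIND and binds to an address
that need not be configured locally. The function name bind_freebind()
is made up for this example, and the fallback value 15 for IP_FREEBIND
is the one from <linux/in.h>, used only if the libc headers lack it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_FREEBIND
#define IP_FREEBIND 15 /* value from <linux/in.h> */
#endif

/* Returns a TCP socket bound to ip:port with IP_FREEBIND set,
 * or -1 on any error. */
static int bind_freebind(const char *ip, uint16_t port)
{
	struct sockaddr_in sin;
	int one = 1;
	int fd;

	assert(ip != NULL);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return -1;            /* not a valid IPv4 literal */

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* Setting the option involves no capability check. */
	if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0 ||
	    bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Run as an ordinary user on Linux, a call such as
bind_freebind("192.0.2.1", 0) should succeed even though 192.0.2.1 (a
documentation address, picked arbitrarily) is not a local address.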

I've attempted to restrict the removal of the check to certain sockets, 
but it is more difficult than expected. It would require touching a lot 
of areas of the kernel code, as the socket is not always available where 
an output routing lookup is requested.

In fact the only thing available when making the decision in 
ip_route_output_slow() is a struct flowi. I've tried to stuff a flag bit 
into "struct flowi", but that solution seems to be very risky, as the 
value for "struct flowi->flags" is not consulted at a lot of places. IMHO 
the result would be far from pretty... (And I have to admit that I don't 
really know what flowi->flags is used for. I've found no in-tree user of 
that field. The only defined flag bit, FLOWI_FLAG_MULTIPATHOLDROUTE, has 
no in-tree user either.)
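
The flag-bit idea can be sketched as a small user-space model. The
names below (struct model_flow, MODEL_FLOW_ANYSRC, model_route_output)
are hypothetical; in particular, no such flag is defined for struct
flowi, and model_addr_is_local() merely stands in for ip_dev_find().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the relevant part of struct flowi. */
struct model_flow {
	uint32_t src;    /* requested source address, 0 if unset */
	unsigned flags;
};

#define MODEL_FLOW_ANYSRC 0x1u /* hypothetical: skip the local check */

/* Stand-in for ip_dev_find(): is addr configured on this host? */
static bool model_addr_is_local(const uint32_t *local, size_t n,
				uint32_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (local[i] == addr)
			return true;
	return false;
}

/* Returns 0 if the flow may be routed, -1 otherwise. */
static int model_route_output(const struct model_flow *fl,
			      const uint32_t *local, size_t n)
{
	assert(fl != NULL);
	if (fl->src == 0)
		return 0;       /* source address is chosen later */
	if (fl->flags & MODEL_FLOW_ANYSRC)
		return 0;       /* caller explicitly opted out */
	return model_addr_is_local(local, n, fl->src) ? 0 : -1;
}
```

The model also shows the weakness: the check is only relaxed for
callers that remember to set the flag when building the flow, so every
place that fills in a flow for such a socket would have to do so.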

And even if we have this flag in place, it's not enough to set it for 
certain sockets in ip_route_connect(): this would not handle SYN+ACK or 
ACK packets sent in response to redirected TCP connection attempts. And 
who knows what else is still hiding there: ip_route_output_*() calls are 
pretty much everywhere in the whole net/ipv4 directory.

So I think the cleanest solution would be to require CAP_NET_ADMIN for
IP_FREEBIND. This way, a non-root process would not be allowed to bind a
socket to a non-local address, and thus could not initiate connections
from a non-local IP.
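
As a rough illustration of the proposed rule, again as a user-space
model rather than kernel code: struct sock_model, model_set_freebind()
and model_bind() are hypothetical names. That clearing the option stays
unprivileged is an assumption of this sketch, not something decided
above.

```c
#include <errno.h>
#include <stdbool.h>

/* User-space model only; the struct and function names are hypothetical. */
struct sock_model {
	bool freebind;
};

/* Model of setting the option under the proposed rule. */
static int model_set_freebind(struct sock_model *sk, bool value,
			      bool has_net_admin)
{
	/* Proposed rule: enabling the option needs CAP_NET_ADMIN. */
	if (value && !has_net_admin)
		return -EPERM;
	sk->freebind = value;
	return 0;
}

/* Model of bind(): non-local addresses are only accepted with freebind. */
static int model_bind(const struct sock_model *sk, bool addr_is_local)
{
	if (!addr_is_local && !sk->freebind)
		return -EADDRNOTAVAIL;
	return 0;
}
```

With this rule an unprivileged process can never get a socket with
freebind set, so its bind to a non-local address fails before any
output route lookup happens.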

As this would be a change in the kernel ABI, Balazs and I have searched
for applications using the IP_FREEBIND option with Google codesearch
(www.google.com/codesearch).

Outside libc and kernel, we've found only three applications that mention
this option:
* socat: which allows setting all socket options by the user (I doubt 
using IP_FREEBIND with socat has any meaningful use)
* strace: to be able to dump IP_FREEBIND
* qemu: for emulating Linux system calls

None of these requires IP_FREEBIND as core functionality, and all of
them would probably keep working if IP_FREEBIND required CAP_NET_ADMIN.

So the question is: shall we take the IP_FREEBIND approach, which would
change a hardly ever used interface by requiring CAP_NET_ADMIN, or
should we try to find all the scattered places in the Linux IP stack
that do a route lookup?

-- 
 Regards,
  Krisztian Kovacs


* Re: IP_FREEBIND and CAP_NET_ADMIN
  2007-02-06 14:36     ` IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output) KOVACS Krisztian
@ 2007-02-06 19:46       ` David Miller
  0 siblings, 0 replies; 35+ messages in thread
From: David Miller @ 2007-02-06 19:46 UTC (permalink / raw)
  To: hidden; +Cc: kaber, netfilter-devel, netdev

From: KOVACS Krisztian <hidden@balabit.hu>
Date: Tue, 6 Feb 2007 15:36:18 +0100

> None of these requires IP_FREEBIND as core functionality, and all of
> them would probably keep working if IP_FREEBIND required CAP_NET_ADMIN.
> 
> So the question is: shall we take the IP_FREEBIND approach, which would
> change a hardly ever used interface by requiring CAP_NET_ADMIN, or
> should we try to find all the scattered places in the Linux IP stack
> that do a route lookup?

We're not going to remove functionality from the user for the
sake of convenience of something you are trying to write.

If it was some security hole, then fine, but it's not so it
can stay and it does have legitimate uses.

This freebind behavior should actually be the default, but we had to
put the socket option and sysctl there because allowing freebind by
default makes several test suites fail that try to purposely bind to a
non-local address and expect an error return.

It allows servers to bind when your on-demand connection is down.
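
For reference, a minimal user-space sketch of that use, assuming a Linux
host. try_bind() is a hypothetical helper written for this example; only
the IP_FREEBIND socket option itself is the real interface.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Try to bind a TCP socket to addr (dotted quad), port 0.
 * If freebind is non-zero, IP_FREEBIND is set before bind().
 * Returns 0 on success or a negative errno value.
 */
static int try_bind(const char *addr, int freebind)
{
	struct sockaddr_in sin;
	int one = 1;
	int fd, rc = 0;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = 0;  /* let the kernel pick a port */
	if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
		return -EINVAL;

	fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -errno;

	if (freebind &&
	    setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &one, sizeof(one)) < 0)
		rc = -errno;
	if (rc == 0 && bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
		rc = -errno;

	close(fd);
	return rc;
}
```

With an address that is not configured on any interface (192.0.2.1 from
the documentation range, say), try_bind(addr, 0) would normally be
expected to fail with EADDRNOTAVAIL unless the sysctl is enabled, while
try_bind(addr, 1) should succeed without any special privilege.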


Thread overview: 35+ messages
2007-01-03 16:33 [PATCH/RFC 00/10] Transparent proxying patches version 4 KOVACS Krisztian
2007-01-03 16:34 ` [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs KOVACS Krisztian
2007-01-10  6:46   ` Patrick McHardy
2007-01-10  9:31     ` Balazs Scheidler
2007-01-10 12:32       ` Patrick McHardy
2007-01-10 13:27         ` Ingo Oeser
2007-01-10 13:42           ` Patrick McHardy
2007-01-11 14:05         ` KOVACS Krisztian
2007-01-10 10:17     ` KOVACS Krisztian
2007-01-10 12:19       ` Patrick McHardy
2007-01-16 12:49         ` KOVACS Krisztian
2007-01-16 13:19           ` Patrick McHardy
2007-01-03 16:34 ` [PATCH/RFC 02/10] Port redirection support for TCP KOVACS Krisztian
2007-01-03 16:35 ` [PATCH/RFC 03/10] Don't do the TCP socket lookup if we already have one attached KOVACS Krisztian
2007-01-03 16:35 ` [PATCH/RFC 04/10] Don't do the UDP " KOVACS Krisztian
2007-01-03 16:36 ` [PATCH/RFC 05/10] Remove local address check on IP output KOVACS Krisztian
2007-01-10  6:47   ` Patrick McHardy
2007-01-10 10:01     ` KOVACS Krisztian
2007-02-06 14:36     ` IP_FREEBIND and CAP_NET_ADMIN (was: Re: [PATCH/RFC 05/10] Remove local address check on IP output) KOVACS Krisztian
2007-02-06 19:46       ` IP_FREEBIND and CAP_NET_ADMIN David Miller
2007-01-03 16:36 ` [PATCH/RFC 06/10] Create a tproxy flag in struct sk_buff KOVACS Krisztian
2007-01-03 16:37 ` [PATCH/RFC 07/10] Export UDP socket lookup function KOVACS Krisztian
2007-01-03 16:37 ` [PATCH/RFC 08/10] iptables tproxy table KOVACS Krisztian
2007-01-10 12:40   ` Patrick McHardy
2007-01-03 16:38 ` [PATCH/RFC 09/10] iptables TPROXY target KOVACS Krisztian
2007-01-10 12:45   ` Patrick McHardy
2007-01-03 16:38 ` [PATCH/RFC 10/10] iptables tproxy match KOVACS Krisztian
2007-01-03 17:23 ` [PATCH/RFC 00/10] Transparent proxying patches version 4 Evgeniy Polyakov
2007-01-08 20:30   ` KOVACS Krisztian
2007-01-03 19:33 ` Lennert Buytenhek
2007-01-04 12:13   ` KOVACS Krisztian
2007-01-04 12:16     ` Lennert Buytenhek
2007-01-07 14:11 ` Harald Welte
2007-01-07 16:11   ` Lennert Buytenhek
2007-01-07 23:58     ` Harald Welte
