From: Patrick McHardy <kaber@trash.net>
To: KOVACS Krisztian <hidden@balabit.hu>
Cc: netdev@vger.kernel.org, netfilter-devel@lists.netfilter.org
Subject: Re: [PATCH/RFC 01/10] Implement local diversion of IPv4 skbs
Date: Wed, 10 Jan 2007 07:46:46 +0100
Message-ID: <45A48BD6.5010507@trash.net>
In-Reply-To: <20070103163427.14635.49596.stgit@nienna.balabit>

KOVACS Krisztian wrote:
> The input path for non-local bound sockets requires diverting certain
> packets locally, even if their destination IP address is not
> considered local. We achieve this by assigning a specially crafted dst
> entry to these skbs, and optionally also attaching a socket to the skb
> so that the upper layer code does not need to redo the socket lookup.
>
> We also have to be able to differentiate between these fake entries
> and "real" entries in the cache: it is perfectly legal that the
> diversion is done only for certain TCP or UDP packets and not for all
> packets of the flow. Since these special dst entries are used only by
> the iptables tproxy code, and that code uses exclusively these
> entries, simply flagging these entries as DST_DIVERTED is OK. All
> other cache lookup paths skip diverted entries, while our new
> ip_divert_local() function uses exclusively diverted dst entries.
>
> Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
>
> ---
>
> include/net/dst.h | 1
> include/net/route.h | 2 +
> net/ipv4/route.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 108 insertions(+), 1 deletions(-)
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 62b7e75..72b712c 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -50,6 +50,7 @@ #define DST_NOXFRM 2
> #define DST_NOPOLICY 4
> #define DST_NOHASH 8
> #define DST_BALANCED 0x10
> +#define DST_DIVERTED 0x20
> unsigned long lastuse;
> unsigned long expires;
>
> diff --git a/include/net/route.h b/include/net/route.h
> index 486e37a..ee52393 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -126,6 +126,8 @@ extern int ip_rt_ioctl(unsigned int cmd
> extern void ip_rt_get_source(u8 *src, struct rtable *rt);
> extern int ip_rt_dump(struct sk_buff *skb, struct netlink_callback *cb);
>
> +extern int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk);
> +
> struct in_ifaddr;
> extern void fib_add_ifaddr(struct in_ifaddr *);
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 2daa0dc..537b976 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -942,9 +942,11 @@ restart:
> while ((rth = *rthp) != NULL) {
> #ifdef CONFIG_IP_ROUTE_MULTIPATH_CACHED
> if (!(rth->u.dst.flags & DST_BALANCED) &&
> + ((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
> compare_keys(&rth->fl, &rt->fl)) {
> #else
> - if (compare_keys(&rth->fl, &rt->fl)) {
> + if (((rt->u.dst.flags & DST_DIVERTED) == (rth->u.dst.flags & DST_DIVERTED)) &&
> + compare_keys(&rth->fl, &rt->fl)) {
> #endif
> /* Put it first */
> *rthp = rth->u.rt_next;
> @@ -1166,6 +1168,7 @@ void ip_rt_redirect(__be32 old_gw, __be3
> if (rth->fl.fl4_dst != daddr ||
> rth->fl.fl4_src != skeys[i] ||
> rth->fl.oif != ikeys[k] ||
> + (rth->u.dst.flags & DST_DIVERTED) ||
> rth->fl.iif != 0) {
> rthp = &rth->u.rt_next;
> continue;
> @@ -1526,6 +1529,105 @@ static int ip_rt_bug(struct sk_buff *skb
> return 0;
> }
>
> +static void ip_divert_free_sock(struct sk_buff *skb)
> +{
> + struct sock *sk = skb->sk;
> +
> + skb->sk = NULL;
> + skb->destructor = NULL;
> + sock_put(sk);
> +}
> +
> +int ip_divert_local(struct sk_buff *skb, const struct in_device *in, struct sock *sk)
> +{
> + struct iphdr *iph = skb->nh.iph;
> + struct rtable *rth, *rtres;
> + unsigned hash;
> + const int iif = in->dev->ifindex;
> + u_int8_t tos;
> + int err;
> +
> + /* look up hash first */
> + tos = iph->tos & IPTOS_RT_MASK;
> + hash = rt_hash_code(iph->daddr, iph->saddr ^ (iif << 5));
> +
> + rcu_read_lock();
> + for (rth = rcu_dereference(rt_hash_table[hash].chain); rth;
> + rth = rcu_dereference(rth->u.rt_next)) {
> + if (rth->fl.fl4_dst == iph->daddr &&
> + rth->fl.fl4_src == iph->saddr &&
> + rth->fl.iif == iif &&
> + rth->fl.oif == 0 &&
> + rth->fl.mark == skb->mark &&
> + (rth->u.dst.flags & DST_DIVERTED) &&
> + rth->fl.fl4_tos == tos) {

The mark and tos comparisons look unnecessary here, since neither
affects the further processing of the packet.
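
A rough userspace sketch of what the match could look like with those
two comparisons dropped. struct divert_key and divert_entry_matches()
are made-up names for illustration, not kernel code; only DST_DIVERTED
and its value are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define DST_DIVERTED 0x20	/* value from the patch above */

/* Hypothetical, simplified stand-in for the rtable fields that the
 * lookup compares. This is not the kernel's struct layout. */
struct divert_key {
	uint32_t daddr;
	uint32_t saddr;
	int iif;
	int oif;
	unsigned int flags;
};

/* Match on addresses, input interface and the diverted flag only.
 * Mark and tos are deliberately left out of the comparison. */
static bool divert_entry_matches(const struct divert_key *e,
				 uint32_t daddr, uint32_t saddr, int iif)
{
	return e->daddr == daddr &&
	       e->saddr == saddr &&
	       e->iif == iif &&
	       e->oif == 0 &&
	       (e->flags & DST_DIVERTED) != 0;
}
```

With a key like this, packets of one flow that differ only in mark or
tos would share a single diverted entry instead of creating one each.
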
> + rth->u.dst.lastuse = jiffies;
> + dst_hold(&rth->u.dst);
> + rth->u.dst.__use++;
> + RT_CACHE_STAT_INC(in_hit);
> + rcu_read_unlock();
> +
> + dst_release(skb->dst);
> + skb->dst = (struct dst_entry*)rth;
> +
> + if (sk) {
> + sock_hold(sk);
> + skb->sk = sk;

This looks racy: the socket could be closed between the lookup and
the actual use. Why do you need the socket lookup at all? Can't
you just divert all packets selected by iptables?

I'm wondering if it would be possible to use normal input routing
combined with netfilter marks to do the diversion ..
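
One way to picture the concern, as a small userspace model rather than
kernel code: a held reference keeps the object alive, but it does not
stop the owner from closing it in the meantime, so whoever consumes the
attached socket would have to recheck its state. All names below
(model_sock, model_use_cached(), ...) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a refcounted socket; not struct sock. */
struct model_sock {
	int refcnt;
	bool closed;
};

static void model_sock_hold(struct model_sock *sk)
{
	sk->refcnt++;
}

static void model_sock_put(struct model_sock *sk)
{
	sk->refcnt--;
}

/* The owner closes the socket and drops its own reference. Other
 * holders still keep the object alive, but it is no longer usable. */
static void model_sock_close(struct model_sock *sk)
{
	sk->closed = true;
	model_sock_put(sk);
}

/* Consumer side: revalidate the cached socket before using it. If it
 * was closed after the lookup, drop the cached reference and return
 * NULL so the caller can fall back to a fresh lookup. */
static struct model_sock *model_use_cached(struct model_sock *sk)
{
	if (sk == NULL)
		return NULL;
	if (sk->closed) {
		model_sock_put(sk);
		return NULL;
	}
	return sk;
}
```

In this model the window is between model_sock_hold() at lookup time
and model_use_cached() at delivery time; without the recheck, the
consumer would hand the packet to a socket that is already closed.
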
> + skb->destructor = ip_divert_free_sock;
> + }
> +
> + return 0;
> + }
> + RT_CACHE_STAT_INC(in_hlist_search);
> + }
> + rcu_read_unlock();
> +
> + /* not found in cache, try to allocate a new dst entry */
> + rth = dst_alloc(&ipv4_dst_ops);
> + if (!rth)
> + return -ENOMEM;
> +
> + rth->u.dst.output= ip_rt_bug;
> +
> + atomic_set(&rth->u.dst.__refcnt, 1);
> + rth->u.dst.flags = DST_HOST | DST_DIVERTED;
> +
> + if (in->cnf.no_policy)
> + rth->u.dst.flags |= DST_NOPOLICY;
> +
> + rth->fl.fl4_dst = iph->daddr;
> + rth->rt_dst = iph->daddr;
> + rth->fl.fl4_tos = iph->tos;
> + rth->fl.mark = skb->mark;
> + rth->fl.fl4_src = iph->saddr;
> + rth->rt_src = iph->saddr;
> + rth->rt_iif =
> + rth->fl.iif = skb->dev->ifindex;
> + rth->u.dst.dev = &loopback_dev;
> + dev_hold(rth->u.dst.dev);
> + rth->idev = in_dev_get(rth->u.dst.dev);
> + rth->rt_gateway = iph->daddr;
> + rth->rt_spec_dst= iph->daddr;
> + rth->u.dst.input= ip_local_deliver;
> + rth->rt_flags = RTCF_LOCAL;
> + rth->rt_type = RTN_LOCAL;
> +
> + err = rt_intern_hash(hash, rth, &rtres);
> + if (err)
> + return err;
> +
> + dst_release(skb->dst);
> + skb->dst = (struct dst_entry *) rth;
> +
> + if (sk) {
> + sock_hold(sk);
> + skb->sk = sk;
> + skb->destructor = ip_divert_free_sock;
> + }
> +
> + return 0;
> +}
> +
> /*
> We do not cache source address of outgoing interface,
> because it is used only by IP RR, TS and SRR options,
> @@ -2104,6 +2206,7 @@ int ip_route_input(struct sk_buff *skb,
> rth->fl.fl4_src == saddr &&
> rth->fl.iif == iif &&
> rth->fl.oif == 0 &&
> + !(rth->u.dst.flags & DST_DIVERTED) &&
> rth->fl.mark == skb->mark &&
> rth->fl.fl4_tos == tos) {
> rth->u.dst.lastuse = jiffies;
> @@ -3199,3 +3302,4 @@ #endif
> EXPORT_SYMBOL(__ip_select_ident);
> EXPORT_SYMBOL(ip_route_input);
> EXPORT_SYMBOL(ip_route_output_key);
> +EXPORT_SYMBOL_GPL(ip_divert_local);
>