From: juliusv@google.com (Julius Volz)
To: netfilter-devel@vger.kernel.org
Cc: vbusam@google.com, jengelh@medozas.de, horms@verge.net.au,
lvs-devel@vger.kernel.org, ja@ssi.bg
Subject: Making conntrack like packets DNATed by IPVS
Date: Tue, 30 Sep 2008 14:21:59 +0200
Message-ID: <20080930122158.GA31253@google.com>
Hi,
I'm still stuck trying to get IPVS/NAT to work together with Netfilter
conntrack/Netfilter SNAT. First, I removed the Netfilter hook function
in IPVS that prevented further processing in POSTROUTING. Then, I made
IPVS reflect its own DNAT changes in the skb->nfct tuples just before
IPVS injects the packet back into LOCAL_OUT:
======================
diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 958abf3..96d24b5 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -1429,13 +1429,13 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
.priority = 99,
},
/* Before the netfilter connection tracking, exit from POST_ROUTING */
- {
+ /*{
.hook = ip_vs_post_routing,
.owner = THIS_MODULE,
.pf = PF_INET,
.hooknum = NF_INET_POST_ROUTING,
.priority = NF_IP_PRI_NAT_SRC-1,
- },
+ },*/
#ifdef CONFIG_IP_VS_IPV6
/* After packet filtering, forward packet through VS/DR, VS/TUN,
* or VS/NAT(change destination), so that filtering rules can be
diff --git a/net/ipv4/ipvs/ip_vs_xmit.c b/net/ipv4/ipvs/ip_vs_xmit.c
index 02ddc2b..de7feb5 100644
--- a/net/ipv4/ipvs/ip_vs_xmit.c
+++ b/net/ipv4/ipvs/ip_vs_xmit.c
@@ -24,6 +24,7 @@
#include <net/ip6_route.h>
#include <linux/icmpv6.h>
#include <linux/netfilter.h>
+#include <net/netfilter/nf_conntrack.h>
#include <linux/netfilter_ipv4.h>
#include <net/ip_vs.h>
@@ -360,6 +361,21 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
EnterFunction(10);
+ if (skb->nfct) {
+ struct nf_conn *ct = (struct nf_conn*)skb->nfct;
+
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.ip = cp->daddr.ip;
+ ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.tcp.port = cp->dport;
+
+ ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.ip = cp->daddr.ip;
+ ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port = cp->dport;
+
+ /* Netfilter SNAT was already marked done in LOCAL_IN, but
+ * somehow, the packet still contains the original source IP,
+ * so we want it to be done again in POSTROUTING */
+ clear_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
+ }
+
/* check if it is a connection of no-client-port */
if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) {
__be16 _pt, *p;
======================
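To make the intent of the ip_vs_nat_xmit() hunk easier to follow, here is a minimal userspace sketch of the same tuple update. Everything in it (struct toy_tuple, struct toy_conn, toy_apply_dnat() and the other toy_* names) is a hypothetical stand-in invented for illustration, not the kernel's struct nf_conn or any Netfilter API. It only shows the idea: the destination of the ORIGINAL direction and the source of the REPLY direction are both pointed at the real server.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a conntrack tuple; NOT the kernel's real layout. */
struct toy_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

enum { TOY_DIR_ORIGINAL = 0, TOY_DIR_REPLY = 1, TOY_DIR_MAX = 2 };

/* Toy stand-in for struct nf_conn: one tuple per direction. */
struct toy_conn {
	struct toy_tuple tuple[TOY_DIR_MAX];
};

/* Build the mirror image of a tuple (source and destination swapped). */
static struct toy_tuple toy_invert(const struct toy_tuple *t)
{
	struct toy_tuple r = {
		.src_ip = t->dst_ip, .dst_ip = t->src_ip,
		.src_port = t->dst_port, .dst_port = t->src_port,
	};
	return r;
}

/* Set up the entry as it looks before any NAT: client -> virtual service. */
static void toy_conn_init(struct toy_conn *ct, uint32_t cip, uint16_t cport,
			  uint32_t vip, uint16_t vport)
{
	ct->tuple[TOY_DIR_ORIGINAL] = (struct toy_tuple){
		.src_ip = cip, .dst_ip = vip,
		.src_port = cport, .dst_port = vport,
	};
	ct->tuple[TOY_DIR_REPLY] = toy_invert(&ct->tuple[TOY_DIR_ORIGINAL]);
}

/* Same four assignments as the patch: the original direction now targets
 * the real server, and replies are expected to come from the real server. */
static void toy_apply_dnat(struct toy_conn *ct, uint32_t rip, uint16_t rport)
{
	ct->tuple[TOY_DIR_ORIGINAL].dst_ip = rip;
	ct->tuple[TOY_DIR_ORIGINAL].dst_port = rport;
	ct->tuple[TOY_DIR_REPLY].src_ip = rip;
	ct->tuple[TOY_DIR_REPLY].src_port = rport;
}

/* Sanity check: the reply tuple is still the mirror of the original one. */
static void toy_check_mirrored(const struct toy_conn *ct)
{
	struct toy_tuple inv = toy_invert(&ct->tuple[TOY_DIR_ORIGINAL]);
	const struct toy_tuple *r = &ct->tuple[TOY_DIR_REPLY];

	assert(inv.src_ip == r->src_ip && inv.dst_ip == r->dst_ip);
	assert(inv.src_port == r->src_port && inv.dst_port == r->dst_port);
}
```

Calling toy_conn_init() with the client and virtual service addresses and then toy_apply_dnat() with the real server address leaves the two directions consistent with each other. Note that the sketch, like the patch, says nothing about where the entry is hashed; it only rewrites the fields.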
The Netfilter SNAT rule is simply:
$ iptables -t nat -A POSTROUTING -o eth1 -j SNAT --to-source <director IP>
The SYN and SYN/ACK packets of a new connection are handled correctly by
IPVS and are even SNATed correctly. The ACK to the SYN/ACK is still
handled correctly by IPVS, but is then dropped (NF_DROP) in POSTROUTING
by __nf_conntrack_confirm(), because a check there finds the associated
conntrack tuple already in nf_conntrack_hash (meaning the connection
has already been confirmed). If I understand it correctly, we shouldn't
even be entering that function for the ACK packet, so I'm doing
something very wrong...
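The check described above can be modelled with a small userspace sketch. This is a toy, not the kernel code: struct toy_table, toy_lookup() and toy_confirm() are hypothetical names, and the table is a flat array rather than nf_conntrack_hash. It also rests on an assumption that this message does not establish, namely that the ACK arrives at the confirm step attached to a separate, still unconfirmed entry whose tuple equals one that is already in the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TABLE_SIZE 16

/* Toy stand-in for the original-direction conntrack tuple. */
struct toy_tuple {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Toy stand-in for a conntrack entry. */
struct toy_conn {
	struct toy_tuple orig;
	bool confirmed;
};

enum toy_verdict { TOY_ACCEPT, TOY_DROP };

/* Flat array standing in for the table of confirmed entries. */
struct toy_table {
	const struct toy_conn *slot[TOY_TABLE_SIZE];
	size_t used;
};

static bool toy_tuple_equal(const struct toy_tuple *a,
			    const struct toy_tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Linear search for an already-confirmed entry with the same tuple. */
static const struct toy_conn *toy_lookup(const struct toy_table *tbl,
					 const struct toy_tuple *t)
{
	for (size_t i = 0; i < tbl->used; i++)
		if (toy_tuple_equal(&tbl->slot[i]->orig, t))
			return tbl->slot[i];
	return NULL;
}

/* Confirm an entry: confirmed entries pass straight through, unconfirmed
 * ones are inserted unless an equal tuple is already present. */
static enum toy_verdict toy_confirm(struct toy_table *tbl, struct toy_conn *ct)
{
	assert(tbl != NULL && ct != NULL);

	if (ct->confirmed)
		return TOY_ACCEPT;	/* nothing left to do */
	if (toy_lookup(tbl, &ct->orig) != NULL)
		return TOY_DROP;	/* clash with an existing entry */
	if (tbl->used == TOY_TABLE_SIZE)
		return TOY_DROP;	/* toy limitation: table is full */

	tbl->slot[tbl->used++] = ct;
	ct->confirmed = true;
	return TOY_ACCEPT;
}
```

In this model, confirming the SYN's entry succeeds, confirming the same entry again is a no-op, and a second unconfirmed entry carrying an identical tuple is dropped. If that matches what happens here, the interesting question is why the ACK is not matched to the existing entry earlier on.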
A packet trace on the director looks like this:
CIP: client IP
VIP: virtual service IP
DIP: director (load balancer) IP
RIP: real server (backend) IP
11:28:51.431221 IP <CIP>.49988 > <VIP>.80: S 1151908514:1151908514(0) win 5840 <mss 1460,sackOK,timestamp 74963354 0,nop,wscale 7>
11:28:51.432294 IP <DIP>.49988 > <RIP>.80: S 1151908514:1151908514(0) win 5840 <mss 1460,sackOK,timestamp 74963354 0,nop,wscale 7>
11:28:51.432822 IP <RIP>.80 > <DIP>.49988: S 1508888076:1508888076(0) ack 1151908515 win 5792 <sackOK,timestamp 7468557 74963354,mss 1460,nop,wscale 4>
11:28:51.434159 IP <VIP>.80 > <CIP>.49988: S 1508888076:1508888076(0) ack 1151908515 win 5792 <sackOK,timestamp 7468557 74963354,mss 1460,nop,wscale 4>
11:28:51.434253 IP <CIP>.49988 > <VIP>.80: . ack 1 win 46 <nop,nop,timestamp 74963362 7468557>
(the above packet is dropped in POSTROUTING...)
11:28:52.029604 IP <CIP>.49988 > <VIP>.80: P 1:3(2) ack 1 win 46 <nop,nop,timestamp 74963957 7468557>
11:28:52.237975 IP <CIP>.49988 > <VIP>.80: P 1:3(2) ack 1 win 46 <nop,nop,timestamp 74964165 7468557>
...
The various places in Netfilter at which tuples are created, modified,
checked, inserted, etc. are rather confusing to me, and I lack the
knowledge of Netfilter internals needed to understand and handle this
correctly. I'd be glad if someone could give me a pointer in the right
direction or help out in any other way!
Thanks,
Julius
--
Julius Volz - Corporate Operations - SysOps
Google Switzerland GmbH - Identification No.: CH-020.4.028.116-1