* LVS-NAT and source routing
From: Horms @ 2006-08-29 7:37 UTC
To: netfilter-devel
Cc: Ken Brownfield, Roberto Nibali, Farid Sarwari, David Miller, Julian Anastasov, David Black, Joseph Mack NA3T, Patrick McHardy

Hi,

sorry that this is a little off-topic, but I'm hoping for some
advice in relation to a problem with LVS.

When LVS-NAT is in use (basically load-balancing using DNAT)
then the return packets need to honour any source routing rules
on the linux-director (machine running LVS). If you think of it as
if the packets originate from the linux-director then this makes
sense (if you think about it other ways it doesn't, but I'm pretty
convinced that this is the right way to think about it).

A long time ago Ken Brownfield sent a patch that resolves this problem
by using an old variant of ip_route_me_harder() in ip_vs_out(),
the return path for LVS-NATed packets.

http://archive.linuxvirtualserver.org/html/lvs-users/2006-03/msg00106.html

I ported this to net-2.6.19 this afternoon, and it seems to
boil down to a call to ip_route_me_harder(). (Never mind the skb = *pskb,
I'd like to clean that up, but it's a separate issue.)

I spoke briefly with Dave Miller about whether calling
ip_route_me_harder() was appropriate here. His answer was yes, but try
to call it as infrequently as possible as it is expensive. He pointed
me at nf_ip_reroute() and how this is used to minimise calls to
ip_route_me_harder(). However I'm not entirely sure if that technique is
applicable to LVS, as the need for ip_route_me_harder() seems to be
based on the presence of applicable source routing rules and nothing
else. So here I am.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 3f47ad8..4c05182 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -813,6 +813,16 @@ ip_vs_out(unsigned int hooknum, struct s
 	skb->nh.iph->saddr = cp->vaddr;
 	ip_send_check(skb->nh.iph);
 
+	/* For policy routing, packets originating from this
+	 * machine itself may be routed differently to packets
+	 * passing through. We want this packet to be routed as
+	 * if it came from this machine itself. So re-compute
+	 * the routing information.
+	 */
+	if (ip_route_me_harder(pskb) != 0)
+		goto drop;
+	skb = *pskb;
+
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After SNAT");
 
 	ip_vs_out_stats(cp, skb);
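
To make the problem statement above concrete, here is a minimal user-space sketch of why a route chosen before the source address is rewritten can be wrong afterwards. It is a toy model, not kernel code: `struct toy_rule`, `TOY_IP` and `toy_lookup_gw()` are hypothetical names invented for illustration, and the rule list only imitates source-based policy routing in the simplest possible way.

```c
#include <stddef.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustrative helper). */
#define TOY_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy source-routing rule: packets whose source matches src/mask use gw. */
struct toy_rule {
	uint32_t src;
	uint32_t mask;
	uint32_t gw;
};

/* Return the gateway of the first rule matching saddr, or 0 if none does. */
uint32_t toy_lookup_gw(const struct toy_rule *rules, size_t n, uint32_t saddr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if ((saddr & rules[i].mask) == rules[i].src)
			return rules[i].gw;
	return 0;
}
```

In this model a reply leaving a real server at 10.1.1.5 matches a 10.0.0.0/8 rule, while the same reply after its source has been rewritten to a virtual address such as 192.0.2.10 matches a different rule. The lookup therefore has to be repeated after the rewrite if the source routing rules are to be honoured.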
* Re: LVS-NAT and source routing
From: Patrick McHardy @ 2006-08-29 9:06 UTC
To: Horms
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

Horms wrote:
> Hi,
>
> sorry that this is a little off-topic, but I'm hoping for some
> advice in relation to a problem with LVS.
>
> When LVS-NAT is in use (basically load-balancing using DNAT)
> then the return packets need to honour any source routing rules
> on the linux-director (machine running LVS). If you think of it as
> if the packets originate from the linux-director then this makes
> sense (if you think about it other ways it doesn't, but I'm pretty
> convinced that this is the right way to think about it).
>
> A long time ago Ken Brownfield sent a patch that resolves this problem
> by using an old variant of ip_route_me_harder() in ip_vs_out(),
> the return path for LVS-NATed packets.
>
> http://archive.linuxvirtualserver.org/html/lvs-users/2006-03/msg00106.html
>
> I ported this to net-2.6.19 this afternoon, and it seems to
> boil down to a call to ip_route_me_harder(). (Never mind the skb = *pskb,
> I'd like to clean that up, but it's a separate issue.)
>
> I spoke briefly with Dave Miller about whether calling
> ip_route_me_harder() was appropriate here. His answer was yes, but try
> to call it as infrequently as possible as it is expensive. He pointed
> me at nf_ip_reroute() and how this is used to minimise calls to
> ip_route_me_harder(). However I'm not entirely sure if that technique is
> applicable to LVS, as the need for ip_route_me_harder() seems to be
> based on the presence of applicable source routing rules and nothing
> else. So here I am.
>
> +	/* For policy routing, packets originating from this
> +	 * machine itself may be routed differently to packets
> +	 * passing through. We want this packet to be routed as
> +	 * if it came from this machine itself. So re-compute
> +	 * the routing information.

ip_route_me_harder is meant for the opposite case, rerouting locally
originating packets as if they were forwarded (if the source is
non-local). For your case just calling ip_route_output_key should be
faster since it saves the inet_addr_type call. I think nf_ip_reroute
doesn't help much since you always seem to change the source address,
but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.
* Re: LVS-NAT and source routing
From: David Miller @ 2006-08-29 9:31 UTC
To: kaber
Cc: krb, ratz, netfilter-devel, dave, ja, horms, jmack, fsarwari

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 29 Aug 2006 11:06:37 +0200

> but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.

IPSEC can make the saddr changes matter too. BTW it shows a technical
issue with nf_ip_reroute(), since it only checks for changes to
saddr/daddr/tos when even things like port changes can make IPSEC
generate a different route.
* Re: LVS-NAT and source routing
From: Patrick McHardy @ 2006-08-29 12:52 UTC
To: David Miller
Cc: krb, ratz, netfilter-devel, dave, ja, horms, jmack, fsarwari

David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Tue, 29 Aug 2006 11:06:37 +0200
>
>>but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.
>
> IPSEC can make the saddr changes matter too. BTW it shows a technical
> issue with nf_ip_reroute(), since it only checks for changes to
> saddr/daddr/tos when even things like port changes can make IPSEC
> generate a different route.

Right. It also ignores nfmark changes, which is no longer valid
since nfnetlink_queue allows these to be changed. I'm going to fix it up
and send the patch with my next batch of netfilter patches.
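
The limitation discussed in the two messages above can be sketched in user space as a "save the key, compare afterwards" check. This is only an illustration under stated assumptions: `struct toy_rt_info` and both comparison functions are hypothetical names, and the choice of extra fields (mark and ports) simply follows what the thread says can influence route selection. It is not the layout or logic of the kernel's own saved routing information.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of packet fields taken before a hook may modify them. */
struct toy_rt_info {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  tos;
	uint32_t mark;   /* assumption: the firewall mark can select a route */
	uint16_t sport;  /* assumption: ports can matter once IPsec policy applies */
	uint16_t dport;
};

/* Comparison limited to saddr/daddr/tos, the check described in the thread. */
bool toy_changed_basic(const struct toy_rt_info *before,
		       const struct toy_rt_info *after)
{
	return before->saddr != after->saddr ||
	       before->daddr != after->daddr ||
	       before->tos != after->tos;
}

/* Wider comparison that also notices mark and port changes. */
bool toy_changed_full(const struct toy_rt_info *before,
		      const struct toy_rt_info *after)
{
	return toy_changed_basic(before, after) ||
	       before->mark != after->mark ||
	       before->sport != after->sport ||
	       before->dport != after->dport;
}
```

With the narrower check, a packet whose mark or port was changed would be treated as unchanged and would keep its old route. The wider check trades a few extra comparisons for catching those cases.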
* Re: LVS-NAT and source routing
From: Horms @ 2006-08-29 9:40 UTC
To: Patrick McHardy
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

On Tue, Aug 29, 2006 at 11:06:37AM +0200, Patrick McHardy wrote:
> Horms wrote:
> > Hi,
> >
> > sorry that this is a little off-topic, but I'm hoping for some
> > advice in relation to a problem with LVS.
> >
> > When LVS-NAT is in use (basically load-balancing using DNAT)
> > then the return packets need to honour any source routing rules
> > on the linux-director (machine running LVS). If you think of it as
> > if the packets originate from the linux-director then this makes
> > sense (if you think about it other ways it doesn't, but I'm pretty
> > convinced that this is the right way to think about it).
> >
> > A long time ago Ken Brownfield sent a patch that resolves this problem
> > by using an old variant of ip_route_me_harder() in ip_vs_out(),
> > the return path for LVS-NATed packets.
> >
> > http://archive.linuxvirtualserver.org/html/lvs-users/2006-03/msg00106.html
> >
> > I ported this to net-2.6.19 this afternoon, and it seems to
> > boil down to a call to ip_route_me_harder(). (Never mind the skb = *pskb,
> > I'd like to clean that up, but it's a separate issue.)
> >
> > I spoke briefly with Dave Miller about whether calling
> > ip_route_me_harder() was appropriate here. His answer was yes, but try
> > to call it as infrequently as possible as it is expensive. He pointed
> > me at nf_ip_reroute() and how this is used to minimise calls to
> > ip_route_me_harder(). However I'm not entirely sure if that technique is
> > applicable to LVS, as the need for ip_route_me_harder() seems to be
> > based on the presence of applicable source routing rules and nothing
> > else. So here I am.
> >
> > +	/* For policy routing, packets originating from this
> > +	 * machine itself may be routed differently to packets
> > +	 * passing through. We want this packet to be routed as
> > +	 * if it came from this machine itself. So re-compute
> > +	 * the routing information.
>
> ip_route_me_harder is meant for the opposite case, rerouting locally
> originating packets as if they were forwarded (if the source is
> non-local). For your case just calling ip_route_output_key should be
> faster since it saves the inet_addr_type call. I think nf_ip_reroute
> doesn't help much since you always seem to change the source address,
> but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.

Thanks.

I think that your suggestion is more or less what the original patch
did; I misread ip_route_me_harder() as a replacement for this.
I'll up-port that code, and see how things go.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
* Re: LVS-NAT and source routing
From: Horms @ 2006-09-04 3:37 UTC
To: Patrick McHardy
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

On Tue, Aug 29, 2006 at 11:06:37AM +0200, Patrick McHardy wrote:
> Horms wrote:
> > Hi,
> >
> > sorry that this is a little off-topic, but I'm hoping for some
> > advice in relation to a problem with LVS.
> >
> > When LVS-NAT is in use (basically load-balancing using DNAT)
> > then the return packets need to honour any source routing rules
> > on the linux-director (machine running LVS). If you think of it as
> > if the packets originate from the linux-director then this makes
> > sense (if you think about it other ways it doesn't, but I'm pretty
> > convinced that this is the right way to think about it).
> >
> > A long time ago Ken Brownfield sent a patch that resolves this problem
> > by using an old variant of ip_route_me_harder() in ip_vs_out(),
> > the return path for LVS-NATed packets.
> >
> > http://archive.linuxvirtualserver.org/html/lvs-users/2006-03/msg00106.html
> >
> > I ported this to net-2.6.19 this afternoon, and it seems to
> > boil down to a call to ip_route_me_harder(). (Never mind the skb = *pskb,
> > I'd like to clean that up, but it's a separate issue.)
> >
> > I spoke briefly with Dave Miller about whether calling
> > ip_route_me_harder() was appropriate here. His answer was yes, but try
> > to call it as infrequently as possible as it is expensive. He pointed
> > me at nf_ip_reroute() and how this is used to minimise calls to
> > ip_route_me_harder(). However I'm not entirely sure if that technique is
> > applicable to LVS, as the need for ip_route_me_harder() seems to be
> > based on the presence of applicable source routing rules and nothing
> > else. So here I am.
> >
> > +	/* For policy routing, packets originating from this
> > +	 * machine itself may be routed differently to packets
> > +	 * passing through. We want this packet to be routed as
> > +	 * if it came from this machine itself. So re-compute
> > +	 * the routing information.
>
> ip_route_me_harder is meant for the opposite case, rerouting locally
> originating packets as if they were forwarded (if the source is
> non-local). For your case just calling ip_route_output_key should be
> faster since it saves the inet_addr_type call. I think nf_ip_reroute
> doesn't help much since you always seem to change the source address,
> but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.

Hi,

I took a look into this. It seems that the real key is to avoid
unnecessary calls to inet_addr_type(). But it seems that the rest
of ip_route_me_harder() really is needed for ip_vs. If that isn't
correct, please set me straight.

But if it is correct, it really does mean a fair amount of duplicated
code going into ip_vs_core.c. I wonder if a better option would be
to allow the addr_type to be passed to ip_route_me_harder(). I have
a patch below which expresses this idea. It has the nice advantage
of offering the scope for other callers to supply the addr_type if it
is known, though I am not sure that this can be the case.

An alternate idea, which would offer the current API to current callers,
is to move most of the logic of ip_route_me_harder() into a variant
which accepts addr_type, and simply have ip_route_me_harder() calculate
addr_type and pass it and the **pskb on to that function. I'm happy to
come up with a patch that expresses that idea; I find it hard to express
code in words.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/

diff --git a/include/linux/netfilter_ipv4.h b/include/linux/netfilter_ipv4.h
index ce02c98..5b63a23 100644
--- a/include/linux/netfilter_ipv4.h
+++ b/include/linux/netfilter_ipv4.h
@@ -77,7 +77,7 @@ enum nf_ip_hook_priorities {
 #define SO_ORIGINAL_DST 80
 
 #ifdef __KERNEL__
-extern int ip_route_me_harder(struct sk_buff **pskb);
+extern int ip_route_me_harder(struct sk_buff **pskb, unsigned addr_type);
 extern int ip_xfrm_me_harder(struct sk_buff **pskb);
 extern unsigned int nf_ip_checksum(struct sk_buff *skb, unsigned int hook,
 				   unsigned int dataoff, u_int8_t protocol);
diff --git a/net/ipv4/ipvs/ip_vs_core.c b/net/ipv4/ipvs/ip_vs_core.c
index 3f47ad8..1b5701e 100644
--- a/net/ipv4/ipvs/ip_vs_core.c
+++ b/net/ipv4/ipvs/ip_vs_core.c
@@ -813,6 +813,16 @@ ip_vs_out(unsigned int hooknum, struct s
 	skb->nh.iph->saddr = cp->vaddr;
 	ip_send_check(skb->nh.iph);
 
+	/* For policy routing, packets originating from this
+	 * machine itself may be routed differently to packets
+	 * passing through. We want this packet to be routed as
+	 * if it came from this machine itself. So re-compute
+	 * the routing information.
+	 */
+	if (ip_route_me_harder(pskb, RTN_LOCAL) != 0)
+		goto drop;
+	skb = *pskb;
+
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After SNAT");
 
 	ip_vs_out_stats(cp, skb);
diff --git a/net/ipv4/netfilter.c b/net/ipv4/netfilter.c
index f88347d..3b66f87 100644
--- a/net/ipv4/netfilter.c
+++ b/net/ipv4/netfilter.c
@@ -8,7 +8,7 @@ #include <net/xfrm.h>
 #include <net/ip.h>
 
 /* route_me_harder function, used by iptable_nat, iptable_mangle + ip_queue */
-int ip_route_me_harder(struct sk_buff **pskb)
+int ip_route_me_harder(struct sk_buff **pskb, unsigned addr_type)
 {
 	struct iphdr *iph = (*pskb)->nh.iph;
 	struct rtable *rt;
@@ -16,10 +16,13 @@ int ip_route_me_harder(struct sk_buff **
 	struct dst_entry *odst;
 	unsigned int hh_len;
 
+	if (addr_type == RTN_UNSPEC)
+		addr_type = inet_addr_type(iph->saddr);
+
 	/* some non-standard hacks like ipt_REJECT.c:send_reset() can cause
 	 * packets with foreign saddr to appear on the NF_IP_LOCAL_OUT hook.
 	 */
-	if (inet_addr_type(iph->saddr) == RTN_LOCAL) {
+	if (addr_type == RTN_LOCAL) {
 		fl.nl_u.ip4_u.daddr = iph->daddr;
 		fl.nl_u.ip4_u.saddr = iph->saddr;
 		fl.nl_u.ip4_u.tos = RT_TOS(iph->tos);
@@ -156,7 +159,7 @@ static int nf_ip_reroute(struct sk_buff
 		if (!(iph->tos == rt_info->tos
 		      && iph->daddr == rt_info->daddr
 		      && iph->saddr == rt_info->saddr))
-			return ip_route_me_harder(pskb);
+			return ip_route_me_harder(pskb, RTN_UNSPEC);
 	}
 	return 0;
 }
diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
index f3b7783..7f5ceac 100644
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -269,7 +269,8 @@ #ifdef CONFIG_XFRM
 			   ct->tuplehash[!dir].tuple.src.u.all
 #endif
 		    )
-			return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+			return ip_route_me_harder(pskb, RTN_UNSPEC) == 0 ?
+						  ret : NF_DROP;
 	}
 	return ret;
 }
diff --git a/net/ipv4/netfilter/iptable_mangle.c b/net/ipv4/netfilter/iptable_mangle.c
index 79336cb..62da663 100644
--- a/net/ipv4/netfilter/iptable_mangle.c
+++ b/net/ipv4/netfilter/iptable_mangle.c
@@ -157,7 +157,8 @@ #ifdef CONFIG_IP_ROUTE_FWMARK
 		|| (*pskb)->nfmark != nfmark
 #endif
 		|| (*pskb)->nh.iph->tos != tos))
-		return ip_route_me_harder(pskb) == 0 ? ret : NF_DROP;
+		return ip_route_me_harder(pskb, RTN_UNSPEC) == 0 ?
+					  ret : NF_DROP;
 
 	return ret;
 }
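
The alternate idea in the message above (keep the existing entry point and move the logic into a variant that accepts addr_type) could look roughly like the following user-space sketch. Everything here is a stand-in invented for illustration: the `toy_` names, the enum values and the single hard-coded "local" address are assumptions, and the two return values only label which branch a real implementation would take. The call counter exists purely to show when the address classification is skipped.

```c
#include <stdint.h>

/* Stand-ins for the kernel's route types (values are illustrative only). */
enum toy_addr_type { TOY_RTN_UNSPEC = 0, TOY_RTN_UNICAST, TOY_RTN_LOCAL };

/* Which lookup a packet would take: as locally generated, or as forwarded. */
enum toy_path { TOY_PATH_OUTPUT, TOY_PATH_INPUT };

/* 192.0.2.10, the only address this toy model treats as local. */
#define TOY_LOCAL_ADDR 0xc000020aU

/* Counts how often the (notionally expensive) classification runs. */
unsigned int toy_addr_type_calls;

/* Toy replacement for an address-type lookup. */
enum toy_addr_type toy_inet_addr_type(uint32_t addr)
{
	toy_addr_type_calls++;
	return addr == TOY_LOCAL_ADDR ? TOY_RTN_LOCAL : TOY_RTN_UNICAST;
}

/* Variant that trusts the caller's addr_type and never classifies saddr. */
enum toy_path toy_route_me_harder_type(uint32_t saddr,
				       enum toy_addr_type addr_type)
{
	(void)saddr; /* a real implementation would build a lookup key from it */
	return addr_type == TOY_RTN_LOCAL ? TOY_PATH_OUTPUT : TOY_PATH_INPUT;
}

/* Unchanged one-argument entry point: classify, then delegate. */
enum toy_path toy_route_me_harder(uint32_t saddr)
{
	return toy_route_me_harder_type(saddr, toy_inet_addr_type(saddr));
}
```

Existing callers would keep calling the one-argument function, while a caller that already knows the source is local (as ip_vs does after rewriting it to the virtual address) would call the variant directly and skip the classification.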
* Re: LVS-NAT and source routing
From: Patrick McHardy @ 2006-09-10 9:54 UTC
To: Horms
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

Horms wrote:
> On Tue, Aug 29, 2006 at 11:06:37AM +0200, Patrick McHardy wrote:
>
>>ip_route_me_harder is meant for the opposite case, rerouting locally
>>originating packets as if they were forwarded (if the source is
>>non-local). For your case just calling ip_route_output_key should be
>>faster since it saves the inet_addr_type call. I think nf_ip_reroute
>>doesn't help much since you always seem to change the source address,
>>but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.
>
> I took a look into this. It seems that the real key is to avoid
> unnecessary calls to inet_addr_type(). But it seems that the rest
> of ip_route_me_harder() really is needed for ip_vs. If that isn't
> correct, please set me straight.
>
> But if it is correct, it really does mean a fair amount of duplicated
> code going into ip_vs_core.c. I wonder if a better option would be
> to allow the addr_type to be passed to ip_route_me_harder(). I have
> a patch below which expresses this idea. It has the nice advantage
> of offering the scope for other callers to supply the addr_type if it
> is known, though I am not sure that this can be the case.

Usually not, but your patch looks fine anyway.

We might even be able to remove the largely duplicated route_reverse()
in ipt_REJECT if we use LL_MAX_HEADER instead of LL_RESERVED_SPACE for
the RST packet (since we would need to route after allocating the
packet and reversing the addresses).
* Re: LVS-NAT and source routing
From: Horms @ 2006-09-10 13:48 UTC
To: Patrick McHardy
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

On Sun, Sep 10, 2006 at 11:54:24AM +0200, Patrick McHardy wrote:
> Horms wrote:
> > On Tue, Aug 29, 2006 at 11:06:37AM +0200, Patrick McHardy wrote:
> >
> >>ip_route_me_harder is meant for the opposite case, rerouting locally
> >>originating packets as if they were forwarded (if the source is
> >>non-local). For your case just calling ip_route_output_key should be
> >>faster since it saves the inet_addr_type call. I think nf_ip_reroute
> >>doesn't help much since you always seem to change the source address,
> >>but you could make the whole thing depend on CONFIG_IP_MULTIPLE_TABLES.
> >
> > I took a look into this. It seems that the real key is to avoid
> > unnecessary calls to inet_addr_type(). But it seems that the rest
> > of ip_route_me_harder() really is needed for ip_vs. If that isn't
> > correct, please set me straight.
> >
> > But if it is correct, it really does mean a fair amount of duplicated
> > code going into ip_vs_core.c. I wonder if a better option would be
> > to allow the addr_type to be passed to ip_route_me_harder(). I have
> > a patch below which expresses this idea. It has the nice advantage
> > of offering the scope for other callers to supply the addr_type if it
> > is known, though I am not sure that this can be the case.
>
> Usually not, but your patch looks fine anyway.

Excellent. Do you want me to split the LVS and non-LVS portions
into separate patches?

> We might even be able to remove the largely duplicated route_reverse()
> in ipt_REJECT if we use LL_MAX_HEADER instead of LL_RESERVED_SPACE for
> the RST packet (since we would need to route after allocating the
> packet and reversing the addresses).

That does seem like a good idea. I'll try to find a moment to see how
a patch pans out.

-- 
Horms
  H: http://www.vergenet.net/~horms/
  W: http://www.valinux.co.jp/en/
* Re: LVS-NAT and source routing
From: Patrick McHardy @ 2006-09-15 4:34 UTC
To: Horms
Cc: Ken Brownfield, Roberto Nibali, netfilter-devel, Farid Sarwari, Julian Anastasov, David Black, Joseph Mack NA3T, David Miller

Horms wrote:
> On Sun, Sep 10, 2006 at 11:54:24AM +0200, Patrick McHardy wrote:
>
>>Usually not, but your patch looks fine anyway.
>
> Excellent. Do you want me to split the LVS and non-LVS portions
> into separate patches?

I'm fine either way.