* [PATCH 0/3] IPVS full NAT support + netfilter 'ipvs' match support
@ 2009-09-02 14:38 ` Hannes Eder
0 siblings, 0 replies; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 14:38 UTC (permalink / raw)
To: lvs-devel
Cc: linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Patrick McHardy, Simon Horman,
Wensong Zhang
The following series implements full NAT support for IPVS. The
approach is via a minimal change to IPVS (make friends with
nf_conntrack) and adding a netfilter matcher, kernel- and user-space
part, i.e. xt_ipvs and libxt_ipvs.
Example usage:
% ipvsadm -A -t 192.168.100.30:8080 -s rr
% ipvsadm -a -t 192.168.100.30:8080 -r 192.168.10.20:8080 -m
# ...
# Source NAT for VIP 192.168.100.30:8080
% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
> -j SNAT --to-source 192.168.10.10
Changes to the linux kernel (rebased to next-20090831):
Hannes Eder (2):
netfilter: xt_ipvs (netfilter matcher for IPVS)
IPVS: make friends with nf_conntrack
include/linux/netfilter/xt_ipvs.h | 23 +++++
net/netfilter/Kconfig | 9 ++
net/netfilter/Makefile | 1
net/netfilter/ipvs/Kconfig | 2
net/netfilter/ipvs/ip_vs_core.c | 36 -------
net/netfilter/ipvs/ip_vs_proto.c | 1
net/netfilter/ipvs/ip_vs_xmit.c | 27 +++++
net/netfilter/xt_ipvs.c | 183 +++++++++++++++++++++++++++++++++++++
8 files changed, 245 insertions(+), 37 deletions(-)
create mode 100644 include/linux/netfilter/xt_ipvs.h
create mode 100644 net/netfilter/xt_ipvs.c
Changes to iptables:
Hannes Eder (1):
libxt_ipvs: user space lib for netfilter matcher xt_ipvs
configure.ac | 11 +
extensions/libxt_ipvs.c | 349 +++++++++++++++++++++++++++++++++++++
extensions/libxt_ipvs.man | 21 ++
include/linux/netfilter/xt_ipvs.h | 23 ++
4 files changed, 401 insertions(+), 3 deletions(-)
create mode 100644 extensions/libxt_ipvs.c
create mode 100644 extensions/libxt_ipvs.man
create mode 100644 include/linux/netfilter/xt_ipvs.h
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 14:38 ` Hannes Eder
@ 2009-09-02 14:39 ` Hannes Eder
-1 siblings, 0 replies; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 14:39 UTC (permalink / raw)
To: lvs-devel
Cc: linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Patrick McHardy, Simon Horman,
Wensong Zhang
This implements the kernel-space side of the netfilter matcher
xt_ipvs.
Signed-off-by: Hannes Eder <heder@google.com>
---
include/linux/netfilter/xt_ipvs.h | 23 +++++
net/netfilter/Kconfig | 9 ++
net/netfilter/Makefile | 1
net/netfilter/ipvs/ip_vs_proto.c | 1
net/netfilter/xt_ipvs.c | 183 +++++++++++++++++++++++++++++++++++++
5 files changed, 217 insertions(+), 0 deletions(-)
create mode 100644 include/linux/netfilter/xt_ipvs.h
create mode 100644 net/netfilter/xt_ipvs.c
diff --git a/include/linux/netfilter/xt_ipvs.h b/include/linux/netfilter/xt_ipvs.h
new file mode 100644
index 0000000..eb09759
--- /dev/null
+++ b/include/linux/netfilter/xt_ipvs.h
@@ -0,0 +1,23 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#define XT_IPVS_IPVS_PROPERTY	0x01 /* this is implied by all other options */
+#define XT_IPVS_PROTO		0x02
+#define XT_IPVS_VADDR		0x04
+#define XT_IPVS_VPORT		0x08
+#define XT_IPVS_DIR		0x10
+#define XT_IPVS_METHOD		0x20
+#define XT_IPVS_MASK		(0x40 - 1)
+#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs {
+	union nf_inet_addr vaddr, vmask;
+	__be16 vport;
+	__u16 l4proto;
+	__u16 fwd_method;
+
+	__u8 invert;
+	__u8 bitmask;
+};
+
+#endif /* _XT_IPVS_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 634d14a..fc35bd6 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -678,6 +678,15 @@ config NETFILTER_XT_MATCH_IPRANGE
 
 	  If unsure, say M.
 
+config NETFILTER_XT_MATCH_IPVS
+	tristate '"ipvs" match support'
+	depends on IP_VS
+	depends on NETFILTER_ADVANCED
+	help
+	  This option allows you to match against IPVS properties of a packet.
+
+	  If unsure, say N.
+
 config NETFILTER_XT_MATCH_LENGTH
 	tristate '"length" match support'
 	depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 49f62ee..ff95372 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -72,6 +72,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_HASHLIMIT) += xt_hashlimit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HELPER) += xt_helper.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_HL) += xt_hl.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_IPRANGE) += xt_iprange.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_IPVS) += xt_ipvs.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LENGTH) += xt_length.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_LIMIT) += xt_limit.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_MAC) += xt_mac.o
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 3e76716..db083c3 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -97,6 +97,7 @@ struct ip_vs_protocol * ip_vs_proto_get(unsigned short proto)
 
 	return NULL;
 }
+EXPORT_SYMBOL(ip_vs_proto_get);
 
 
 /*
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
new file mode 100644
index 0000000..579b053
--- /dev/null
+++ b/net/netfilter/xt_ipvs.c
@@ -0,0 +1,183 @@
+/*
+ * xt_ipvs - kernel module to match IPVS connection properties
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/spinlock.h>
+#include <linux/skbuff.h>
+#ifdef CONFIG_IP_VS_IPV6
+#include <net/ipv6.h>
+#endif
+#include <linux/ip_vs.h>
+#include <linux/types.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_ipvs.h>
+#include <net/netfilter/nf_conntrack.h>
+
+#include <net/ip_vs.h>
+
+MODULE_AUTHOR("Hannes Eder <heder@google.com>");
+MODULE_DESCRIPTION("Xtables: match IPVS connection properties");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_ipvs");
+MODULE_ALIAS("ip6t_ipvs");
+
+/* borrowed from xt_conntrack */
+static bool ipvs_mt_addrcmp(const union nf_inet_addr *kaddr,
+			    const union nf_inet_addr *uaddr,
+			    const union nf_inet_addr *umask,
+			    unsigned int l3proto)
+{
+	if (l3proto == NFPROTO_IPV4)
+		return ((kaddr->ip ^ uaddr->ip) & umask->ip) == 0;
+#ifdef CONFIG_IP_VS_IPV6
+	else if (l3proto == NFPROTO_IPV6)
+		return ipv6_masked_addr_cmp(&kaddr->in6, &umask->in6,
+		       &uaddr->in6) == 0;
+#endif
+	else
+		return false;
+}
+
+bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
+{
+	const struct xt_ipvs *data = par->matchinfo;
+	const u_int8_t family = par->family;
+	struct ip_vs_iphdr iph;
+	struct ip_vs_protocol *pp;
+	struct ip_vs_conn *cp;
+	int af;
+	bool match = true;
+
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		match = skb->ipvs_property ^
+			!!(data->invert & XT_IPVS_IPVS_PROPERTY);
+		goto out;
+	}
+
+	/* other flags than XT_IPVS_IPVS_PROPERTY are set */
+	if (!skb->ipvs_property) {
+		match = false;
+		goto out;
+	}
+
+	switch (skb->protocol) {
+	case htons(ETH_P_IP):
+		af = AF_INET;
+		break;
+#ifdef CONFIG_IP_VS_IPV6
+	case htons(ETH_P_IPV6):
+		af = AF_INET6;
+		break;
+#endif
+	default:
+		match = false;
+		goto out;
+	}
+
+	ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+
+	if (data->bitmask & XT_IPVS_PROTO)
+		if ((iph.protocol == data->l4proto) ^
+		    !(data->invert & XT_IPVS_PROTO)) {
+			match = false;
+			goto out;
+		}
+
+	pp = ip_vs_proto_get(iph.protocol);
+	if (unlikely(!pp)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * Check if the packet belongs to an existing entry
+	 */
+	cp = pp->conn_out_get(af, skb, pp, &iph, iph.len, 1 /* inverse */);
+	if (unlikely(cp == NULL)) {
+		match = false;
+		goto out;
+	}
+
+	/*
+	 * We found a connection, i.e. ct != 0, make sure to call
+	 * __ip_vs_conn_put before returning. In our case jump to out_put_con.
+	 */
+
+	if (data->bitmask & XT_IPVS_VPORT)
+		if ((cp->vport == data->vport) ^
+		    !(data->invert & XT_IPVS_VPORT)) {
+			match = false;
+			goto out_put_cp;
+		}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		enum ip_conntrack_info ctinfo;
+		struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
+
+		if (ct == NULL || ct == &nf_conntrack_untracked) {
+			match = false;
+			goto out_put_cp;
+		}
+
+		if ((ctinfo >= IP_CT_IS_REPLY) ^
+		    !!(data->invert & XT_IPVS_DIR)) {
+			match = false;
+			goto out_put_cp;
+		}
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD)
+		if (((cp->flags & IP_VS_CONN_F_FWD_MASK) == data->fwd_method) ^
+		    !(data->invert & XT_IPVS_METHOD)) {
+			match = false;
+			goto out_put_cp;
+		}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (af != family) {
+			match = false;
+			goto out_put_cp;
+		}
+
+		if (ipvs_mt_addrcmp(&cp->vaddr, &data->vaddr,
+				    &data->vmask, af) ^
+		    !(data->invert & XT_IPVS_VADDR)) {
+			match = false;
+			goto out_put_cp;
+		}
+	}
+
+out_put_cp:
+	__ip_vs_conn_put(cp);
+out:
+	pr_debug("match=%d\n", match);
+	return match;
+}
+
+static struct xt_match xt_ipvs_mt_reg __read_mostly = {
+	.name = "ipvs",
+	.revision = 0,
+	.family = NFPROTO_UNSPEC,
+	.match = ipvs_mt,
+	.matchsize = sizeof(struct xt_ipvs),
+	.me = THIS_MODULE,
+};
+
+static int __init ipvs_mt_init(void)
+{
+	return xt_register_match(&xt_ipvs_mt_reg);
+}
+
+static void __exit ipvs_mt_exit(void)
+{
+	xt_unregister_match(&xt_ipvs_mt_reg);
+}
+
+module_init(ipvs_mt_init);
+module_exit(ipvs_mt_exit);
^ permalink raw reply related [flat|nested] 18+ messages in thread
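Most option checks in ipvs_mt() share one idiom: evaluate a condition, XOR it with the negated invert bit, and treat a true result as "this rule does not match". The following is only a minimal user-space sketch of that idiom for the IPv4 virtual-address case. The helper names masked_equal() and vaddr_rule_fails() are hypothetical and not part of the patch; only the XT_IPVS_VADDR value is taken from xt_ipvs.h above. Addresses are plain 32-bit values here; the masked comparison does not depend on byte order as long as all three operands use the same one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag value copied from the xt_ipvs.h header in the patch. */
#define XT_IPVS_VADDR 0x04

/* Same test as the IPv4 branch of ipvs_mt_addrcmp(): equal under the mask. */
static bool masked_equal(uint32_t kaddr, uint32_t uaddr, uint32_t umask)
{
	return ((kaddr ^ uaddr) & umask) == 0;
}

/*
 * Hypothetical helper mirroring the XT_IPVS_VADDR block of ipvs_mt():
 * returns true when the rule must be rejected.
 *   not inverted: reject when the addresses differ
 *   inverted:     reject when the addresses are equal
 */
static bool vaddr_rule_fails(uint32_t conn_vaddr, uint32_t rule_vaddr,
			     uint32_t rule_vmask, uint8_t invert)
{
	return masked_equal(conn_vaddr, rule_vaddr, rule_vmask) ^
	       !(invert & XT_IPVS_VADDR);
}
```

With a /32 mask and no inversion, vaddr_rule_fails() returns false only when the connection's virtual address equals the rule's address; setting the XT_IPVS_VADDR bit in invert flips that result.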
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 14:39 ` Hannes Eder
(?)
@ 2009-09-02 14:54 ` Patrick McHardy
2009-09-02 15:33 ` Hannes Eder
-1 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2009-09-02 14:54 UTC (permalink / raw)
To: Hannes Eder
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
Hannes Eder wrote:
> This implements the kernel-space side of the netfilter matcher
> xt_ipvs.
Looks mostly fine to me, just one question:
> +bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
> +{
> +	const struct xt_ipvs *data = par->matchinfo;
> +	const u_int8_t family = par->family;
> +	struct ip_vs_iphdr iph;
> +	struct ip_vs_protocol *pp;
> +	struct ip_vs_conn *cp;
> +	int af;
> +	bool match = true;
> +
> +	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
> +		match = skb->ipvs_property ^
> +			!!(data->invert & XT_IPVS_IPVS_PROPERTY);
> +		goto out;
> +	}
> +
> +	/* other flags than XT_IPVS_IPVS_PROPERTY are set */
> +	if (!skb->ipvs_property) {
> +		match = false;
> +		goto out;
> +	}
> +
> +	switch (skb->protocol) {
> +	case htons(ETH_P_IP):
> +		af = AF_INET;
> +		break;
> +#ifdef CONFIG_IP_VS_IPV6
> +	case htons(ETH_P_IPV6):
> +		af = AF_INET6;
> +		break;
> +#endif
> +	default:
> +		match = false;
> +		goto out;
> +	}
In the NF_INET_LOCAL_OUT hook skb->protocol is invalid. So if you
don't need this, it would make sense to restrict the match to the
other hooks.
Even easier would be to use par->family, which contains the address
family and doesn't need any translation.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 14:54 ` Patrick McHardy
@ 2009-09-02 15:33 ` Hannes Eder
2009-09-02 15:36 ` Patrick McHardy
0 siblings, 1 reply; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 15:33 UTC (permalink / raw)
To: Patrick McHardy
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
On Wed, Sep 2, 2009 at 16:54, Patrick McHardy<kaber@trash.net> wrote:
> Hannes Eder wrote:
>> This implements the kernel-space side of the netfilter matcher
>> xt_ipvs.
>
> Looks mostly fine to me, just one question:
>
>> +bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>> +{
>> +	const struct xt_ipvs *data = par->matchinfo;
>> +	const u_int8_t family = par->family;
>> +	struct ip_vs_iphdr iph;
>> +	struct ip_vs_protocol *pp;
>> +	struct ip_vs_conn *cp;
>> +	int af;
>> +	bool match = true;
>> +
>> +	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
>> +		match = skb->ipvs_property ^
>> +			!!(data->invert & XT_IPVS_IPVS_PROPERTY);
>> +		goto out;
>> +	}
>> +
>> +	/* other flags than XT_IPVS_IPVS_PROPERTY are set */
>> +	if (!skb->ipvs_property) {
>> +		match = false;
>> +		goto out;
>> +	}
>> +
>> +	switch (skb->protocol) {
>> +	case htons(ETH_P_IP):
>> +		af = AF_INET;
>> +		break;
>> +#ifdef CONFIG_IP_VS_IPV6
>> +	case htons(ETH_P_IPV6):
>> +		af = AF_INET6;
>> +		break;
>> +#endif
>> +	default:
>> +		match = false;
>> +		goto out;
>> +	}
>
> In the NF_INET_LOCAL_OUT hook skb->protocol is invalid. So if you
> don't need this, it would make sense to restrict the match to the
> other hooks.
>
> Even easier would be to use par->family, which contains the address
> family and doesn't need any translation.
Nice, I'll use par->family.
So in theory I do not even need a check like the following in the beginning?
	if (family != NFPROTO_IPV4
#ifdef CONFIG_IP_VS_IPV6
	    && family != NFPROTO_IPV6
#endif
		) {
		match = false;
		goto out;
	}
Thanks,
-Hannes
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 15:33 ` Hannes Eder
@ 2009-09-02 15:36 ` Patrick McHardy
2009-09-02 15:49 ` Jan Engelhardt
0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2009-09-02 15:36 UTC (permalink / raw)
To: Hannes Eder
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
Hannes Eder wrote:
> On Wed, Sep 2, 2009 at 16:54, Patrick McHardy<kaber@trash.net> wrote:
>> Hannes Eder wrote:
>>> This implements the kernel-space side of the netfilter matcher
>>> xt_ipvs.
>> Looks mostly fine to me, just one question:
>>
>>> +bool ipvs_mt(const struct sk_buff *skb, const struct xt_match_param *par)
>>> +{
>>> +	const struct xt_ipvs *data = par->matchinfo;
>>> +	const u_int8_t family = par->family;
>>> +	struct ip_vs_iphdr iph;
>>> +	struct ip_vs_protocol *pp;
>>> +	struct ip_vs_conn *cp;
>>> +	int af;
>>> +	bool match = true;
>>> +
>>> +	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
>>> +		match = skb->ipvs_property ^
>>> +			!!(data->invert & XT_IPVS_IPVS_PROPERTY);
>>> +		goto out;
>>> +	}
>>> +
>>> +	/* other flags than XT_IPVS_IPVS_PROPERTY are set */
>>> +	if (!skb->ipvs_property) {
>>> +		match = false;
>>> +		goto out;
>>> +	}
>>> +
>>> +	switch (skb->protocol) {
>>> +	case htons(ETH_P_IP):
>>> +		af = AF_INET;
>>> +		break;
>>> +#ifdef CONFIG_IP_VS_IPV6
>>> +	case htons(ETH_P_IPV6):
>>> +		af = AF_INET6;
>>> +		break;
>>> +#endif
>>> +	default:
>>> +		match = false;
>>> +		goto out;
>>> +	}
>> In the NF_INET_LOCAL_OUT hook skb->protocol is invalid. So if you
>> don't need this, it would make sense to restrict the match to the
>> other hooks.
>>
>> Even easier would be to use par->family, which contains the address
>> family and doesn't need any translation.
>
> Nice, I'll use par->family.
>
> So in theory I do not even need a check like the following in the beginning?
>
>	if (family != NFPROTO_IPV4
> #ifdef CONFIG_IP_VS_IPV6
>	    && family != NFPROTO_IPV6
> #endif
>		) {
>		match = false;
>		goto out;
>	}
With the AF_UNSPEC registration of your match, it might be used
with different families. But you could add two separate IPV4/IPV6
registrations or catch an invalid family in ->checkentry() and
remove the runtime check.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 15:36 ` Patrick McHardy
@ 2009-09-02 15:49 ` Jan Engelhardt
2009-09-02 16:05 ` Hannes Eder
2009-09-02 17:51 ` Patrick McHardy
0 siblings, 2 replies; 18+ messages in thread
From: Jan Engelhardt @ 2009-09-02 15:49 UTC (permalink / raw)
To: Patrick McHardy
Cc: Hannes Eder, lvs-devel, linux-kernel, netdev, netfilter-devel,
Fabien Duchêne, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
On Wednesday 2009-09-02 17:36, Patrick McHardy wrote:
>>
>> Nice, I'll use par->family.
>>
>> So in theory I do not even need a check like the following in the beginning?
>>
>>	if (family != NFPROTO_IPV4
>> #ifdef CONFIG_IP_VS_IPV6
>>	    && family != NFPROTO_IPV6
>> #endif
>>		) {
>>		match = false;
>>		goto out;
>>	}
>
>With the AF_UNSPEC registration of your match, it might be used
par->family always contains the NFPROTO of the invoking implementation,
which can never be UNSPEC (except, in future, xtables2 ;-)
par->match->family however may be UNSPEC if the module works that way.
Which is why we have par->family.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 15:49 ` Jan Engelhardt
@ 2009-09-02 16:05 ` Hannes Eder
2009-09-02 17:51 ` Patrick McHardy
1 sibling, 0 replies; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 16:05 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Patrick McHardy, lvs-devel, linux-kernel, netdev, netfilter-devel,
Fabien Duchêne, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
On Wed, Sep 2, 2009 at 17:49, Jan Engelhardt<jengelh@medozas.de> wrote:
>
> On Wednesday 2009-09-02 17:36, Patrick McHardy wrote:
>>>
>>> Nice, I'll use par->family.
>>>
>>> So in theory I do not even need a check like the following in the beginning?
>>>
>>>	if (family != NFPROTO_IPV4
>>> #ifdef CONFIG_IP_VS_IPV6
>>>	    && family != NFPROTO_IPV6
>>> #endif
>>>		) {
>>>		match = false;
>>>		goto out;
>>>	}
>>
>>With the AF_UNSPEC registration of your match, it might be used
>
> par->family always contains the NFPROTO of the invoking implementation,
> which can never be UNSPEC (except, in future, xtables2 ;-)
>
> par->match->family however may be UNSPEC if the module works that way.
> Which is why we have par->family.
>
I'll add a check_entry function:
static bool ipvs_mt_check(const struct xt_mtchk_param *par)
{
	if (par->family != NFPROTO_IPV4
#ifdef CONFIG_IP_VS_IPV6
	    && par->family != NFPROTO_IPV6
#endif
		)
		return false;
	return true;
}
and remove the runtime check in ipvs_mt.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] netfilter: xt_ipvs (netfilter matcher for IPVS)
2009-09-02 15:49 ` Jan Engelhardt
2009-09-02 16:05 ` Hannes Eder
@ 2009-09-02 17:51 ` Patrick McHardy
1 sibling, 0 replies; 18+ messages in thread
From: Patrick McHardy @ 2009-09-02 17:51 UTC (permalink / raw)
To: Jan Engelhardt
Cc: Hannes Eder, lvs-devel, linux-kernel, netdev, netfilter-devel,
Fabien Duchêne, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang
Jan Engelhardt wrote:
> On Wednesday 2009-09-02 17:36, Patrick McHardy wrote:
>>> Nice, I'll use par->family.
>>>
>>> So in theory I do not even need a check like the following in the beginning?
>>>
>>>	if (family != NFPROTO_IPV4
>>> #ifdef CONFIG_IP_VS_IPV6
>>>	    && family != NFPROTO_IPV6
>>> #endif
>>>		) {
>>>		match = false;
>>>		goto out;
>>>	}
>> With the AF_UNSPEC registration of your match, it might be used
>
> par->family always contains the NFPROTO of the invoking implementation,
> which can never be UNSPEC (except, in future, xtables2 ;-)
I didn't say it will be UNSPEC, I said it might be something different
than IPV4/IPV6 unless that is checked *somewhere*.
^ permalink raw reply [flat|nested] 18+ messages in thread
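The point of this sub-thread is that a match registered for NFPROTO_UNSPEC can be invoked from tables other than the IPv4 and IPv6 ones, so the family has to be validated once, somewhere. Below is a rough user-space model of that validation, not the kernel code. The NFPROTO_* values are redefined locally (they mirror <linux/netfilter.h>) so the sketch builds on its own; HAVE_IPVS_IPV6 is a hypothetical stand-in for CONFIG_IP_VS_IPV6, and ipvs_family_supported() is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the netfilter protocol family numbers. */
enum {
	NFPROTO_UNSPEC = 0,
	NFPROTO_IPV4   = 2,
	NFPROTO_ARP    = 3,
	NFPROTO_BRIDGE = 7,
	NFPROTO_IPV6   = 10,
};

/* Hypothetical stand-in for CONFIG_IP_VS_IPV6. */
#define HAVE_IPVS_IPV6 1

/*
 * Illustrative equivalent of the proposed check: accept only the
 * families IPVS can handle, reject every other caller up front.
 */
static bool ipvs_family_supported(uint8_t family)
{
	switch (family) {
	case NFPROTO_IPV4:
		return true;
#if HAVE_IPVS_IPV6
	case NFPROTO_IPV6:
		return true;
#endif
	default:
		return false;
	}
}
```

Doing this once at rule-insertion time, as proposed for ->checkentry(), keeps the per-packet path free of the family test.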
* [PATCH 2/3] IPVS: make friends with nf_conntrack
2009-09-02 14:38 ` Hannes Eder
@ 2009-09-02 14:39 ` Hannes Eder
-1 siblings, 0 replies; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 14:39 UTC (permalink / raw)
To: lvs-devel
Cc: linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Patrick McHardy, Simon Horman,
Wensong Zhang

Update the nf_conntrack tuple in reply direction, as we will see
traffic from the real server (RIP) to the client (CIP). Once this is
done we can use netfilter's SNAT in POSTROUTING, especially with
xt_ipvs, to do source NAT, e.g.:

% iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
> -j SNAT --to-source 192.168.10.10

Signed-off-by: Hannes Eder <heder@google.com>
---
net/netfilter/ipvs/Kconfig | 2 +-
net/netfilter/ipvs/ip_vs_core.c | 36 ------------------------------------
net/netfilter/ipvs/ip_vs_xmit.c | 27 +++++++++++++++++++++++++++
3 files changed, 28 insertions(+), 37 deletions(-)

diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 79a6980..fca5379 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -3,7 +3,7 @@
 #
 menuconfig IP_VS
 	tristate "IP virtual server support"
-	depends on NET && INET && NETFILTER
+	depends on NET && INET && NETFILTER && NF_CONNTRACK
 	---help---
 	  IP Virtual Server support will let you build a high-performance
 	  virtual server based on cluster of two or more real servers. This
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index b227750..27bd002 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -521,26 +521,6 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
 	return NF_DROP;
 }
 
-
-/*
- * It is hooked before NF_IP_PRI_NAT_SRC at the NF_INET_POST_ROUTING
- * chain, and is used for VS/NAT.
- * It detects packets for VS/NAT connections and sends the packets
- * immediately. This can avoid that iptable_nat mangles the packets
- * for VS/NAT.
- */
-static unsigned int ip_vs_post_routing(unsigned int hooknum,
-				       struct sk_buff *skb,
-				       const struct net_device *in,
-				       const struct net_device *out,
-				       int (*okfn)(struct sk_buff *))
-{
-	if (!skb->ipvs_property)
-		return NF_ACCEPT;
-	/* The packet was sent from IPVS, exit this chain */
-	return NF_STOP;
-}
-
 __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset)
 {
 	return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0));
@@ -1431,14 +1411,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.hooknum = NF_INET_FORWARD,
 		.priority = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook = ip_vs_post_routing,
-		.owner = THIS_MODULE,
-		.pf = PF_INET,
-		.hooknum = NF_INET_POST_ROUTING,
-		.priority = NF_IP_PRI_NAT_SRC-1,
-	},
 #ifdef CONFIG_IP_VS_IPV6
 	/* After packet filtering, forward packet through VS/DR, VS/TUN,
 	 * or VS/NAT(change destination), so that filtering rules can be
@@ -1467,14 +1439,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.hooknum = NF_INET_FORWARD,
 		.priority = 99,
 	},
-	/* Before the netfilter connection tracking, exit from POST_ROUTING */
-	{
-		.hook = ip_vs_post_routing,
-		.owner = THIS_MODULE,
-		.pf = PF_INET6,
-		.hooknum = NF_INET_POST_ROUTING,
-		.priority = NF_IP6_PRI_NAT_SRC-1,
-	},
 #endif
 };
 
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 30b3189..fc7d6a4 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -27,6 +27,7 @@
 #include <net/ip6_route.h>
 #include <linux/icmpv6.h>
 #include <linux/netfilter.h>
+#include <net/netfilter/nf_conntrack.h>
 #include <linux/netfilter_ipv4.h>
 
 #include <net/ip_vs.h>
@@ -347,6 +348,28 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 }
 #endif
 
+static void
+ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
+{
+	struct nf_conn *ct = (struct nf_conn *)skb->nfct;
+
+	if (ct == NULL || ct == &nf_conntrack_untracked ||
+	    nf_ct_is_confirmed(ct))
+		return;
+
+	/*
+	 * The connection is not yet in the hashtable, so we update it.
+	 * CIP->VIP will remain the same, so leave the tuple in
+	 * IP_CT_DIR_ORIGINAL untouched. When the reply comes back from the
+	 * real-server we will see RIP->DIP.
+	 */
+	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3 = cp->daddr;
+	/*
+	 * This will also take care of UDP and other protocols.
+	 */
+	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port = cp->dport;
+}
+
 /*
  * NAT transmitter (only for outside-to-inside nat forwarding)
  * Not used for related ICMP
@@ -402,6 +425,8 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */
@@ -478,6 +503,8 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
 
 	IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
 
+	ip_vs_update_conntrack(skb, cp);
+
 	/* FIXME: when application helper enlarges the packet and the length
 	   is larger than the MTU of outgoing device, there will be still
 	   MTU problem. */
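The effect of ip_vs_update_conntrack() on the conntrack entry can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions: the model_* types and functions are hypothetical
stand-ins, not the kernel's struct nf_conn API, and addresses are plain
32-bit integers. It shows that only the source of the reply tuple
changes (VIP becomes RIP), and only while the entry is unconfirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's conntrack types. */
struct model_endpoint {
	uint32_t addr;	/* IPv4 address as a plain integer */
	uint16_t port;
};

struct model_tuple {
	struct model_endpoint src;
	struct model_endpoint dst;
};

enum { MODEL_DIR_ORIGINAL, MODEL_DIR_REPLY, MODEL_DIR_MAX };

struct model_conn {
	struct model_tuple tuple[MODEL_DIR_MAX];
	bool confirmed;	/* true once the entry is in the hash table */
};

/*
 * New flow CIP -> VIP: by default the reply tuple is simply the
 * inverted original tuple, i.e. VIP -> CIP.
 */
static void model_conn_init(struct model_conn *ct,
			    struct model_endpoint cip,
			    struct model_endpoint vip)
{
	assert(ct != NULL);
	ct->tuple[MODEL_DIR_ORIGINAL].src = cip;
	ct->tuple[MODEL_DIR_ORIGINAL].dst = vip;
	ct->tuple[MODEL_DIR_REPLY].src = vip;
	ct->tuple[MODEL_DIR_REPLY].dst = cip;
	ct->confirmed = false;
}

/*
 * Model of the update in the patch: after DNAT to the real server the
 * reply will arrive from RIP, so rewrite the reply source. Entries that
 * are missing or already confirmed are left alone.
 * Returns true if the tuple was updated.
 */
static bool model_update_conntrack(struct model_conn *ct,
				   struct model_endpoint rip)
{
	if (ct == NULL || ct->confirmed)
		return false;

	ct->tuple[MODEL_DIR_REPLY].src = rip;
	return true;
}
```

With CIP 10.0.0.1:40000, VIP 192.168.100.30:8080 and RIP
192.168.10.20:8080, the model ends up with an original tuple of
CIP->VIP and a reply tuple of RIP->CIP, which is what lets a later
SNAT rule in POSTROUTING see a consistent connection.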
* Re: [PATCH 2/3] IPVS: make friends with nf_conntrack
2009-09-02 14:39 ` Hannes Eder
(?)
@ 2009-09-02 14:56 ` Patrick McHardy
2009-09-03 10:22 ` Hannes Eder
-1 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2009-09-02 14:56 UTC (permalink / raw)
To: Hannes Eder
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang

Hannes Eder wrote:
> Update the nf_conntrack tuple in reply direction, as we will see
> traffic from the real server (RIP) to the client (CIP). Once this is
> done we can use netfilter's SNAT in POSTROUTING, especially with
> xt_ipvs, to do source NAT, e.g.:
>
> % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
>> -j SNAT --to-source 192.168.10.10
>
> +static void
> +ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
> +{
> +	struct nf_conn *ct = (struct nf_conn *)skb->nfct;
> +
> +	if (ct == NULL || ct == &nf_conntrack_untracked ||
> +	    nf_ct_is_confirmed(ct))
> +		return;
> +
> +	/*
> +	 * The connection is not yet in the hashtable, so we update it.
> +	 * CIP->VIP will remain the same, so leave the tuple in
> +	 * IP_CT_DIR_ORIGINAL untouched. When the reply comes back from the
> +	 * real-server we will see RIP->DIP.
> +	 */
> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3 = cp->daddr;
> +	/*
> +	 * This will also take care of UDP and other protocols.
> +	 */
> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port = cp->dport;
> +}

How does IPVS interact with conntrack helpers? If it does actually
intend to use them (which will happen automatically), it might make
sense to use nf_conntrack_alter_reply(), which will perform a new
helper lookup based on the changed tuple.
* Re: [PATCH 2/3] IPVS: make friends with nf_conntrack
2009-09-02 14:56 ` Patrick McHardy
@ 2009-09-03 10:22 ` Hannes Eder
2009-09-03 11:04 ` Simon Horman
0 siblings, 1 reply; 18+ messages in thread
From: Hannes Eder @ 2009-09-03 10:22 UTC (permalink / raw)
To: Patrick McHardy
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Simon Horman, Wensong Zhang

On Wed, Sep 2, 2009 at 16:56, Patrick McHardy<kaber@trash.net> wrote:
> Hannes Eder wrote:
>> Update the nf_conntrack tuple in reply direction, as we will see
>> traffic from the real server (RIP) to the client (CIP). Once this is
>> done we can use netfilter's SNAT in POSTROUTING, especially with
>> xt_ipvs, to do source NAT, e.g.:
>>
>> % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
>>> -j SNAT --to-source 192.168.10.10
>>
>
>> +static void
>> +ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
>> +{
>> +	struct nf_conn *ct = (struct nf_conn *)skb->nfct;
>> +
>> +	if (ct == NULL || ct == &nf_conntrack_untracked ||
>> +	    nf_ct_is_confirmed(ct))
>> +		return;
>> +
>> +	/*
>> +	 * The connection is not yet in the hashtable, so we update it.
>> +	 * CIP->VIP will remain the same, so leave the tuple in
>> +	 * IP_CT_DIR_ORIGINAL untouched. When the reply comes back from the
>> +	 * real-server we will see RIP->DIP.
>> +	 */
>> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3 = cp->daddr;
>> +	/*
>> +	 * This will also take care of UDP and other protocols.
>> +	 */
>> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port = cp->dport;
>> +}
>
> How does IPVS interact with conntrack helpers? If it does actually
> intend to use them (which will happen automatically), it might make
> sense to use nf_conntrack_alter_reply(), which will perform a new
> helper lookup based on the changed tuple.

Good point, I'll use nf_conntrack_alter_reply(). IMHO IPVS only deals
with ftp in a special way, I think something needs to be done there as
well, I'll investigate that.
* Re: [PATCH 2/3] IPVS: make friends with nf_conntrack
2009-09-03 10:22 ` Hannes Eder
@ 2009-09-03 11:04 ` Simon Horman
0 siblings, 0 replies; 18+ messages in thread
From: Simon Horman @ 2009-09-03 11:04 UTC (permalink / raw)
To: Hannes Eder
Cc: Patrick McHardy, lvs-devel, linux-kernel, netdev, netfilter-devel,
Fabien Duchêne, Jan Engelhardt, Jean-Luc Fortemaison,
Julian Anastasov, Julius Volz, Laurent Grawet, Wensong Zhang

On Thu, Sep 03, 2009 at 12:22:53PM +0200, Hannes Eder wrote:
> On Wed, Sep 2, 2009 at 16:56, Patrick McHardy<kaber@trash.net> wrote:
> > Hannes Eder wrote:
> >> Update the nf_conntrack tuple in reply direction, as we will see
> >> traffic from the real server (RIP) to the client (CIP). Once this is
> >> done we can use netfilter's SNAT in POSTROUTING, especially with
> >> xt_ipvs, to do source NAT, e.g.:
> >>
> >> % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
> >>> -j SNAT --to-source 192.168.10.10
> >>
> >
> >> +static void
> >> +ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
> >> +{
> >> +	struct nf_conn *ct = (struct nf_conn *)skb->nfct;
> >> +
> >> +	if (ct == NULL || ct == &nf_conntrack_untracked ||
> >> +	    nf_ct_is_confirmed(ct))
> >> +		return;
> >> +
> >> +	/*
> >> +	 * The connection is not yet in the hashtable, so we update it.
> >> +	 * CIP->VIP will remain the same, so leave the tuple in
> >> +	 * IP_CT_DIR_ORIGINAL untouched. When the reply comes back from the
> >> +	 * real-server we will see RIP->DIP.
> >> +	 */
> >> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3 = cp->daddr;
> >> +	/*
> >> +	 * This will also take care of UDP and other protocols.
> >> +	 */
> >> +	ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port = cp->dport;
> >> +}
> >
> > How does IPVS interact with conntrack helpers? If it does actually
> > intend to use them (which will happen automatically), it might make
> > sense to use nf_conntrack_alter_reply(), which will perform a new
> > helper lookup based on the changed tuple.
>
> Good point, I'll use nf_conntrack_alter_reply(). IMHO IPVS only deals
> with ftp in a special way, I think something needs to be done there as
> well, I'll investigate that.

Yes, I think that is correct. FTP is the only protocol helper in IPVS.
* Re: [PATCH 2/3] IPVS: make friends with nf_conntrack
2009-09-02 14:39 ` Hannes Eder
(?)
(?)
@ 2009-09-03 19:50 ` Julian Anastasov
-1 siblings, 0 replies; 18+ messages in thread
From: Julian Anastasov @ 2009-09-03 19:50 UTC (permalink / raw)
To: Hannes Eder
Cc: lvs-devel, linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julius Volz, Laurent Grawet,
Patrick McHardy, Simon Horman, Wensong Zhang

Hello,

On Wed, 2 Sep 2009, Hannes Eder wrote:

> Update the nf_conntrack tuple in reply direction, as we will see
> traffic from the real server (RIP) to the client (CIP). Once this is
> done we can use netfilter's SNAT in POSTROUTING, especially with
> xt_ipvs, to do source NAT, e.g.:
>
> % iptables -t nat -A POSTROUTING -m ipvs --vaddr 192.168.100.30/32 --vport 8080 \
> > -j SNAT --to-source 192.168.10.10
>
> Signed-off-by: Hannes Eder <heder@google.com>
> ---

The following changes in ip_vs_core.c may break normal ip_vs_ftp
users. Somehow you decided that this POST_ROUTING code is not needed
and deleted it. This code should be present by default.

From http://www.ssi.bg/~ja/LVS.txt:

===
Now after many changes in latest kernels I'm not sure what happens if
netfilter sees IPVS traffic in POST_ROUTING. Such change require
testing of ip_vs_ftp in both passive and active LVS-NAT mode, with
different length of IP address:port representation in FTP commands, to
check if resulting packets survive double NAT when payload size is
changed. It is the best test for IPVS to see if netfilter additionally
changes FTP packets leading to wrong payload.
===

So, you have to check the ip_vs_ftp case because double NAT for IPs
and Ports usually works but double changing of SEQs and payload may
not.

You can also check NFCT for IPVS (http://www.ssi.bg/~ja/nfct/) for
using netfilter functions and structures (ip_vs_nfct.c), most recent
rediff: http://www.ssi.bg/~ja/nfct/ipvs-nfct-2.6.28-1.diff

> diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
> index b227750..27bd002 100644
> --- a/net/netfilter/ipvs/ip_vs_core.c
> +++ b/net/netfilter/ipvs/ip_vs_core.c
> @@ -521,26 +521,6 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
>  	return NF_DROP;
>  }
>
> -
> -/*
> - * It is hooked before NF_IP_PRI_NAT_SRC at the NF_INET_POST_ROUTING
> - * chain, and is used for VS/NAT.
> - * It detects packets for VS/NAT connections and sends the packets
> - * immediately. This can avoid that iptable_nat mangles the packets
> - * for VS/NAT.
> - */
> -static unsigned int ip_vs_post_routing(unsigned int hooknum,
> -				       struct sk_buff *skb,
> -				       const struct net_device *in,
> -				       const struct net_device *out,
> -				       int (*okfn)(struct sk_buff *))
> -{
> -	if (!skb->ipvs_property)
> -		return NF_ACCEPT;
> -	/* The packet was sent from IPVS, exit this chain */
> -	return NF_STOP;
> -}
> -
>  __sum16 ip_vs_checksum_complete(struct sk_buff *skb, int offset)
>  {
>  	return csum_fold(skb_checksum(skb, offset, skb->len - offset, 0));
> @@ -1431,14 +1411,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
>  		.hooknum = NF_INET_FORWARD,
>  		.priority = 99,
>  	},
> -	/* Before the netfilter connection tracking, exit from POST_ROUTING */
> -	{
> -		.hook = ip_vs_post_routing,
> -		.owner = THIS_MODULE,
> -		.pf = PF_INET,
> -		.hooknum = NF_INET_POST_ROUTING,
> -		.priority = NF_IP_PRI_NAT_SRC-1,
> -	},
>  #ifdef CONFIG_IP_VS_IPV6
>  	/* After packet filtering, forward packet through VS/DR, VS/TUN,
>  	 * or VS/NAT(change destination), so that filtering rules can be
> @@ -1467,14 +1439,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
>  		.hooknum = NF_INET_FORWARD,
>  		.priority = 99,
>  	},
> -	/* Before the netfilter connection tracking, exit from POST_ROUTING */
> -	{
> -		.hook = ip_vs_post_routing,
> -		.owner = THIS_MODULE,
> -		.pf = PF_INET6,
> -		.hooknum = NF_INET_POST_ROUTING,
> -		.priority = NF_IP6_PRI_NAT_SRC-1,
> -	},
>  #endif
>  };

Regards

--
Julian Anastasov <ja@ssi.bg>
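The role of the removed ip_vs_post_routing hook can be sketched with a
user-space model of hook traversal. Everything named model_* or
MODEL_* here is hypothetical and only mirrors the idea from the quoted
code: a hook registered just before the source-NAT priority returns a
"stop" verdict for IPVS packets, so the NAT hook behind it never runs
for them. It is a rough illustration, not the kernel's hook iteration
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts mirroring the roles of NF_DROP/NF_ACCEPT/NF_STOP. */
enum model_verdict {
	MODEL_NF_DROP,
	MODEL_NF_ACCEPT,
	MODEL_NF_STOP	/* accept the packet and skip all later hooks */
};

struct model_skb {
	bool ipvs_property;	/* packet was sent by IPVS */
	bool snat_applied;	/* set when the NAT hook has seen the packet */
};

typedef enum model_verdict (*model_hook_fn)(struct model_skb *skb);

/* Same decision as the removed ip_vs_post_routing() in the quoted diff. */
static enum model_verdict model_ip_vs_post_routing(struct model_skb *skb)
{
	if (!skb->ipvs_property)
		return MODEL_NF_ACCEPT;
	return MODEL_NF_STOP;
}

/* Stand-in for the source-NAT hook that runs at a later priority. */
static enum model_verdict model_nat_src(struct model_skb *skb)
{
	skb->snat_applied = true;
	return MODEL_NF_ACCEPT;
}

/* Run the hooks in priority order, honouring the "stop" verdict. */
static enum model_verdict model_run_hooks(const model_hook_fn *hooks,
					  size_t n, struct model_skb *skb)
{
	size_t i;

	assert(hooks != NULL && skb != NULL);
	for (i = 0; i < n; i++) {
		enum model_verdict v = hooks[i](skb);

		if (v == MODEL_NF_STOP)
			return MODEL_NF_ACCEPT;	/* later hooks are skipped */
		if (v != MODEL_NF_ACCEPT)
			return v;
	}
	return MODEL_NF_ACCEPT;
}
```

In this model an IPVS packet passes through { model_ip_vs_post_routing,
model_nat_src } without being touched by NAT, while with only
{ model_nat_src } registered the NAT hook does see it. The second case
corresponds to the situation after the patch, and is where the
ip_vs_ftp double-NAT concern above would need testing.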
* [PATCH 3/3] libxt_ipvs: user-space lib for netfilter matcher xt_ipvs
2009-09-02 14:38 ` Hannes Eder
@ 2009-09-02 14:41 ` Hannes Eder
-1 siblings, 0 replies; 18+ messages in thread
From: Hannes Eder @ 2009-09-02 14:41 UTC (permalink / raw)
To: lvs-devel
Cc: linux-kernel, netdev, netfilter-devel, Fabien Duchêne,
Jan Engelhardt, Jean-Luc Fortemaison, Julian Anastasov,
Julius Volz, Laurent Grawet, Patrick McHardy, Simon Horman,
Wensong Zhang

The user-space library for the netfilter matcher xt_ipvs.

Signed-off-by: Hannes Eder <heder@google.com>
---
configure.ac | 11 +
extensions/libxt_ipvs.c | 349 +++++++++++++++++++++++++++++++++++++
extensions/libxt_ipvs.man | 21 ++
include/linux/netfilter/xt_ipvs.h | 23 ++
4 files changed, 401 insertions(+), 3 deletions(-)
create mode 100644 extensions/libxt_ipvs.c
create mode 100644 extensions/libxt_ipvs.man
create mode 100644 include/linux/netfilter/xt_ipvs.h

diff --git a/configure.ac b/configure.ac
index bc74efe..e55ab43 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,4 +1,3 @@
-
 AC_INIT([iptables], [1.4.4])
 
 # See libtool.info "Libtool's versioning system"
@@ -47,12 +46,18 @@ AC_ARG_WITH([pkgconfigdir], AS_HELP_STRING([--with-pkgconfigdir=PATH],
 	[Path to the pkgconfig directory [[LIBDIR/pkgconfig]]]),
 	[pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig'])
 
-AC_CHECK_HEADER([linux/dccp.h])
-
 blacklist_modules="";
+
+AC_CHECK_HEADER([linux/dccp.h])
 if test "$ac_cv_header_linux_dccp_h" != "yes"; then
 	blacklist_modules="$blacklist_modules dccp";
 fi;
+
+AC_CHECK_HEADER([linux/ip_vs.h])
+if test "$ac_cv_header_linux_ip_vs_h" != "yes"; then
+	blacklist_modules="$blacklist_modules ipvs";
+fi;
+
 AC_SUBST([blacklist_modules])
 
 AM_CONDITIONAL([ENABLE_STATIC], [test "$enable_static" = "yes"])
diff --git a/extensions/libxt_ipvs.c b/extensions/libxt_ipvs.c
new file mode 100644
index 0000000..9fd007f
--- /dev/null
+++ b/extensions/libxt_ipvs.c
@@ -0,0 +1,349 @@
+/* Shared library add-on to iptables to add IPVS matching.
+ *
+ * Detailed doc is in the kernel module source net/netfilter/xt_ipvs.c
+ *
+ * Author: Hannes Eder <heder@google.com>
+ */
+#include <sys/types.h>
+#include <assert.h>
+#include <ctype.h>
+#include <errno.h>
+#include <getopt.h>
+#include <netdb.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <xtables.h>
+#include <linux/ip_vs.h>
+#include <linux/netfilter/xt_ipvs.h>
+
+static const struct option ipvs_mt_opts[] = {
+	{ .name = "ipvs", .has_arg = false, .val = '0' },
+	{ .name = "vproto", .has_arg = true, .val = '1' },
+	{ .name = "vaddr", .has_arg = true, .val = '2' },
+	{ .name = "vport", .has_arg = true, .val = '3' },
+	{ .name = "vdir", .has_arg = true, .val = '4' },
+	{ .name = "vmethod", .has_arg = true, .val = '5' },
+	{ .name = NULL }
+};
+
+static void ipvs_mt_help(void)
+{
+	printf(
+"IPVS match options:\n"
+"[!] --ipvs                      packet belongs to an IPVS connection\n"
+"\n"
+"Any of the following options implies --ipvs (even negated)\n"
+"[!] --vproto protocol           VIP protocol to match; by number or name,\n"
+"                                e.g. \"tcp\"\n"
+"[!] --vaddr address[/mask]      VIP address to match\n"
+"[!] --vport port                VIP port to match; by number or name,\n"
+"                                e.g. \"http\"\n"
+"    --vdir {ORIGINAL|REPLY}     flow direction of packet\n"
+"[!] --vmethod {GATE|IPIP|MASQ}  IPVS forwarding method used\n"
+	);
+}
+
+static void ipvs_mt_parse_addr_and_mask(const char *arg,
+					union nf_inet_addr *address,
+					union nf_inet_addr *mask,
+					unsigned int family)
+{
+	struct in_addr *addr = NULL;
+	struct in6_addr *addr6 = NULL;
+	unsigned int naddrs = 0;
+
+	if (family == NFPROTO_IPV4) {
+		xtables_ipparse_any(arg, &addr, &mask->in, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in, addr, sizeof(*addr));
+	} else if (family == NFPROTO_IPV6) {
+		xtables_ip6parse_any(arg, &addr6, &mask->in6, &naddrs);
+		if (naddrs > 1)
+			xtables_error(PARAMETER_PROBLEM,
+				      "multiple IP addresses not allowed");
+		if (naddrs == 1)
+			memcpy(&address->in6, addr6, sizeof(*addr6));
+	} else {
+		/* Hu? */
+		assert(false);
+	}
+}
+
+/* Function which parses command options; returns true if it ate an option */
+static int ipvs_mt_parse(int c, char **argv, int invert, unsigned int *flags,
+			 const void *entry, struct xt_entry_match **match,
+			 unsigned int family)
+{
+	struct xt_ipvs *data = (void *)(*match)->data;
+	char *p = NULL;
+	u_int8_t op = 0;
+
+	if ('0' <= c && c <= '5') {
+		int ops[] = {
+			XT_IPVS_IPVS_PROPERTY,
+			XT_IPVS_PROTO,
+			XT_IPVS_VADDR,
+			XT_IPVS_VPORT,
+			XT_IPVS_DIR,
+			XT_IPVS_METHOD
+		};
+		op = ops[c - '0'];
+	} else
+		return 0;
+
+	if (*flags & op & XT_IPVS_ONCE_MASK)
+		goto multiple_use;
+
+	switch (c) {
+	case '0': /* --ipvs */
+		/* Nothing to do here. */
+		break;
+
+	case '1': /* --vproto */
+		/* Canonicalize into lower case */
+		for (p = optarg; *p != '\0'; ++p)
+			*p = tolower(*p);
+
+		data->l4proto = xtables_parse_protocol(optarg);
+		break;
+
+	case '2': /* --vaddr */
+		ipvs_mt_parse_addr_and_mask(optarg, &data->vaddr,
+					    &data->vmask, family);
+		break;
+
+	case '3': /* --vport */
+		data->vport = htons(xtables_parse_port(optarg, "tcp"));
+		break;
+
+	case '4': /* --vdir */
+		xtables_param_act(XTF_NO_INVERT, "ipvs", "--vdir", invert);
+		if (strcasecmp(optarg, "ORIGINAL") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert &= ~XT_IPVS_DIR;
+		} else if (strcasecmp(optarg, "REPLY") == 0) {
+			data->bitmask |= XT_IPVS_DIR;
+			data->invert |= XT_IPVS_DIR;
+		} else {
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vdir", optarg);
+		}
+		break;
+
+	case '5': /* --vmethod */
+		if (strcasecmp(optarg, "GATE") == 0)
+			data->fwd_method = IP_VS_CONN_F_DROUTE;
+		else if (strcasecmp(optarg, "IPIP") == 0)
+			data->fwd_method = IP_VS_CONN_F_TUNNEL;
+		else if (strcasecmp(optarg, "MASQ") == 0)
+			data->fwd_method = IP_VS_CONN_F_MASQ;
+		else
+			xtables_param_act(XTF_BAD_VALUE,
+					  "ipvs", "--vmethod", optarg);
+		break;
+
+	default:
+		/* Hu? How did we come here? */
+		assert(false);
+		return 0;
+	}
+
+	if (op & XT_IPVS_ONCE_MASK) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			xtables_error(PARAMETER_PROBLEM,
+				      "! --ipvs cannot be together with"
+				      " other options");
+		data->bitmask |= XT_IPVS_IPVS_PROPERTY;
+	}
+
+	data->bitmask |= op;
+	if (invert)
+		data->invert |= op;
+	*flags |= op;
+	return 1;
+
+multiple_use:
+	xtables_error(PARAMETER_PROBLEM,
+		      "multiple use of the same IPVS option is not allowed");
+}
+
+static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV4);
+}
+
+static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags,
+			  const void *entry, struct xt_entry_match **match)
+{
+	return ipvs_mt_parse(c, argv, invert, flags, entry, match,
+			     NFPROTO_IPV6);
+}
+
+static void ipvs_mt_check(unsigned int flags)
+{
+	if (flags == 0)
+		xtables_error(PARAMETER_PROBLEM,
+			      "IPVS: At least one option is required");
+}
+
+/* Shamelessly copied from libxt_conntrack.c */
+static void ipvs_mt_dump_addr(const union nf_inet_addr *addr,
+			      const union nf_inet_addr *mask,
+			      unsigned int family, bool numeric)
+{
+	char buf[BUFSIZ];
+
+	if (family == NFPROTO_IPV4) {
+		if (!numeric && addr->ip == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ipaddr_to_numeric(&addr->in));
+		else
+			strcpy(buf, xtables_ipaddr_to_anyname(&addr->in));
+		strcat(buf, xtables_ipmask_to_numeric(&mask->in));
+		printf("%s ", buf);
+	} else if (family == NFPROTO_IPV6) {
+		if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 &&
+		    addr->ip6[2] == 0 && addr->ip6[3] == 0) {
+			printf("anywhere ");
+			return;
+		}
+		if (numeric)
+			strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6));
+		else
+			strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6));
+		strcat(buf, xtables_ip6mask_to_numeric(&mask->in6));
+		printf("%s ", buf);
+	}
+}
+
+static void ipvs_mt_dump(const void *ip, const struct xt_ipvs *data,
+			 unsigned int family, bool numeric, const char *prefix)
+{
+	if (data->bitmask == XT_IPVS_IPVS_PROPERTY) {
+		if (data->invert & XT_IPVS_IPVS_PROPERTY)
+			printf("! ");
+		printf("%sipvs ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_PROTO) {
+		if (data->invert & XT_IPVS_PROTO)
+			printf("! ");
+		printf("%sproto %u ", prefix, data->l4proto);
+	}
+
+	if (data->bitmask & XT_IPVS_VADDR) {
+		if (data->invert & XT_IPVS_VADDR)
+			printf("! ");
+
+		printf("%svaddr ", prefix);
+		ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric);
+	}
+
+	if (data->bitmask & XT_IPVS_VPORT) {
+		if (data->invert & XT_IPVS_VPORT)
+			printf("! ");
+
+		printf("%svport %u ", prefix, ntohs(data->vport));
+	}
+
+	if (data->bitmask & XT_IPVS_DIR) {
+		if (data->invert & XT_IPVS_DIR)
+			printf("%svdir REPLY ", prefix);
+		else
+			printf("%svdir ORIGINAL ", prefix);
+	}
+
+	if (data->bitmask & XT_IPVS_METHOD) {
+		if (data->invert & XT_IPVS_METHOD)
+			printf("! ");
+
+		printf("%svmethod ", prefix);
+		switch (data->fwd_method) {
+		case IP_VS_CONN_F_DROUTE:
+			printf("GATE ");
+			break;
+		case IP_VS_CONN_F_TUNNEL:
+			printf("IPIP ");
+			break;
+		case IP_VS_CONN_F_MASQ:
+			printf("MASQ ");
+			break;
+		default:
+			/* Hu? */
+			printf("UNKNOWN ");
+			break;
+		}
+	}
+}
+
+static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, "");
+}
+
+static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match,
+			   int numeric)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, "");
+}
+
+static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--");
+}
+
+static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match)
+{
+	const struct xt_ipvs *data = (const void *)match->data;
+	ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--");
+}
+
+static struct xtables_match ipvs_matches_reg[] = {
+	{
+		.version = XTABLES_VERSION,
+		.name = "ipvs",
+		.revision = 0,
+		.family = NFPROTO_IPV4,
+		.size = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.help = ipvs_mt_help,
+		.parse = ipvs_mt4_parse,
+		.final_check = ipvs_mt_check,
+		.print = ipvs_mt4_print,
+		.save = ipvs_mt4_save,
+		.extra_opts = ipvs_mt_opts,
+	},
+	{
+		.version = XTABLES_VERSION,
+		.name = "ipvs",
+		.revision = 0,
+		.family = NFPROTO_IPV6,
+		.size = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)),
+		.help = ipvs_mt_help,
+		.parse = ipvs_mt6_parse,
+		.final_check = ipvs_mt_check,
+		.print = ipvs_mt6_print,
+		.save = ipvs_mt6_save,
+		.extra_opts = ipvs_mt_opts,
+	},
+};
+
+void _init(void)
+{
+	xtables_register_matches(ipvs_matches_reg,
+				 ARRAY_SIZE(ipvs_matches_reg));
+}
diff --git a/extensions/libxt_ipvs.man b/extensions/libxt_ipvs.man
new file mode 100644
index 0000000..7fe915f
--- /dev/null
+++ b/extensions/libxt_ipvs.man
@@ -0,0 +1,21 @@
+Match IPVS connection properties.
+.TP
+[\fB!\fR] \fB\-\-ipvs\fP
+packet belongs to an IPVS connection
+.TP
+Any of the following options implies \-\-ipvs (even negated)
+.TP
+[\fB!\fR] \fB\-\-vproto\fP \fIprotocol\fP
+VIP protocol to match; by number or name, e.g. "tcp"
+.TP
+[\fB!\fR] \fB\-\-vaddr\fP \fIaddress\fP[\fB/\fP\fImask\fP]
+VIP address to match
+.TP
+[\fB!\fR] \fB\-\-vport\fP \fIport\fP
+VIP port to match; by number or name, e.g. "http"
+.TP
+\fB\-\-vdir\fP {\fBORIGINAL\fP|\fBREPLY\fP}
+flow direction of packet
+.TP
+[\fB!\fR] \fB\-\-vmethod\fP {\fBGATE\fP|\fBIPIP\fP|\fBMASQ\fP}
+IPVS forwarding method used
diff --git a/include/linux/netfilter/xt_ipvs.h b/include/linux/netfilter/xt_ipvs.h
new file mode 100644
index 0000000..eb09759
--- /dev/null
+++ b/include/linux/netfilter/xt_ipvs.h
@@ -0,0 +1,23 @@
+#ifndef _XT_IPVS_H
+#define _XT_IPVS_H 1
+
+#define XT_IPVS_IPVS_PROPERTY 0x01 /* this is implied by all other options */
+#define XT_IPVS_PROTO 0x02
+#define XT_IPVS_VADDR 0x04
+#define XT_IPVS_VPORT 0x08
+#define XT_IPVS_DIR 0x10
+#define XT_IPVS_METHOD 0x20
+#define XT_IPVS_MASK (0x40 - 1)
+#define XT_IPVS_ONCE_MASK (XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)
+
+struct xt_ipvs {
+	union nf_inet_addr vaddr, vmask;
+	__be16 vport;
+	__u16 l4proto;
+	__u16 fwd_method;
+
+	__u8 invert;
+	__u8 bitmask;
+};
+
+#endif /* _XT_IPVS_H */
--ipvs cannot be together with" + " other options"); + data->bitmask |= XT_IPVS_IPVS_PROPERTY; + } + + data->bitmask |= op; + if (invert) + data->invert |= op; + *flags |= op; + return 1; + +multiple_use: + xtables_error(PARAMETER_PROBLEM, + "multiple use of the same IPVS option is not allowed"); +} + +static int ipvs_mt4_parse(int c, char **argv, int invert, unsigned int *flags, + const void *entry, struct xt_entry_match **match) +{ + return ipvs_mt_parse(c, argv, invert, flags, entry, match, + NFPROTO_IPV4); +} + +static int ipvs_mt6_parse(int c, char **argv, int invert, unsigned int *flags, + const void *entry, struct xt_entry_match **match) +{ + return ipvs_mt_parse(c, argv, invert, flags, entry, match, + NFPROTO_IPV6); +} + +static void ipvs_mt_check(unsigned int flags) +{ + if (flags == 0) + xtables_error(PARAMETER_PROBLEM, + "IPVS: At least one option is required"); +} + +/* Shamelessly copied from libxt_conntrack.c */ +static void ipvs_mt_dump_addr(const union nf_inet_addr *addr, + const union nf_inet_addr *mask, + unsigned int family, bool numeric) +{ + char buf[BUFSIZ]; + + if (family == NFPROTO_IPV4) { + if (!numeric && addr->ip == 0) { + printf("anywhere "); + return; + } + if (numeric) + strcpy(buf, xtables_ipaddr_to_numeric(&addr->in)); + else + strcpy(buf, xtables_ipaddr_to_anyname(&addr->in)); + strcat(buf, xtables_ipmask_to_numeric(&mask->in)); + printf("%s ", buf); + } else if (family == NFPROTO_IPV6) { + if (!numeric && addr->ip6[0] == 0 && addr->ip6[1] == 0 && + addr->ip6[2] == 0 && addr->ip6[3] == 0) { + printf("anywhere "); + return; + } + if (numeric) + strcpy(buf, xtables_ip6addr_to_numeric(&addr->in6)); + else + strcpy(buf, xtables_ip6addr_to_anyname(&addr->in6)); + strcat(buf, xtables_ip6mask_to_numeric(&mask->in6)); + printf("%s ", buf); + } +} + +static void ipvs_mt_dump(const void *ip, const struct xt_ipvs *data, + unsigned int family, bool numeric, const char *prefix) +{ + if (data->bitmask == XT_IPVS_IPVS_PROPERTY) { + if 
(data->invert & XT_IPVS_IPVS_PROPERTY) + printf("! "); + printf("%sipvs ", prefix); + } + + if (data->bitmask & XT_IPVS_PROTO) { + if (data->invert & XT_IPVS_PROTO) + printf("! "); + printf("%sproto %u ", prefix, data->l4proto); + } + + if (data->bitmask & XT_IPVS_VADDR) { + if (data->invert & XT_IPVS_VADDR) + printf("! "); + + printf("%svaddr ", prefix); + ipvs_mt_dump_addr(&data->vaddr, &data->vmask, family, numeric); + } + + if (data->bitmask & XT_IPVS_VPORT) { + if (data->invert & XT_IPVS_VPORT) + printf("! "); + + printf("%svport %u ", prefix, ntohs(data->vport)); + } + + if (data->bitmask & XT_IPVS_DIR) { + if (data->invert & XT_IPVS_DIR) + printf("%svdir REPLY ", prefix); + else + printf("%svdir ORIGINAL ", prefix); + } + + if (data->bitmask & XT_IPVS_METHOD) { + if (data->invert & XT_IPVS_METHOD) + printf("! "); + + printf("%svmethod ", prefix); + switch (data->fwd_method) { + case IP_VS_CONN_F_DROUTE: + printf("GATE "); + break; + case IP_VS_CONN_F_TUNNEL: + printf("IPIP "); + break; + case IP_VS_CONN_F_MASQ: + printf("MASQ "); + break; + default: + /* Hu? 
*/ + printf("UNKNOWN "); + break; + } + } +} + +static void ipvs_mt4_print(const void *ip, const struct xt_entry_match *match, + int numeric) +{ + const struct xt_ipvs *data = (const void *)match->data; + ipvs_mt_dump(ip, data, NFPROTO_IPV4, numeric, ""); +} + +static void ipvs_mt6_print(const void *ip, const struct xt_entry_match *match, + int numeric) +{ + const struct xt_ipvs *data = (const void *)match->data; + ipvs_mt_dump(ip, data, NFPROTO_IPV6, numeric, ""); +} + +static void ipvs_mt4_save(const void *ip, const struct xt_entry_match *match) +{ + const struct xt_ipvs *data = (const void *)match->data; + ipvs_mt_dump(ip, data, NFPROTO_IPV4, true, "--"); +} + +static void ipvs_mt6_save(const void *ip, const struct xt_entry_match *match) +{ + const struct xt_ipvs *data = (const void *)match->data; + ipvs_mt_dump(ip, data, NFPROTO_IPV6, true, "--"); +} + +static struct xtables_match ipvs_matches_reg[] = { + { + .version = XTABLES_VERSION, + .name = "ipvs", + .revision = 0, + .family = NFPROTO_IPV4, + .size = XT_ALIGN(sizeof(struct xt_ipvs)), + .userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)), + .help = ipvs_mt_help, + .parse = ipvs_mt4_parse, + .final_check = ipvs_mt_check, + .print = ipvs_mt4_print, + .save = ipvs_mt4_save, + .extra_opts = ipvs_mt_opts, + }, + { + .version = XTABLES_VERSION, + .name = "ipvs", + .revision = 0, + .family = NFPROTO_IPV6, + .size = XT_ALIGN(sizeof(struct xt_ipvs)), + .userspacesize = XT_ALIGN(sizeof(struct xt_ipvs)), + .help = ipvs_mt_help, + .parse = ipvs_mt6_parse, + .final_check = ipvs_mt_check, + .print = ipvs_mt6_print, + .save = ipvs_mt6_save, + .extra_opts = ipvs_mt_opts, + }, +}; + +void _init(void) +{ + xtables_register_matches(ipvs_matches_reg, + ARRAY_SIZE(ipvs_matches_reg)); +} diff --git a/extensions/libxt_ipvs.man b/extensions/libxt_ipvs.man new file mode 100644 index 0000000..7fe915f --- /dev/null +++ b/extensions/libxt_ipvs.man @@ -0,0 +1,21 @@ +Match IPVS connection properties. 
+.TP +[\fB!\fR] \fB\-\-ipvs\fP +packet belongs to an IPVS connection +.TP +Any of the following options implies \-\-ipvs (even negated) +.TP +[\fB!\fR] \fB\-\-vproto\fP \fIprotocol\fP +VIP protocol to match; by number or name, e.g. "tcp" +.TP +[\fB!\fR] \fB\-\-vaddr\fP \fIaddress\fP[\fB/\fP\fImask\fP] +VIP address to match +.TP +[\fB!\fR] \fB\-\-vport\fP \fIport\fP +VIP port to match; by number or name, e.g. "http" +.TP +\fB\-\-vdir\fP {\fBORIGINAL\fP|\fBREPLY\fP} +flow direction of packet +.TP +[\fB!\fR] \fB\-\-vmethod\fP {\fBGATE\fP|\fBIPIP\fP|\fBMASQ\fP} +IPVS forwarding method used diff --git a/include/linux/netfilter/xt_ipvs.h b/include/linux/netfilter/xt_ipvs.h new file mode 100644 index 0000000..eb09759 --- /dev/null +++ b/include/linux/netfilter/xt_ipvs.h @@ -0,0 +1,23 @@ +#ifndef _XT_IPVS_H +#define _XT_IPVS_H 1 + +#define XT_IPVS_IPVS_PROPERTY 0x01 /* this is implied by all other options */ +#define XT_IPVS_PROTO 0x02 +#define XT_IPVS_VADDR 0x04 +#define XT_IPVS_VPORT 0x08 +#define XT_IPVS_DIR 0x10 +#define XT_IPVS_METHOD 0x20 +#define XT_IPVS_MASK (0x40 - 1) +#define XT_IPVS_ONCE_MASK (XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY) + +struct xt_ipvs { + union nf_inet_addr vaddr, vmask; + __be16 vport; + __u16 l4proto; + __u16 fwd_method; + + __u8 invert; + __u8 bitmask; +}; + +#endif /* _XT_IPVS_H */ ^ permalink raw reply related [flat|nested] 18+ messages in thread
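
For readers following the option bookkeeping in ipvs_mt_parse(), here is a
minimal standalone sketch of the flag logic only, with no dependency on
xtables. The flag values are copied from xt_ipvs.h in the patch; the struct
ipvs_flags type and the ipvs_record_option() helper are hypothetical names
made up for this illustration, and the helper returns false at the points
where the patch calls xtables_error().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from xt_ipvs.h in the patch. */
#define XT_IPVS_IPVS_PROPERTY	0x01
#define XT_IPVS_PROTO		0x02
#define XT_IPVS_VADDR		0x04
#define XT_IPVS_VPORT		0x08
#define XT_IPVS_DIR		0x10
#define XT_IPVS_METHOD		0x20
#define XT_IPVS_MASK		(0x40 - 1)
#define XT_IPVS_ONCE_MASK	(XT_IPVS_MASK & ~XT_IPVS_IPVS_PROPERTY)

/* Hypothetical stand-in for the two bookkeeping fields of struct xt_ipvs. */
struct ipvs_flags {
	uint8_t invert;
	uint8_t bitmask;
};

/*
 * Hypothetical helper (not part of the patch): record one parsed option.
 * 'seen' plays the role of *flags in ipvs_mt_parse(). Returns false where
 * the patch would abort with xtables_error().
 */
static bool ipvs_record_option(struct ipvs_flags *data, unsigned int *seen,
			       uint8_t op, bool invert)
{
	/* Every option except --ipvs may be given at most once. */
	if (*seen & op & XT_IPVS_ONCE_MASK)
		return false;

	if (op & XT_IPVS_ONCE_MASK) {
		/* "! --ipvs" cannot be combined with any other option. */
		if (data->invert & XT_IPVS_IPVS_PROPERTY)
			return false;
		/* Any other option implies --ipvs. */
		data->bitmask |= XT_IPVS_IPVS_PROPERTY;
	}

	data->bitmask |= op;
	if (invert)
		data->invert |= op;
	*seen |= op;
	return true;
}
```

Under these assumptions, recording --vaddr on an empty state leaves bitmask
set to XT_IPVS_VADDR | XT_IPVS_IPVS_PROPERTY, a second --vaddr is rejected,
and any option following "! --ipvs" is rejected as well.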