* [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS
@ 2012-09-11 12:36 Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 1/8] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:36 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
The following patchset implement IPv6 fragment handling for IPVS.
This work is based upon patches from Hans Schillstrom. I have taken
over the patchset, in close agreement with Hans, because he don't have
(gotten allocated) time to complete his work.
I have cleaned up the patchset significantly, and split the patchset
up into eight patches.
The first 4 patches, are ready to be merged
Patch01: Trivial changes, use compressed IPv6 address in output
Patch02: IPv6 extend ICMPv6 handling for future types
Patch03: Use config macro IS_ENABLED()
Patch04: Fix bug in IPVS IPv6 NAT mangling of ports inside ICMPv6 packets
The next 4 patches, I consider V3 of the patches I have submitted
earlier, where I have incorporated all of Julian's feedback. I have
also tried to make the patches easier to review, by reorganizing the
changes, to be more strictly split (exthdr vs. fragment handling).
I have also removed the API changes, and moved those to patch07. This
is done, (1) to make it easier to review the patches, and (2) to allow
easier integration of Patricks idea and my RFC patch of caching exthdr
info in skb->cb[]. Thus, we can get these patches applied (and later
go back and apply the caching scheme easier).
Patch05: Fix faulty IPv6 extension header handling in IPVS
Patch06: Complete IPv6 fragment handling for IPVS
Patch07: IPVS API change to avoid rescan of IPv6 exthdr
Patch08: IPVS SIP fragment handling
The SIP frag handling have been split into its own patch, as I have
not been able to test this part my self.
This patchset is based upon:
Pablo's nf-next tree: git://1984.lsi.us.es/nf-next
On top of commit 0edd94887d19ad73539477395c17ea0d6898947a
---
Jesper Dangaard Brouer (8):
ipvs: SIP fragment handling
ipvs: API change to avoid rescan of IPv6 exthdr
ipvs: Complete IPv6 fragment handling for IPVS
ipvs: Fix faulty IPv6 extension header handling in IPVS
ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets
ipvs: Use config macro IS_ENABLED()
ipvs: IPv6 extend ICMPv6 handling for future types
ipvs: Trivial changes, use compressed IPv6 address in output
include/net/ip_vs.h | 194 +++++++++++-----
net/netfilter/ipvs/Kconfig | 7 -
net/netfilter/ipvs/ip_vs_conn.c | 15 -
net/netfilter/ipvs/ip_vs_core.c | 384 +++++++++++++++++--------------
net/netfilter/ipvs/ip_vs_dh.c | 2
net/netfilter/ipvs/ip_vs_lblc.c | 2
net/netfilter/ipvs/ip_vs_lblcr.c | 2
net/netfilter/ipvs/ip_vs_pe_sip.c | 21 +-
net/netfilter/ipvs/ip_vs_proto.c | 6
net/netfilter/ipvs/ip_vs_proto_ah_esp.c | 9 -
net/netfilter/ipvs/ip_vs_proto_sctp.c | 42 +--
net/netfilter/ipvs/ip_vs_proto_tcp.c | 40 +--
net/netfilter/ipvs/ip_vs_proto_udp.c | 41 +--
net/netfilter/ipvs/ip_vs_sched.c | 2
net/netfilter/ipvs/ip_vs_sh.c | 2
net/netfilter/ipvs/ip_vs_xmit.c | 73 +++---
net/netfilter/xt_ipvs.c | 4
17 files changed, 491 insertions(+), 355 deletions(-)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH V3 1/8] ipvs: Trivial changes, use compressed IPv6 address in output
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
@ 2012-09-11 12:36 ` Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 2/8] ipvs: IPv6 extend ICMPv6 handling for future types Jesper Dangaard Brouer
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:36 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Have not converted the proc file output to compressed IPv6 addresses.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/ip_vs.h | 2 +-
net/netfilter/ipvs/ip_vs_core.c | 2 +-
net/netfilter/ipvs/ip_vs_proto.c | 6 +++---
net/netfilter/ipvs/ip_vs_sched.c | 2 +-
net/netfilter/ipvs/ip_vs_xmit.c | 10 +++++-----
5 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index ee75ccd..aba0bb2 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -165,7 +165,7 @@ static inline const char *ip_vs_dbg_addr(int af, char *buf, size_t buf_len,
int len;
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6)
- len = snprintf(&buf[*idx], buf_len - *idx, "[%pI6]",
+ len = snprintf(&buf[*idx], buf_len - *idx, "[%pI6c]",
&addr->in6) + 1;
else
#endif
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 58918e2..4edb654 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1487,7 +1487,7 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
if (ic == NULL)
return NF_DROP;
- IP_VS_DBG(12, "Incoming ICMPv6 (%d,%d) %pI6->%pI6\n",
+ IP_VS_DBG(12, "Incoming ICMPv6 (%d,%d) %pI6c->%pI6c\n",
ic->icmp6_type, ntohs(icmpv6_id(ic)),
&iph->saddr, &iph->daddr);
diff --git a/net/netfilter/ipvs/ip_vs_proto.c b/net/netfilter/ipvs/ip_vs_proto.c
index 50d8218..939f7fb 100644
--- a/net/netfilter/ipvs/ip_vs_proto.c
+++ b/net/netfilter/ipvs/ip_vs_proto.c
@@ -280,17 +280,17 @@ ip_vs_tcpudp_debug_packet_v6(struct ip_vs_protocol *pp,
if (ih == NULL)
sprintf(buf, "TRUNCATED");
else if (ih->nexthdr == IPPROTO_FRAGMENT)
- sprintf(buf, "%pI6->%pI6 frag", &ih->saddr, &ih->daddr);
+ sprintf(buf, "%pI6c->%pI6c frag", &ih->saddr, &ih->daddr);
else {
__be16 _ports[2], *pptr;
pptr = skb_header_pointer(skb, offset + sizeof(struct ipv6hdr),
sizeof(_ports), _ports);
if (pptr == NULL)
- sprintf(buf, "TRUNCATED %pI6->%pI6",
+ sprintf(buf, "TRUNCATED %pI6c->%pI6c",
&ih->saddr, &ih->daddr);
else
- sprintf(buf, "%pI6:%u->%pI6:%u",
+ sprintf(buf, "%pI6c:%u->%pI6c:%u",
&ih->saddr, ntohs(pptr[0]),
&ih->daddr, ntohs(pptr[1]));
}
diff --git a/net/netfilter/ipvs/ip_vs_sched.c b/net/netfilter/ipvs/ip_vs_sched.c
index 08dbdd5..d6bf20d 100644
--- a/net/netfilter/ipvs/ip_vs_sched.c
+++ b/net/netfilter/ipvs/ip_vs_sched.c
@@ -159,7 +159,7 @@ void ip_vs_scheduler_err(struct ip_vs_service *svc, const char *msg)
svc->fwmark, msg);
#ifdef CONFIG_IP_VS_IPV6
} else if (svc->af == AF_INET6) {
- IP_VS_ERR_RL("%s: %s [%pI6]:%d - %s\n",
+ IP_VS_ERR_RL("%s: %s [%pI6c]:%d - %s\n",
svc->scheduler->name,
ip_vs_proto_name(svc->protocol),
&svc->addr.in6, ntohs(svc->port), msg);
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 56f6d5d..1060bd5 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -335,7 +335,7 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
local = __ip_vs_is_local_route6(rt);
if (!((local ? IP_VS_RT_MODE_LOCAL : IP_VS_RT_MODE_NON_LOCAL) &
rt_mode)) {
- IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6\n",
+ IP_VS_DBG_RL("Stopping traffic to %s address, dest: %pI6c\n",
local ? "local":"non-local", daddr);
dst_release(&rt->dst);
return NULL;
@@ -343,8 +343,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
if (local && !(rt_mode & IP_VS_RT_MODE_RDR) &&
!((ort = (struct rt6_info *) skb_dst(skb)) &&
__ip_vs_is_local_route6(ort))) {
- IP_VS_DBG_RL("Redirect from non-local address %pI6 to local "
- "requires NAT method, dest: %pI6\n",
+ IP_VS_DBG_RL("Redirect from non-local address %pI6c to local "
+ "requires NAT method, dest: %pI6c\n",
&ipv6_hdr(skb)->daddr, daddr);
dst_release(&rt->dst);
return NULL;
@@ -352,8 +352,8 @@ __ip_vs_get_out_rt_v6(struct sk_buff *skb, struct ip_vs_dest *dest,
if (unlikely(!local && (!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
ipv6_addr_type(&ipv6_hdr(skb)->saddr) &
IPV6_ADDR_LOOPBACK)) {
- IP_VS_DBG_RL("Stopping traffic from loopback address %pI6 "
- "to non-local address, dest: %pI6\n",
+ IP_VS_DBG_RL("Stopping traffic from loopback address %pI6c "
+ "to non-local address, dest: %pI6c\n",
&ipv6_hdr(skb)->saddr, daddr);
dst_release(&rt->dst);
return NULL;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 2/8] ipvs: IPv6 extend ICMPv6 handling for future types
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 1/8] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
@ 2012-09-11 12:36 ` Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 3/8] ipvs: Use config macro IS_ENABLED() Jesper Dangaard Brouer
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:36 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Extend handling of ICMPv6, to all none Informational Messages
(via ICMPV6_INFOMSG_MASK). This actually only extend our handling to
type ICMPV6_PARAMPROB (Parameter Problem), and future types.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
This change have been pulled out the frag handling patch
as it was not related to IPv6 frag handling.
net/netfilter/ipvs/ip_vs_core.c | 8 ++------
1 files changed, 2 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 4edb654..ebd105c 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -950,9 +950,7 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
* this means that some packets will manage to get a long way
* down this stack and then be rejected, but that's life.
*/
- if ((ic->icmp6_type != ICMPV6_DEST_UNREACH) &&
- (ic->icmp6_type != ICMPV6_PKT_TOOBIG) &&
- (ic->icmp6_type != ICMPV6_TIME_EXCEED)) {
+ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) {
*related = 0;
return NF_ACCEPT;
}
@@ -1498,9 +1496,7 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
* this means that some packets will manage to get a long way
* down this stack and then be rejected, but that's life.
*/
- if ((ic->icmp6_type != ICMPV6_DEST_UNREACH) &&
- (ic->icmp6_type != ICMPV6_PKT_TOOBIG) &&
- (ic->icmp6_type != ICMPV6_TIME_EXCEED)) {
+ if (ic->icmp6_type & ICMPV6_INFOMSG_MASK) {
*related = 0;
return NF_ACCEPT;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 3/8] ipvs: Use config macro IS_ENABLED()
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 1/8] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 2/8] ipvs: IPv6 extend ICMPv6 handling for future types Jesper Dangaard Brouer
@ 2012-09-11 12:37 ` Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 4/8] ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets Jesper Dangaard Brouer
` (5 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:37 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Cleanup patch.
Use the IS_ENABLED macro, instead of having to check
both the build and the module CONFIG_ option.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
include/net/ip_vs.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index aba0bb2..c8b2bdb 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -22,7 +22,7 @@
#include <linux/ip.h>
#include <linux/ipv6.h> /* for struct ipv6hdr */
#include <net/ipv6.h>
-#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
#include <net/netfilter/nf_conntrack.h>
#endif
#include <net/net_namespace.h> /* Netw namespace */
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 4/8] ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (2 preceding siblings ...)
2012-09-11 12:37 ` [PATCH V3 3/8] ipvs: Use config macro IS_ENABLED() Jesper Dangaard Brouer
@ 2012-09-11 12:37 ` Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 5/8] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
` (4 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:37 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
ICMPv6 return traffic, which needs to be NAT modified, does
not get modified correctly, because the SKB have not been
made sufficiently "writable".
Make sure SKB is writable in ip_vs_nat_icmp_v6().
Note, the calling code path have handled this case for IPv4, but
not for IPv6. I have placed the change in ip_vs_nat_icmp_v6()
in-order to reduce the changes/impact of that path.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
net/netfilter/ipvs/ip_vs_core.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index ebd105c..fd50f47 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -737,6 +737,12 @@ void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp,
icmp_offset);
struct ipv6hdr *ciph = (struct ipv6hdr *)(icmph + 1);
+ /* Make sure SKB is writable */
+ unsigned int write;
+ write = icmp_offset + sizeof(struct icmp6hdr) + sizeof(struct ipv6hdr);
+ if (!skb_make_writable(skb, write + 2 * sizeof(__u16)))
+ return;
+
if (inout) {
iph->saddr = cp->vaddr.in6;
ciph->daddr = cp->vaddr.in6;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 5/8] ipvs: Fix faulty IPv6 extension header handling in IPVS
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (3 preceding siblings ...)
2012-09-11 12:37 ` [PATCH V3 4/8] ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets Jesper Dangaard Brouer
@ 2012-09-11 12:37 ` Jesper Dangaard Brouer
2012-09-11 12:38 ` [PATCH V3 6/8] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (3 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:37 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
IPv6 packets can contain extension headers, thus its wrong to assume
that the transport/upper-layer header, starts right after (struct
ipv6hdr) the IPv6 header. IPVS uses this false assumption, and will
write SNAT & DNAT modifications at a fixed pos which will corrupt the
message.
To fix this, proper header position must be found before modifying
packets. Introducing ip_vs_fill_iph_skb(), which uses ipv6_find_hdr()
to skip the exthdrs. It finds (1) the transport header offset, (2) the
protocol, and (3) detects if the packet is a fragment.
Note, that fragments in IPv6 is represented via an exthdr. Thus, this
is detected while skipping through the exthdrs.
This patch depends on commit 84018f55a:
"netfilter: ip6_tables: add flags parameter to ipv6_find_hdr()"
This also adds a dependency to ip6_tables.
Originally based on patch from: Hans Schillstrom
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
V3:
- No big API changes (these are postpone to later patch)
- Make ip_vs_in_icmp_v6() easier to review, avoid spliting related
changes across patches.
- Fix port write offset bug in ip_vs_in_icmp_v6()
- Move ip_vs_pe_sip.c changes to a seperate patch.
V2:
- Use macro IS_ENABLED(CONFIG_XXX)
- Re-add NULL pointer check in ip_vs_out_icmp_v6()
include/net/ip_vs.h | 72 +++++++++++-
net/netfilter/ipvs/Kconfig | 1
net/netfilter/ipvs/ip_vs_core.c | 191 +++++++++++++++------------------
net/netfilter/ipvs/ip_vs_dh.c | 2
net/netfilter/ipvs/ip_vs_lblc.c | 2
net/netfilter/ipvs/ip_vs_lblcr.c | 2
net/netfilter/ipvs/ip_vs_pe_sip.c | 2
net/netfilter/ipvs/ip_vs_proto_sctp.c | 22 ++--
net/netfilter/ipvs/ip_vs_proto_tcp.c | 22 ++--
net/netfilter/ipvs/ip_vs_proto_udp.c | 22 ++--
net/netfilter/ipvs/ip_vs_sh.c | 2
net/netfilter/ipvs/ip_vs_xmit.c | 5 +
net/netfilter/xt_ipvs.c | 2
13 files changed, 198 insertions(+), 149 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index c8b2bdb..29265bf 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -22,6 +22,9 @@
#include <linux/ip.h>
#include <linux/ipv6.h> /* for struct ipv6hdr */
#include <net/ipv6.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
#include <net/netfilter/nf_conntrack.h>
#endif
@@ -103,30 +106,79 @@ static inline struct net *seq_file_single_net(struct seq_file *seq)
/* Connections' size value needed by ip_vs_ctl.c */
extern int ip_vs_conn_tab_size;
-
struct ip_vs_iphdr {
- int len;
- __u8 protocol;
+ __u32 len; /* IPv4 simply where L4 starts
+ IPv6 where L4 Transport Header starts */
+ __u16 fragoffs; /* IPv6 fragment offset, 0 if first frag (or not frag)*/
+ __s16 protocol;
+ __s32 flags;
union nf_inet_addr saddr;
union nf_inet_addr daddr;
};
static inline void
-ip_vs_fill_iphdr(int af, const void *nh, struct ip_vs_iphdr *iphdr)
+ip_vs_fill_ip4hdr(const void *nh, struct ip_vs_iphdr *iphdr)
+{
+ const struct iphdr *iph = nh;
+
+ iphdr->len = iph->ihl * 4;
+ iphdr->fragoffs = 0;
+ iphdr->protocol = iph->protocol;
+ iphdr->saddr.ip = iph->saddr;
+ iphdr->daddr.ip = iph->daddr;
+}
+
+/* This function handles filling *ip_vs_iphdr, both for IPv4 and IPv6.
+ * IPv6 requires some extra work, as finding proper header position,
+ * depend on the IPv6 extension headers.
+ */
+static inline void
+ip_vs_fill_iph_skb(int af, const struct sk_buff *skb, struct ip_vs_iphdr *iphdr)
{
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
- const struct ipv6hdr *iph = nh;
- iphdr->len = sizeof(struct ipv6hdr);
- iphdr->protocol = iph->nexthdr;
+ const struct ipv6hdr *iph =
+ (struct ipv6hdr *)skb_network_header(skb);
iphdr->saddr.in6 = iph->saddr;
iphdr->daddr.in6 = iph->daddr;
+ /* ipv6_find_hdr() updates len, flags */
+ iphdr->len = 0;
+ iphdr->flags = 0;
+ iphdr->protocol = ipv6_find_hdr(skb, &iphdr->len, -1,
+ &iphdr->fragoffs,
+ &iphdr->flags);
} else
#endif
{
- const struct iphdr *iph = nh;
- iphdr->len = iph->ihl * 4;
- iphdr->protocol = iph->protocol;
+ const struct iphdr *iph =
+ (struct iphdr *)skb_network_header(skb);
+ iphdr->len = iph->ihl * 4;
+ iphdr->fragoffs = 0;
+ iphdr->protocol = iph->protocol;
+ iphdr->saddr.ip = iph->saddr;
+ iphdr->daddr.ip = iph->daddr;
+ }
+}
+
+/* This function is a faster version of ip_vs_fill_iph_skb().
+ * Where we only populate {s,d}addr (and avoid calling ipv6_find_hdr()).
+ * This is used by the some of the ip_vs_*_schedule() functions.
+ * (Mostly done to avoid ABI breakage of external schedulers)
+ */
+static inline void
+ip_vs_fill_iph_addr_only(int af, const struct sk_buff *skb,
+ struct ip_vs_iphdr *iphdr)
+{
+#ifdef CONFIG_IP_VS_IPV6
+ if (af == AF_INET6) {
+ const struct ipv6hdr *iph =
+ (struct ipv6hdr *)skb_network_header(skb);
+ iphdr->saddr.in6 = iph->saddr;
+ iphdr->daddr.in6 = iph->daddr;
+ } else {
+#endif
+ const struct iphdr *iph =
+ (struct iphdr *)skb_network_header(skb);
iphdr->saddr.ip = iph->saddr;
iphdr->daddr.ip = iph->daddr;
}
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index 8b2cffd..a97ae53 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -28,6 +28,7 @@ if IP_VS
config IP_VS_IPV6
bool "IPv6 support for IPVS"
depends on IPV6 = y || IP_VS = IPV6
+ select IP6_NF_IPTABLES
---help---
Add IPv6 support to IPVS. This is incomplete and might be dangerous.
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index fd50f47..caed44d 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -236,7 +236,7 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
union nf_inet_addr snet; /* source network of the client,
after masking */
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(svc->af, skb, &iph);
/* Mask saddr with the netmask to adjust template granularity */
#ifdef CONFIG_IP_VS_IPV6
@@ -402,7 +402,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
unsigned int flags;
*ignored = 1;
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(svc->af, skb, &iph);
pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
if (pptr == NULL)
return NULL;
@@ -506,7 +506,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
int unicast;
#endif
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(svc->af, skb, &iph);
pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
if (pptr == NULL) {
@@ -732,15 +732,21 @@ void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp,
struct ip_vs_conn *cp, int inout)
{
struct ipv6hdr *iph = ipv6_hdr(skb);
- unsigned int icmp_offset = sizeof(struct ipv6hdr);
- struct icmp6hdr *icmph = (struct icmp6hdr *)(skb_network_header(skb) +
- icmp_offset);
- struct ipv6hdr *ciph = (struct ipv6hdr *)(icmph + 1);
+ unsigned int icmp_offset = 0;
+ unsigned int offs = 0; /* header offset*/
+ int protocol;
+ struct icmp6hdr *icmph;
+ struct ipv6hdr *ciph;
+ unsigned short fragoffs;
+
+ ipv6_find_hdr(skb, &icmp_offset, IPPROTO_ICMPV6, &fragoffs, NULL);
+ icmph = (struct icmp6hdr *)(skb_network_header(skb) + icmp_offset);
+ offs = icmp_offset + sizeof(struct icmp6hdr);
+ ciph = (struct ipv6hdr *)(skb_network_header(skb) + offs);
+
+ protocol = ipv6_find_hdr(skb, &offs, -1, &fragoffs, NULL);
- /* Make sure SKB is writable */
- unsigned int write;
- write = icmp_offset + sizeof(struct icmp6hdr) + sizeof(struct ipv6hdr);
- if (!skb_make_writable(skb, write + 2 * sizeof(__u16)))
+ if (!skb_make_writable(skb, offs + 2 * sizeof(__u16)))
return;
if (inout) {
@@ -752,10 +758,13 @@ void ip_vs_nat_icmp_v6(struct sk_buff *skb, struct ip_vs_protocol *pp,
}
/* the TCP/UDP/SCTP port */
- if (IPPROTO_TCP == ciph->nexthdr || IPPROTO_UDP == ciph->nexthdr ||
- IPPROTO_SCTP == ciph->nexthdr) {
- __be16 *ports = (void *)ciph + sizeof(struct ipv6hdr);
+ if (!fragoffs && (IPPROTO_TCP == protocol || IPPROTO_UDP == protocol ||
+ IPPROTO_SCTP == protocol)) {
+ __be16 *ports = (void *)(skb_network_header(skb) + offs);
+ IP_VS_DBG(11, "%s() changed port %d to %d\n", __func__,
+ ntohs(inout ? ports[1] : ports[0]),
+ ntohs(inout ? cp->vport : cp->dport));
if (inout)
ports[1] = cp->vport;
else
@@ -904,9 +913,8 @@ static int ip_vs_out_icmp(struct sk_buff *skb, int *related,
IP_VS_DBG_PKT(11, AF_INET, pp, skb, offset,
"Checking outgoing ICMP for");
- offset += cih->ihl * 4;
-
- ip_vs_fill_iphdr(AF_INET, cih, &ciph);
+ ip_vs_fill_ip4hdr(cih, &ciph);
+ ciph.len += offset;
/* The embedded headers contain source and dest in reverse order */
cp = pp->conn_out_get(AF_INET, skb, &ciph, offset, 1);
if (!cp)
@@ -914,40 +922,33 @@ static int ip_vs_out_icmp(struct sk_buff *skb, int *related,
snet.ip = iph->saddr;
return handle_response_icmp(AF_INET, skb, &snet, cih->protocol, cp,
- pp, offset, ihl);
+ pp, ciph.len, ihl);
}
#ifdef CONFIG_IP_VS_IPV6
static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
unsigned int hooknum)
{
- struct ipv6hdr *iph;
struct icmp6hdr _icmph, *ic;
- struct ipv6hdr _ciph, *cih; /* The ip header contained
+ struct ipv6hdr _ip6, *ip6; /* The ip header contained
within the ICMP */
- struct ip_vs_iphdr ciph;
struct ip_vs_conn *cp;
struct ip_vs_protocol *pp;
- unsigned int offset;
union nf_inet_addr snet;
- *related = 1;
+ struct ip_vs_iphdr ipvsh_stack;
+ struct ip_vs_iphdr *ipvsh = &ipvsh_stack;
+ ip_vs_fill_iph_skb(AF_INET6, skb, ipvsh);
- /* reassemble IP fragments */
- if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
- if (ip_vs_gather_frags_v6(skb, ip_vs_defrag_user(hooknum)))
- return NF_STOLEN;
- }
+ *related = 1;
- iph = ipv6_hdr(skb);
- offset = sizeof(struct ipv6hdr);
- ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph);
+ ic = skb_header_pointer(skb, ipvsh->len, sizeof(_icmph), &_icmph);
if (ic == NULL)
return NF_DROP;
- IP_VS_DBG(12, "Outgoing ICMPv6 (%d,%d) %pI6->%pI6\n",
+ IP_VS_DBG(12, "Outgoing ICMPv6 (%d,%d) %pI6c->%pI6c\n",
ic->icmp6_type, ntohs(icmpv6_id(ic)),
- &iph->saddr, &iph->daddr);
+ &ipvsh->saddr, &ipvsh->daddr);
/*
* Work through seeing if this is for us.
@@ -962,34 +963,28 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
}
/* Now find the contained IP header */
- offset += sizeof(_icmph);
- cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
- if (cih == NULL)
+ ipvsh->len += sizeof(_icmph);
+ ip6 = skb_header_pointer(skb, ipvsh->len, sizeof(_ip6), &_ip6);
+ if (ip6 == NULL)
return NF_ACCEPT; /* The packet looks wrong, ignore */
+ ipvsh->protocol = ipv6_find_hdr(skb, &ipvsh->len, -1,
+ &ipvsh->fragoffs, &ipvsh->flags);
- pp = ip_vs_proto_get(cih->nexthdr);
- if (!pp)
- return NF_ACCEPT;
-
- /* Is the embedded protocol header present? */
- /* TODO: we don't support fragmentation at the moment anyways */
- if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
+ pp = ip_vs_proto_get(ipvsh->protocol);
+ if (!pp || (ipvsh->protocol < 0))
return NF_ACCEPT;
+ /* fill the rest of ipvsh */
+ ipvsh->saddr.in6 = ip6->saddr;
+ ipvsh->daddr.in6 = ip6->daddr;
- IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
- "Checking outgoing ICMPv6 for");
-
- offset += sizeof(struct ipv6hdr);
-
- ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
/* The embedded headers contain source and dest in reverse order */
- cp = pp->conn_out_get(AF_INET6, skb, &ciph, offset, 1);
+ cp = pp->conn_out_get(AF_INET6, skb, ipvsh, ipvsh->len, 1);
if (!cp)
return NF_ACCEPT;
- snet.in6 = iph->saddr;
- return handle_response_icmp(AF_INET6, skb, &snet, cih->nexthdr, cp,
- pp, offset, sizeof(struct ipv6hdr));
+ snet.in6 = ipvsh->saddr.in6;
+ return handle_response_icmp(AF_INET6, skb, &snet, ipvsh->protocol, cp,
+ pp, ipvsh->len, sizeof(struct ipv6hdr));
}
#endif
@@ -1119,7 +1114,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (!net_ipvs(net)->enable)
return NF_ACCEPT;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
@@ -1129,7 +1124,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
}
} else
#endif
@@ -1139,7 +1134,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
}
pd = ip_vs_proto_data_get(net, iph.protocol);
@@ -1149,22 +1144,14 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
/* reassemble IP fragments */
#ifdef CONFIG_IP_VS_IPV6
- if (af == AF_INET6) {
- if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
- if (ip_vs_gather_frags_v6(skb,
- ip_vs_defrag_user(hooknum)))
- return NF_STOLEN;
- }
-
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
- } else
+ if (af == AF_INET)
#endif
if (unlikely(ip_is_fragment(ip_hdr(skb)) && !pp->dont_defrag)) {
if (ip_vs_gather_frags(skb,
ip_vs_defrag_user(hooknum)))
return NF_STOLEN;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
}
/*
@@ -1379,9 +1366,9 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
"Checking incoming ICMP for");
offset2 = offset;
- offset += cih->ihl * 4;
-
- ip_vs_fill_iphdr(AF_INET, cih, &ciph);
+ ip_vs_fill_ip4hdr(cih, &ciph);
+ ciph.len += offset;
+ offset = ciph.len;
/* The embedded headers contain source and dest in reverse order.
* For IPIP this is error for request, not for reply.
*/
@@ -1467,27 +1454,21 @@ static int
ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
{
struct net *net = NULL;
- struct ipv6hdr *iph;
+ struct ipv6hdr _ip6h, *ip6h;
struct icmp6hdr _icmph, *ic;
- struct ipv6hdr _ciph, *cih; /* The ip header contained
- within the ICMP */
struct ip_vs_iphdr ciph;
struct ip_vs_conn *cp;
struct ip_vs_protocol *pp;
struct ip_vs_proto_data *pd;
- unsigned int offset, verdict;
+ unsigned int offs_ciph, verdict;
- *related = 1;
+ struct ip_vs_iphdr iph_stack;
+ struct ip_vs_iphdr *iph = &iph_stack;
+ ip_vs_fill_iph_skb(AF_INET6, skb, iph);
- /* reassemble IP fragments */
- if (ipv6_hdr(skb)->nexthdr == IPPROTO_FRAGMENT) {
- if (ip_vs_gather_frags_v6(skb, ip_vs_defrag_user(hooknum)))
- return NF_STOLEN;
- }
+ *related = 1;
- iph = ipv6_hdr(skb);
- offset = sizeof(struct ipv6hdr);
- ic = skb_header_pointer(skb, offset, sizeof(_icmph), &_icmph);
+ ic = skb_header_pointer(skb, iph->len, sizeof(_icmph), &_icmph);
if (ic == NULL)
return NF_DROP;
@@ -1508,39 +1489,40 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
}
/* Now find the contained IP header */
- offset += sizeof(_icmph);
- cih = skb_header_pointer(skb, offset, sizeof(_ciph), &_ciph);
- if (cih == NULL)
+ ciph.len = iph->len + sizeof(_icmph);
+ offs_ciph = ciph.len; /* Save ip header offset */
+ ip6h = skb_header_pointer(skb, ciph.len, sizeof(_ip6h), (void *)&_ip6h);
+ if (ip6h == NULL)
return NF_ACCEPT; /* The packet looks wrong, ignore */
+ ciph.protocol = ipv6_find_hdr(skb, &ciph.len, -1, &ciph.fragoffs, NULL);
+ ciph.saddr.in6 = ip6h->saddr; /* conn_in_get() handles reverse order */
+ ciph.daddr.in6 = ip6h->daddr;
net = skb_net(skb);
- pd = ip_vs_proto_data_get(net, cih->nexthdr);
+ pd = ip_vs_proto_data_get(net, ciph.protocol);
if (!pd)
return NF_ACCEPT;
pp = pd->pp;
- /* Is the embedded protocol header present? */
- /* TODO: we don't support fragmentation at the moment anyways */
- if (unlikely(cih->nexthdr == IPPROTO_FRAGMENT && pp->dont_defrag))
+ /* Cannot handle fragmented embedded protocol */
+ if (ciph.fragoffs)
return NF_ACCEPT;
- IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offset,
+ IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offs_ciph,
"Checking incoming ICMPv6 for");
- offset += sizeof(struct ipv6hdr);
-
- ip_vs_fill_iphdr(AF_INET6, cih, &ciph);
/* The embedded headers contain source and dest in reverse order */
- cp = pp->conn_in_get(AF_INET6, skb, &ciph, offset, 1);
+ cp = pp->conn_in_get(AF_INET6, skb, &ciph, ciph.len, 1);
if (!cp)
return NF_ACCEPT;
/* do the statistics and put it back */
ip_vs_in_stats(cp, skb);
- if (IPPROTO_TCP == cih->nexthdr || IPPROTO_UDP == cih->nexthdr ||
- IPPROTO_SCTP == cih->nexthdr)
- offset += 2 * sizeof(__u16);
- verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offset, hooknum);
+ if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
+ IPPROTO_SCTP == ciph.protocol)
+ offs_ciph = ciph.len + (2 * sizeof(__u16));
+
+ verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offs_ciph, hooknum);
__ip_vs_conn_put(cp);
@@ -1576,7 +1558,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely((skb->pkt_type != PACKET_HOST &&
hooknum != NF_INET_LOCAL_OUT) ||
!skb_dst(skb))) {
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
IP_VS_DBG_BUF(12, "packet type=%d proto=%d daddr=%s"
" ignored in hook %u\n",
skb->pkt_type, iph.protocol,
@@ -1588,7 +1570,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (!net_ipvs(net)->enable)
return NF_ACCEPT;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
/* Bad... Do not break raw sockets */
if (unlikely(skb->sk != NULL && hooknum == NF_INET_LOCAL_OUT &&
@@ -1608,7 +1590,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
}
} else
#endif
@@ -1618,7 +1599,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
}
/* Protocol supported? */
@@ -1628,10 +1608,11 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
pp = pd->pp;
/*
* Check if the packet belongs to an existing connection entry
+ * Only sched first IPv6 fragment.
*/
cp = pp->conn_in_get(af, skb, &iph, iph.len, 0);
- if (unlikely(!cp)) {
+ if (unlikely(!cp) && !iph.fragoffs) {
int v;
if (!pp->conn_schedule(af, skb, pd, &v, &cp))
@@ -1795,8 +1776,10 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
{
int r;
struct net *net;
+ struct ip_vs_iphdr iphdr;
- if (ipv6_hdr(skb)->nexthdr != IPPROTO_ICMPV6)
+ ip_vs_fill_iph_skb(AF_INET6, skb, &iphdr);
+ if (iphdr.protocol != IPPROTO_ICMPV6)
return NF_ACCEPT;
/* ipvs enabled in this netns ? */
diff --git a/net/netfilter/ipvs/ip_vs_dh.c b/net/netfilter/ipvs/ip_vs_dh.c
index 8b7dca9..7f3b0cc 100644
--- a/net/netfilter/ipvs/ip_vs_dh.c
+++ b/net/netfilter/ipvs/ip_vs_dh.c
@@ -215,7 +215,7 @@ ip_vs_dh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
struct ip_vs_dh_bucket *tbl;
struct ip_vs_iphdr iph;
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_lblc.c b/net/netfilter/ipvs/ip_vs_lblc.c
index df646cc..cbd3748 100644
--- a/net/netfilter/ipvs/ip_vs_lblc.c
+++ b/net/netfilter/ipvs/ip_vs_lblc.c
@@ -479,7 +479,7 @@ ip_vs_lblc_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
struct ip_vs_dest *dest = NULL;
struct ip_vs_lblc_entry *en;
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_lblcr.c b/net/netfilter/ipvs/ip_vs_lblcr.c
index 570e31e..161b679 100644
--- a/net/netfilter/ipvs/ip_vs_lblcr.c
+++ b/net/netfilter/ipvs/ip_vs_lblcr.c
@@ -649,7 +649,7 @@ ip_vs_lblcr_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
struct ip_vs_dest *dest = NULL;
struct ip_vs_lblcr_entry *en;
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
IP_VS_DBG(6, "%s(): Scheduling...\n", __func__);
diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
index 1aa5cac..ee4e2e3 100644
--- a/net/netfilter/ipvs/ip_vs_pe_sip.c
+++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
@@ -73,7 +73,7 @@ ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
const char *dptr;
int retc;
- ip_vs_fill_iphdr(p->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(p->af, skb, &iph);
/* Only useful with UDP */
if (iph.protocol != IPPROTO_UDP)
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 9f3fb75..b903db6 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -18,7 +18,7 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
sctp_sctphdr_t *sh, _sctph;
struct ip_vs_iphdr iph;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
sh = skb_header_pointer(skb, iph.len, sizeof(_sctph), &_sctph);
if (sh == NULL)
@@ -72,12 +72,14 @@ sctp_snat_handler(struct sk_buff *skb,
struct sk_buff *iter;
__be32 crc32;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ sctphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- sctphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- sctphoff = ip_hdrlen(skb);
/* csum_check requires unshared skb */
if (!skb_make_writable(skb, sctphoff + sizeof(*sctph)))
@@ -116,12 +118,14 @@ sctp_dnat_handler(struct sk_buff *skb,
struct sk_buff *iter;
__be32 crc32;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ sctphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- sctphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- sctphoff = ip_hdrlen(skb);
/* csum_check requires unshared skb */
if (!skb_make_writable(skb, sctphoff + sizeof(*sctph)))
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index cd609cc..8a96069 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -40,7 +40,7 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
struct tcphdr _tcph, *th;
struct ip_vs_iphdr iph;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
th = skb_header_pointer(skb, iph.len, sizeof(_tcph), &_tcph);
if (th == NULL) {
@@ -136,12 +136,14 @@ tcp_snat_handler(struct sk_buff *skb,
int oldlen;
int payload_csum = 0;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ tcphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- tcphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- tcphoff = ip_hdrlen(skb);
oldlen = skb->len - tcphoff;
/* csum_check requires unshared skb */
@@ -216,12 +218,14 @@ tcp_dnat_handler(struct sk_buff *skb,
int oldlen;
int payload_csum = 0;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ tcphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- tcphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- tcphoff = ip_hdrlen(skb);
oldlen = skb->len - tcphoff;
/* csum_check requires unshared skb */
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index 2fedb2d..d6f4eee 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -37,7 +37,7 @@ udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
struct udphdr _udph, *uh;
struct ip_vs_iphdr iph;
- ip_vs_fill_iphdr(af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(af, skb, &iph);
uh = skb_header_pointer(skb, iph.len, sizeof(_udph), &_udph);
if (uh == NULL) {
@@ -133,12 +133,14 @@ udp_snat_handler(struct sk_buff *skb,
int oldlen;
int payload_csum = 0;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ udphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- udphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- udphoff = ip_hdrlen(skb);
oldlen = skb->len - udphoff;
/* csum_check requires unshared skb */
@@ -218,12 +220,14 @@ udp_dnat_handler(struct sk_buff *skb,
int oldlen;
int payload_csum = 0;
+ struct ip_vs_iphdr iph;
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
+ udphoff = iph.len;
+
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6)
- udphoff = sizeof(struct ipv6hdr);
- else
+ if (cp->af == AF_INET6 && iph.fragoffs)
+ return 1;
#endif
- udphoff = ip_hdrlen(skb);
oldlen = skb->len - udphoff;
/* csum_check requires unshared skb */
diff --git a/net/netfilter/ipvs/ip_vs_sh.c b/net/netfilter/ipvs/ip_vs_sh.c
index 0512652..e331269 100644
--- a/net/netfilter/ipvs/ip_vs_sh.c
+++ b/net/netfilter/ipvs/ip_vs_sh.c
@@ -228,7 +228,7 @@ ip_vs_sh_schedule(struct ip_vs_service *svc, const struct sk_buff *skb)
struct ip_vs_sh_bucket *tbl;
struct ip_vs_iphdr iph;
- ip_vs_fill_iphdr(svc->af, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_addr_only(svc->af, skb, &iph);
IP_VS_DBG(6, "ip_vs_sh_schedule(): Scheduling...\n");
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 1060bd5..428de75 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -679,14 +679,15 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
struct rt6_info *rt; /* Route to the other host */
int mtu;
int local;
+ struct ip_vs_iphdr iph;
EnterFunction(10);
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
/* check if it is a connection of no-client-port */
if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) {
__be16 _pt, *p;
- p = skb_header_pointer(skb, sizeof(struct ipv6hdr),
- sizeof(_pt), &_pt);
+ p = skb_header_pointer(skb, iph.len, sizeof(_pt), &_pt);
if (p == NULL)
goto tx_error;
ip_vs_conn_fill_cport(cp, *p);
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
index bb10b07..3f9b8cd 100644
--- a/net/netfilter/xt_ipvs.c
+++ b/net/netfilter/xt_ipvs.c
@@ -67,7 +67,7 @@ ipvs_mt(const struct sk_buff *skb, struct xt_action_param *par)
goto out;
}
- ip_vs_fill_iphdr(family, skb_network_header(skb), &iph);
+ ip_vs_fill_iph_skb(family, skb, &iph);
if (data->bitmask & XT_IPVS_PROTO)
if ((iph.protocol == data->l4proto) ^
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 6/8] ipvs: Complete IPv6 fragment handling for IPVS
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (4 preceding siblings ...)
2012-09-11 12:37 ` [PATCH V3 5/8] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
@ 2012-09-11 12:38 ` Jesper Dangaard Brouer
2012-09-11 12:38 ` [PATCH V3 7/8] ipvs: API change to avoid rescan of IPv6 exthdr Jesper Dangaard Brouer
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:38 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
IPVS now supports fragmented packets, with support from nf_conntrack_reasm.c
Based on patch from: Hans Schillstrom.
IPVS do like conntrack i.e. use the skb->nfct_reasm
(i.e. when all fragments is collected, nf_ct_frag6_output()
starts a "re-play" of all fragments into the interrupted
PREROUTING chain at prio -399 (NF_IP6_PRI_CONNTRACK_DEFRAG+1)
with nfct_reasm pointing to the assembled packet.)
Notice, module nf_defrag_ipv6 must be loaded for this to work.
Report unhandled fragments, and recommend user to load nf_defrag_ipv6.
To handle fw-mark for fragments. Add a new IPVS hook into prerouting
chain at prio -99 (NF_IP6_PRI_NAT_DST+1) to catch fragments, and copy
fw-mark info from the first packet with an upper layer header.
IPv6 fragment handling should be the last thing on the IPVS IPv6
missing support list.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
---
V3:
- In case of no nf_defrag_ipv6, second fragments could create
strange conn entries, fix that.
- Report unhandled fragments, and recommend user to load nf_defrag_ipv6.
- Move ICMPv6 improvements to seperate patch
V2:
- In ip_vs_in_icmp_v6() hint that &ciph.len can be updated
- Add NULL pointer check in ip_vs_in_icmp_v6()
V1:
- Fixed refcnt bug since last.
include/net/ip_vs.h | 39 +++++++++++++
net/netfilter/ipvs/Kconfig | 6 +-
net/netfilter/ipvs/ip_vs_conn.c | 2 -
net/netfilter/ipvs/ip_vs_core.c | 117 ++++++++++++++++++++++++++++++++-------
net/netfilter/ipvs/ip_vs_xmit.c | 33 ++++++++---
5 files changed, 162 insertions(+), 35 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 29265bf..98806b6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -109,6 +109,7 @@ extern int ip_vs_conn_tab_size;
struct ip_vs_iphdr {
__u32 len; /* IPv4 simply where L4 starts
IPv6 where L4 Transport Header starts */
+ __u32 thoff_reasm; /* Transport Header Offset in nfct_reasm skb */
__u16 fragoffs; /* IPv6 fragment offset, 0 if first frag (or not frag)*/
__s16 protocol;
__s32 flags;
@@ -116,6 +117,35 @@ struct ip_vs_iphdr {
union nf_inet_addr daddr;
};
+/* Dependency to module: nf_defrag_ipv6 */
+#if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+ return skb->nfct_reasm;
+}
+static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
+ int len, void *buffer,
+ const struct ip_vs_iphdr *ipvsh)
+{
+ if (unlikely(ipvsh->fragoffs && skb_nfct_reasm(skb)))
+ return skb_header_pointer(skb_nfct_reasm(skb),
+ ipvsh->thoff_reasm, len, buffer);
+
+ return skb_header_pointer(skb, offset, len, buffer);
+}
+#else
+static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
+{
+ return NULL;
+}
+static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
+ int len, void *buffer,
+ const struct ip_vs_iphdr *ipvsh)
+{
+ return skb_header_pointer(skb, offset, len, buffer);
+}
+#endif
+
static inline void
ip_vs_fill_ip4hdr(const void *nh, struct ip_vs_iphdr *iphdr)
{
@@ -141,12 +171,19 @@ ip_vs_fill_iph_skb(int af, const struct sk_buff *skb, struct ip_vs_iphdr *iphdr)
(struct ipv6hdr *)skb_network_header(skb);
iphdr->saddr.in6 = iph->saddr;
iphdr->daddr.in6 = iph->daddr;
- /* ipv6_find_hdr() updates len, flags */
+ /* ipv6_find_hdr() updates len, flags, thoff_reasm */
+ iphdr->thoff_reasm = 0;
iphdr->len = 0;
iphdr->flags = 0;
iphdr->protocol = ipv6_find_hdr(skb, &iphdr->len, -1,
&iphdr->fragoffs,
&iphdr->flags);
+ /* get proto from re-assembled packet and it's offset */
+ if (skb_nfct_reasm(skb))
+ iphdr->protocol = ipv6_find_hdr(skb_nfct_reasm(skb),
+ &iphdr->thoff_reasm,
+ -1, NULL, NULL);
+
} else
#endif
{
diff --git a/net/netfilter/ipvs/Kconfig b/net/netfilter/ipvs/Kconfig
index a97ae53..0c3b167 100644
--- a/net/netfilter/ipvs/Kconfig
+++ b/net/netfilter/ipvs/Kconfig
@@ -30,11 +30,9 @@ config IP_VS_IPV6
depends on IPV6 = y || IP_VS = IPV6
select IP6_NF_IPTABLES
---help---
- Add IPv6 support to IPVS. This is incomplete and might be dangerous.
+ Add IPv6 support to IPVS.
- See http://www.mindbasket.com/ipvs for more information.
-
- Say N if unsure.
+ Say Y if unsure.
config IP_VS_DEBUG
bool "IP virtual server debugging"
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index 1548df9..d6c1c26 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -314,7 +314,7 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
__be16 _ports[2], *pptr;
struct net *net = skb_net(skb);
- pptr = skb_header_pointer(skb, proto_off, sizeof(_ports), _ports);
+ pptr = frag_safe_skb_hp(skb, proto_off, sizeof(_ports), _ports, iph);
if (pptr == NULL)
return 1;
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index caed44d..4b9995e 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -402,8 +402,12 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
unsigned int flags;
*ignored = 1;
+
+ /*
+ * IPv6 frags, only the first hit here.
+ */
ip_vs_fill_iph_skb(svc->af, skb, &iph);
- pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
+ pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports, &iph);
if (pptr == NULL)
return NULL;
@@ -507,8 +511,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
#endif
ip_vs_fill_iph_skb(svc->af, skb, &iph);
-
- pptr = skb_header_pointer(skb, iph.len, sizeof(_ports), _ports);
+ pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports, &iph);
if (pptr == NULL) {
ip_vs_service_put(svc);
return NF_DROP;
@@ -654,14 +657,6 @@ static inline int ip_vs_gather_frags(struct sk_buff *skb, u_int32_t user)
return err;
}
-#ifdef CONFIG_IP_VS_IPV6
-static inline int ip_vs_gather_frags_v6(struct sk_buff *skb, u_int32_t user)
-{
- /* TODO IPv6: Find out what to do here for IPv6 */
- return 0;
-}
-#endif
-
static int ip_vs_route_me_harder(int af, struct sk_buff *skb)
{
#ifdef CONFIG_IP_VS_IPV6
@@ -941,8 +936,7 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
ip_vs_fill_iph_skb(AF_INET6, skb, ipvsh);
*related = 1;
-
- ic = skb_header_pointer(skb, ipvsh->len, sizeof(_icmph), &_icmph);
+ ic = frag_safe_skb_hp(skb, ipvsh->len, sizeof(_icmph), &_icmph, ipvsh);
if (ic == NULL)
return NF_DROP;
@@ -961,6 +955,11 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
*related = 0;
return NF_ACCEPT;
}
+ /* Fragment header that is before ICMP header tells us that:
+ * it's not an error message since they can't be fragmented.
+ */
+ if (ipvsh->flags & IP6T_FH_F_FRAG)
+ return NF_DROP;
/* Now find the contained IP header */
ipvsh->len += sizeof(_icmph);
@@ -1117,6 +1116,12 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
ip_vs_fill_iph_skb(af, skb, &iph);
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
+ if (!iph.fragoffs && skb_nfct_reasm(skb)) {
+ struct sk_buff *reasm = skb_nfct_reasm(skb);
+ /* Save fw mark for coming frags */
+ reasm->ipvs_property = 1;
+ reasm->mark = skb->mark;
+ }
if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
int related;
int verdict = ip_vs_out_icmp_v6(skb, &related,
@@ -1124,7 +1129,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_iph_skb(af, skb, &iph);
}
} else
#endif
@@ -1134,7 +1138,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (related)
return verdict;
- ip_vs_fill_ip4hdr(skb_network_header(skb), &iph);
}
pd = ip_vs_proto_data_get(net, iph.protocol);
@@ -1167,8 +1170,8 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
pp->protocol == IPPROTO_SCTP)) {
__be16 _ports[2], *pptr;
- pptr = skb_header_pointer(skb, iph.len,
- sizeof(_ports), _ports);
+ pptr = frag_safe_skb_hp(skb, iph.len,
+ sizeof(_ports), _ports, &iph);
if (pptr == NULL)
return NF_ACCEPT; /* Not for me */
if (ip_vs_lookup_real_service(net, af, iph.protocol,
@@ -1468,7 +1471,7 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
*related = 1;
- ic = skb_header_pointer(skb, iph->len, sizeof(_icmph), &_icmph);
+ ic = frag_safe_skb_hp(skb, iph->len, sizeof(_icmph), &_icmph, iph);
if (ic == NULL)
return NF_DROP;
@@ -1487,6 +1490,11 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
*related = 0;
return NF_ACCEPT;
}
+ /* Fragment header that is before ICMP header tells us that:
+ * it's not an error message since they can't be fragmented.
+ */
+ if (iph->flags & IP6T_FH_F_FRAG)
+ return NF_DROP;
/* Now find the contained IP header */
ciph.len = iph->len + sizeof(_icmph);
@@ -1511,10 +1519,20 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
IP_VS_DBG_PKT(11, AF_INET6, pp, skb, offs_ciph,
"Checking incoming ICMPv6 for");
- /* The embedded headers contain source and dest in reverse order */
- cp = pp->conn_in_get(AF_INET6, skb, &ciph, ciph.len, 1);
+ /* The embedded headers contain source and dest in reverse order
+ * if not from localhost
+ */
+ cp = pp->conn_in_get(AF_INET6, skb, &ciph, ciph.len,
+ (hooknum == NF_INET_LOCAL_OUT) ? 0 : 1);
+
if (!cp)
return NF_ACCEPT;
+ /* VS/TUN, VS/DR and LOCALNODE just let it go */
+ if ((hooknum == NF_INET_LOCAL_OUT) &&
+ (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ)) {
+ __ip_vs_conn_put(cp);
+ return NF_ACCEPT;
+ }
/* do the statistics and put it back */
ip_vs_in_stats(cp, skb);
@@ -1584,6 +1602,12 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
#ifdef CONFIG_IP_VS_IPV6
if (af == AF_INET6) {
+ if (!iph.fragoffs && skb_nfct_reasm(skb)) {
+ struct sk_buff *reasm = skb_nfct_reasm(skb);
+ /* Save fw mark for coming frags. */
+ reasm->ipvs_property = 1;
+ reasm->mark = skb->mark;
+ }
if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
int related;
int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum);
@@ -1608,13 +1632,16 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
pp = pd->pp;
/*
* Check if the packet belongs to an existing connection entry
- * Only sched first IPv6 fragment.
*/
cp = pp->conn_in_get(af, skb, &iph, iph.len, 0);
if (unlikely(!cp) && !iph.fragoffs) {
+ /* No (second) fragments need to enter here, as nf_defrag_ipv6
+ * replayed fragment zero will already have created the cp
+ */
int v;
+ /* Schedule and create new connection entry into &cp */
if (!pp->conn_schedule(af, skb, pd, &v, &cp))
return v;
}
@@ -1623,6 +1650,14 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
/* sorry, all this trouble for a no-hit :) */
IP_VS_DBG_PKT(12, af, pp, skb, 0,
"ip_vs_in: packet continues traversal as normal");
+ if (iph.fragoffs && !skb_nfct_reasm(skb)) {
+ /* Fragment that couldn't be mapped to a conn entry
+ * and don't have any pointer to a reasm skb
+ * is missing module nf_defrag_ipv6
+ */
+ IP_VS_DBG_RL("Unhandled frag, load nf_defrag_ipv6\n");
+ IP_VS_DBG_PKT(7, af, pp, skb, 0, "unhandled fragment");
+ }
return NF_ACCEPT;
}
@@ -1707,6 +1742,38 @@ ip_vs_local_request4(unsigned int hooknum, struct sk_buff *skb,
#ifdef CONFIG_IP_VS_IPV6
/*
+ * AF_INET6 fragment handling
+ * Copy info from first fragment, to the rest of them.
+ */
+static unsigned int
+ip_vs_preroute_frag6(unsigned int hooknum, struct sk_buff *skb,
+ const struct net_device *in,
+ const struct net_device *out,
+ int (*okfn)(struct sk_buff *))
+{
+ struct sk_buff *reasm = skb_nfct_reasm(skb);
+ struct net *net;
+
+ /* Skip if not a "replay" from nf_ct_frag6_output or first fragment.
+ * ipvs_property is set when checking first fragment
+ * in ip_vs_in() and ip_vs_out().
+ */
+ if (reasm)
+ IP_VS_DBG(2, "Fragment recv prop:%d\n", reasm->ipvs_property);
+ if (!reasm || !reasm->ipvs_property)
+ return NF_ACCEPT;
+
+ net = skb_net(skb);
+ if (!net_ipvs(net)->enable)
+ return NF_ACCEPT;
+
+ /* Copy stored fw mark, saved in ip_vs_{in,out} */
+ skb->mark = reasm->mark;
+
+ return NF_ACCEPT;
+}
+
+/*
* AF_INET6 handler in NF_INET_LOCAL_IN chain
* Schedule and forward packets from remote clients
*/
@@ -1845,6 +1912,14 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
.priority = 100,
},
#ifdef CONFIG_IP_VS_IPV6
+ /* After mangle & nat fetch 2:nd fragment and following */
+ {
+ .hook = ip_vs_preroute_frag6,
+ .owner = THIS_MODULE,
+ .pf = NFPROTO_IPV6,
+ .hooknum = NF_INET_PRE_ROUTING,
+ .priority = NF_IP6_PRI_NAT_DST + 1,
+ },
/* After packet filtering, change source only for VS/NAT */
{
.hook = ip_vs_reply6,
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 428de75..e44933f 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -496,12 +496,13 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
struct ip_vs_protocol *pp)
{
struct rt6_info *rt; /* Route to the other host */
- struct ipv6hdr *iph = ipv6_hdr(skb);
+ struct ip_vs_iphdr iph;
int mtu;
EnterFunction(10);
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
- if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr, NULL, 0,
+ if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph.daddr.in6, NULL, 0,
IP_VS_RT_MODE_NON_LOCAL)))
goto tx_error_icmp;
@@ -513,7 +514,9 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
- icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ /* only send ICMP too big on first fragment */
+ if (!iph.fragoffs)
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
dst_release(&rt->dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error;
@@ -685,7 +688,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
ip_vs_fill_iph_skb(cp->af, skb, &iph);
/* check if it is a connection of no-client-port */
- if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT)) {
+ if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT && !iph.fragoffs)) {
__be16 _pt, *p;
p = skb_header_pointer(skb, iph.len, sizeof(_pt), &_pt);
if (p == NULL)
@@ -735,7 +738,9 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
- icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ /* only send ICMP too big on first fragment */
+ if (!iph.fragoffs)
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL_PKT(0, AF_INET6, pp, skb, 0,
"ip_vs_nat_xmit_v6(): frag needed for");
goto tx_error_put;
@@ -940,8 +945,10 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
unsigned int max_headroom; /* The extra header space needed */
int mtu;
int ret;
+ struct ip_vs_iphdr ipvsh;
EnterFunction(10);
+ ip_vs_fill_iph_skb(cp->af, skb, &ipvsh);
if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6,
&saddr, 1, (IP_VS_RT_MODE_LOCAL |
@@ -970,7 +977,9 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
- icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ /* only send ICMP too big on first fragment */
+ if (!ipvsh.fragoffs)
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error_put;
}
@@ -1116,8 +1125,10 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
{
struct rt6_info *rt; /* Route to the other host */
int mtu;
+ struct ip_vs_iphdr iph;
EnterFunction(10);
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
0, (IP_VS_RT_MODE_LOCAL |
@@ -1136,7 +1147,9 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
- icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ /* only send ICMP too big on first fragment */
+ if (!iph.fragoffs)
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
dst_release(&rt->dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error;
@@ -1308,8 +1321,10 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
int rc;
int local;
int rt_mode;
+ struct ip_vs_iphdr iph;
EnterFunction(10);
+ ip_vs_fill_iph_skb(cp->af, skb, &iph);
/* The ICMP packet for VS/TUN, VS/DR and LOCALNODE will be
forwarded directly here, because there is no need to
@@ -1372,7 +1387,9 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
- icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ /* only send ICMP too big on first fragment */
+ if (!iph.fragoffs)
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error_put;
}
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 7/8] ipvs: API change to avoid rescan of IPv6 exthdr
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (5 preceding siblings ...)
2012-09-11 12:38 ` [PATCH V3 6/8] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
@ 2012-09-11 12:38 ` Jesper Dangaard Brouer
2012-09-11 12:39 ` [PATCH V3 8/8] ipvs: SIP fragment handling Jesper Dangaard Brouer
2012-09-12 22:57 ` [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Julian Anastasov
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:38 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Reduce the number of times we scan/skip the IPv6 exthdrs.
This patch contains a lot of API changes. This is done, to avoid
repeating the scan of finding the IPv6 headers, via ipv6_find_hdr(),
which is called by ip_vs_fill_iph_skb().
Finding the IPv6 headers is done as early as possible, and passed on
as a pointer "struct ip_vs_iphdr *" to the affected functions.
This patch reduce/removes 19 calls to ip_vs_fill_iph_skb().
Notice, I have choosen, not to change the API of function
pointer "(*schedule)" (in struct ip_vs_scheduler) as it can be
used by external schedulers, via {un,}register_ip_vs_scheduler.
Only 4 out of 10 schedulers use info from ip_vs_iphdr*, and when
they do, they are only interested in iph->{s,d}addr.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
V3:
- This patch is introduced in V3, to allow easier integration of
Patricks idea and my RFC patch of caching exthdr info in skb->cb[]
include/net/ip_vs.h | 81 +++++++++++-----------
net/netfilter/ipvs/ip_vs_conn.c | 15 ++--
net/netfilter/ipvs/ip_vs_core.c | 116 ++++++++++++++-----------------
net/netfilter/ipvs/ip_vs_proto_ah_esp.c | 9 +-
net/netfilter/ipvs/ip_vs_proto_sctp.c | 42 ++++-------
net/netfilter/ipvs/ip_vs_proto_tcp.c | 40 ++++-------
net/netfilter/ipvs/ip_vs_proto_udp.c | 41 ++++-------
net/netfilter/ipvs/ip_vs_xmit.c | 61 +++++++---------
net/netfilter/xt_ipvs.c | 2 -
9 files changed, 177 insertions(+), 230 deletions(-)
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 98806b6..a681ad6 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -487,27 +487,26 @@ struct ip_vs_protocol {
int (*conn_schedule)(int af, struct sk_buff *skb,
struct ip_vs_proto_data *pd,
- int *verdict, struct ip_vs_conn **cpp);
+ int *verdict, struct ip_vs_conn **cpp,
+ struct ip_vs_iphdr *iph);
struct ip_vs_conn *
(*conn_in_get)(int af,
const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
- unsigned int proto_off,
int inverse);
struct ip_vs_conn *
(*conn_out_get)(int af,
const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
- unsigned int proto_off,
int inverse);
- int (*snat_handler)(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp);
+ int (*snat_handler)(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph);
- int (*dnat_handler)(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp);
+ int (*dnat_handler)(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph);
int (*csum_check)(int af, struct sk_buff *skb,
struct ip_vs_protocol *pp);
@@ -607,7 +606,7 @@ struct ip_vs_conn {
NF_ACCEPT can be returned when destination is local.
*/
int (*packet_xmit)(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp);
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
/* Note: we can group the following members into a structure,
in order to save more space, and the following members are
@@ -858,13 +857,11 @@ struct ip_vs_app {
struct ip_vs_conn *
(*conn_in_get)(const struct sk_buff *skb, struct ip_vs_app *app,
- const struct iphdr *iph, unsigned int proto_off,
- int inverse);
+ const struct iphdr *iph, int inverse);
struct ip_vs_conn *
(*conn_out_get)(const struct sk_buff *skb, struct ip_vs_app *app,
- const struct iphdr *iph, unsigned int proto_off,
- int inverse);
+ const struct iphdr *iph, int inverse);
int (*state_transition)(struct ip_vs_conn *cp, int direction,
const struct sk_buff *skb,
@@ -1163,14 +1160,12 @@ struct ip_vs_conn *ip_vs_ct_in_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
- unsigned int proto_off,
int inverse);
struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p);
struct ip_vs_conn * ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
- unsigned int proto_off,
int inverse);
/* put back the conn without restarting its timer */
@@ -1343,9 +1338,10 @@ extern struct ip_vs_scheduler *ip_vs_scheduler_get(const char *sched_name);
extern void ip_vs_scheduler_put(struct ip_vs_scheduler *scheduler);
extern struct ip_vs_conn *
ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
- struct ip_vs_proto_data *pd, int *ignored);
+ struct ip_vs_proto_data *pd, int *ignored,
+ struct ip_vs_iphdr *iph);
extern int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
- struct ip_vs_proto_data *pd);
+ struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph);
extern void ip_vs_scheduler_err(struct ip_vs_service *svc, const char *msg);
@@ -1404,33 +1400,38 @@ extern void ip_vs_read_estimator(struct ip_vs_stats_user *dst,
/*
* Various IPVS packet transmitters (from ip_vs_xmit.c)
*/
-extern int ip_vs_null_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_bypass_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_nat_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_tunnel_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_dr_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_icmp_xmit
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp,
- int offset, unsigned int hooknum);
+extern int ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp,
+ struct ip_vs_iphdr *iph);
+extern int ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp,
+ struct ip_vs_iphdr *iph);
+extern int ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, int offset,
+ unsigned int hooknum, struct ip_vs_iphdr *iph);
extern void ip_vs_dst_reset(struct ip_vs_dest *dest);
#ifdef CONFIG_IP_VS_IPV6
-extern int ip_vs_bypass_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_nat_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_tunnel_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_dr_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp);
-extern int ip_vs_icmp_xmit_v6
-(struct sk_buff *skb, struct ip_vs_conn *cp, struct ip_vs_protocol *pp,
- int offset, unsigned int hooknum);
+extern int ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp,
+ struct ip_vs_iphdr *iph);
+extern int ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp,
+ struct ip_vs_iphdr *iph);
+extern int ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp,
+ struct ip_vs_iphdr *iph);
+extern int ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph);
+extern int ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
+ struct ip_vs_protocol *pp, int offset,
+ unsigned int hooknum, struct ip_vs_iphdr *iph);
#endif
#ifdef CONFIG_SYSCTL
diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
index d6c1c26..30e764a 100644
--- a/net/netfilter/ipvs/ip_vs_conn.c
+++ b/net/netfilter/ipvs/ip_vs_conn.c
@@ -308,13 +308,12 @@ struct ip_vs_conn *ip_vs_conn_in_get(const struct ip_vs_conn_param *p)
static int
ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
const struct ip_vs_iphdr *iph,
- unsigned int proto_off, int inverse,
- struct ip_vs_conn_param *p)
+ int inverse, struct ip_vs_conn_param *p)
{
__be16 _ports[2], *pptr;
struct net *net = skb_net(skb);
- pptr = frag_safe_skb_hp(skb, proto_off, sizeof(_ports), _ports, iph);
+ pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
if (pptr == NULL)
return 1;
@@ -329,12 +328,11 @@ ip_vs_conn_fill_param_proto(int af, const struct sk_buff *skb,
struct ip_vs_conn *
ip_vs_conn_in_get_proto(int af, const struct sk_buff *skb,
- const struct ip_vs_iphdr *iph,
- unsigned int proto_off, int inverse)
+ const struct ip_vs_iphdr *iph, int inverse)
{
struct ip_vs_conn_param p;
- if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, inverse, &p))
return NULL;
return ip_vs_conn_in_get(&p);
@@ -432,12 +430,11 @@ struct ip_vs_conn *ip_vs_conn_out_get(const struct ip_vs_conn_param *p)
struct ip_vs_conn *
ip_vs_conn_out_get_proto(int af, const struct sk_buff *skb,
- const struct ip_vs_iphdr *iph,
- unsigned int proto_off, int inverse)
+ const struct ip_vs_iphdr *iph, int inverse)
{
struct ip_vs_conn_param p;
- if (ip_vs_conn_fill_param_proto(af, skb, iph, proto_off, inverse, &p))
+ if (ip_vs_conn_fill_param_proto(af, skb, iph, inverse, &p))
return NULL;
return ip_vs_conn_out_get(&p);
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 4b9995e..f72d64f 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -222,11 +222,10 @@ ip_vs_conn_fill_param_persist(const struct ip_vs_service *svc,
*/
static struct ip_vs_conn *
ip_vs_sched_persist(struct ip_vs_service *svc,
- struct sk_buff *skb,
- __be16 src_port, __be16 dst_port, int *ignored)
+ struct sk_buff *skb, __be16 src_port, __be16 dst_port,
+ int *ignored, struct ip_vs_iphdr *iph)
{
struct ip_vs_conn *cp = NULL;
- struct ip_vs_iphdr iph;
struct ip_vs_dest *dest;
struct ip_vs_conn *ct;
__be16 dport = 0; /* destination port to forward */
@@ -236,20 +235,18 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
union nf_inet_addr snet; /* source network of the client,
after masking */
- ip_vs_fill_iph_skb(svc->af, skb, &iph);
-
/* Mask saddr with the netmask to adjust template granularity */
#ifdef CONFIG_IP_VS_IPV6
if (svc->af == AF_INET6)
- ipv6_addr_prefix(&snet.in6, &iph.saddr.in6, svc->netmask);
+ ipv6_addr_prefix(&snet.in6, &iph->saddr.in6, svc->netmask);
else
#endif
- snet.ip = iph.saddr.ip & svc->netmask;
+ snet.ip = iph->saddr.ip & svc->netmask;
IP_VS_DBG_BUF(6, "p-schedule: src %s:%u dest %s:%u "
"mnet %s\n",
- IP_VS_DBG_ADDR(svc->af, &iph.saddr), ntohs(src_port),
- IP_VS_DBG_ADDR(svc->af, &iph.daddr), ntohs(dst_port),
+ IP_VS_DBG_ADDR(svc->af, &iph->saddr), ntohs(src_port),
+ IP_VS_DBG_ADDR(svc->af, &iph->daddr), ntohs(dst_port),
IP_VS_DBG_ADDR(svc->af, &snet));
/*
@@ -266,8 +263,8 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
* is created for other persistent services.
*/
{
- int protocol = iph.protocol;
- const union nf_inet_addr *vaddr = &iph.daddr;
+ int protocol = iph->protocol;
+ const union nf_inet_addr *vaddr = &iph->daddr;
__be16 vport = 0;
if (dst_port == svc->port) {
@@ -342,14 +339,14 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
dport = dest->port;
flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
- && iph.protocol == IPPROTO_UDP)?
+ && iph->protocol == IPPROTO_UDP) ?
IP_VS_CONN_F_ONE_PACKET : 0;
/*
* Create a new connection according to the template
*/
- ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol, &iph.saddr,
- src_port, &iph.daddr, dst_port, ¶m);
+ ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol, &iph->saddr,
+ src_port, &iph->daddr, dst_port, ¶m);
cp = ip_vs_conn_new(¶m, &dest->addr, dport, flags, dest, skb->mark);
if (cp == NULL) {
@@ -392,22 +389,20 @@ ip_vs_sched_persist(struct ip_vs_service *svc,
*/
struct ip_vs_conn *
ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
- struct ip_vs_proto_data *pd, int *ignored)
+ struct ip_vs_proto_data *pd, int *ignored,
+ struct ip_vs_iphdr *iph)
{
struct ip_vs_protocol *pp = pd->pp;
struct ip_vs_conn *cp = NULL;
- struct ip_vs_iphdr iph;
struct ip_vs_dest *dest;
__be16 _ports[2], *pptr;
unsigned int flags;
*ignored = 1;
-
/*
* IPv6 frags, only the first hit here.
*/
- ip_vs_fill_iph_skb(svc->af, skb, &iph);
- pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports, &iph);
+ pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
if (pptr == NULL)
return NULL;
@@ -427,7 +422,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
* Do not schedule replies from local real server.
*/
if ((!skb->dev || skb->dev->flags & IFF_LOOPBACK) &&
- (cp = pp->conn_in_get(svc->af, skb, &iph, iph.len, 1))) {
+ (cp = pp->conn_in_get(svc->af, skb, iph, 1))) {
IP_VS_DBG_PKT(12, svc->af, pp, skb, 0,
"Not scheduling reply for existing connection");
__ip_vs_conn_put(cp);
@@ -438,7 +433,8 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
* Persistent service
*/
if (svc->flags & IP_VS_SVC_F_PERSISTENT)
- return ip_vs_sched_persist(svc, skb, pptr[0], pptr[1], ignored);
+ return ip_vs_sched_persist(svc, skb, pptr[0], pptr[1], ignored,
+ iph);
*ignored = 0;
@@ -460,7 +456,7 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
}
flags = (svc->flags & IP_VS_SVC_F_ONEPACKET
- && iph.protocol == IPPROTO_UDP)?
+ && iph->protocol == IPPROTO_UDP) ?
IP_VS_CONN_F_ONE_PACKET : 0;
/*
@@ -469,9 +465,9 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
{
struct ip_vs_conn_param p;
- ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
- &iph.saddr, pptr[0], &iph.daddr, pptr[1],
- &p);
+ ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
+ &iph->saddr, pptr[0], &iph->daddr,
+ pptr[1], &p);
cp = ip_vs_conn_new(&p, &dest->addr,
dest->port ? dest->port : pptr[1],
flags, dest, skb->mark);
@@ -500,18 +496,16 @@ ip_vs_schedule(struct ip_vs_service *svc, struct sk_buff *skb,
* no destination is available for a new connection.
*/
int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
- struct ip_vs_proto_data *pd)
+ struct ip_vs_proto_data *pd, struct ip_vs_iphdr *iph)
{
__be16 _ports[2], *pptr;
- struct ip_vs_iphdr iph;
#ifdef CONFIG_SYSCTL
struct net *net;
struct netns_ipvs *ipvs;
int unicast;
#endif
- ip_vs_fill_iph_skb(svc->af, skb, &iph);
- pptr = frag_safe_skb_hp(skb, iph.len, sizeof(_ports), _ports, &iph);
+ pptr = frag_safe_skb_hp(skb, iph->len, sizeof(_ports), _ports, iph);
if (pptr == NULL) {
ip_vs_service_put(svc);
return NF_DROP;
@@ -522,10 +516,10 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
#ifdef CONFIG_IP_VS_IPV6
if (svc->af == AF_INET6)
- unicast = ipv6_addr_type(&iph.daddr.in6) & IPV6_ADDR_UNICAST;
+ unicast = ipv6_addr_type(&iph->daddr.in6) & IPV6_ADDR_UNICAST;
else
#endif
- unicast = (inet_addr_type(net, iph.daddr.ip) == RTN_UNICAST);
+ unicast = (inet_addr_type(net, iph->daddr.ip) == RTN_UNICAST);
/* if it is fwmark-based service, the cache_bypass sysctl is up
and the destination is a non-local unicast, then create
@@ -535,7 +529,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
int ret;
struct ip_vs_conn *cp;
unsigned int flags = (svc->flags & IP_VS_SVC_F_ONEPACKET &&
- iph.protocol == IPPROTO_UDP)?
+ iph->protocol == IPPROTO_UDP) ?
IP_VS_CONN_F_ONE_PACKET : 0;
union nf_inet_addr daddr = { .all = { 0, 0, 0, 0 } };
@@ -545,9 +539,9 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
IP_VS_DBG(6, "%s(): create a cache_bypass entry\n", __func__);
{
struct ip_vs_conn_param p;
- ip_vs_conn_fill_param(svc->net, svc->af, iph.protocol,
- &iph.saddr, pptr[0],
- &iph.daddr, pptr[1], &p);
+ ip_vs_conn_fill_param(svc->net, svc->af, iph->protocol,
+ &iph->saddr, pptr[0],
+ &iph->daddr, pptr[1], &p);
cp = ip_vs_conn_new(&p, &daddr, 0,
IP_VS_CONN_F_BYPASS | flags,
NULL, skb->mark);
@@ -562,7 +556,7 @@ int ip_vs_leave(struct ip_vs_service *svc, struct sk_buff *skb,
ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
/* transmit the first SYN packet */
- ret = cp->packet_xmit(skb, cp, pd->pp);
+ ret = cp->packet_xmit(skb, cp, pd->pp, iph);
/* do not touch skb anymore */
atomic_inc(&cp->in_pkts);
@@ -911,7 +905,7 @@ static int ip_vs_out_icmp(struct sk_buff *skb, int *related,
ip_vs_fill_ip4hdr(cih, &ciph);
ciph.len += offset;
/* The embedded headers contain source and dest in reverse order */
- cp = pp->conn_out_get(AF_INET, skb, &ciph, offset, 1);
+ cp = pp->conn_out_get(AF_INET, skb, &ciph, 1);
if (!cp)
return NF_ACCEPT;
@@ -922,7 +916,7 @@ static int ip_vs_out_icmp(struct sk_buff *skb, int *related,
#ifdef CONFIG_IP_VS_IPV6
static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
- unsigned int hooknum)
+ unsigned int hooknum, struct ip_vs_iphdr *ipvsh)
{
struct icmp6hdr _icmph, *ic;
struct ipv6hdr _ip6, *ip6; /* The ip header contained
@@ -931,10 +925,6 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
struct ip_vs_protocol *pp;
union nf_inet_addr snet;
- struct ip_vs_iphdr ipvsh_stack;
- struct ip_vs_iphdr *ipvsh = &ipvsh_stack;
- ip_vs_fill_iph_skb(AF_INET6, skb, ipvsh);
-
*related = 1;
ic = frag_safe_skb_hp(skb, ipvsh->len, sizeof(_icmph), &_icmph, ipvsh);
if (ic == NULL)
@@ -977,7 +967,7 @@ static int ip_vs_out_icmp_v6(struct sk_buff *skb, int *related,
ipvsh->daddr.in6 = ip6->daddr;
/* The embedded headers contain source and dest in reverse order */
- cp = pp->conn_out_get(AF_INET6, skb, ipvsh, ipvsh->len, 1);
+ cp = pp->conn_out_get(AF_INET6, skb, ipvsh, 1);
if (!cp)
return NF_ACCEPT;
@@ -1016,17 +1006,17 @@ static inline int is_tcp_reset(const struct sk_buff *skb, int nh_len)
*/
static unsigned int
handle_response(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
- struct ip_vs_conn *cp, int ihl)
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
struct ip_vs_protocol *pp = pd->pp;
IP_VS_DBG_PKT(11, af, pp, skb, 0, "Outgoing packet");
- if (!skb_make_writable(skb, ihl))
+ if (!skb_make_writable(skb, iph->len))
goto drop;
/* mangle the packet */
- if (pp->snat_handler && !pp->snat_handler(skb, pp, cp))
+ if (pp->snat_handler && !pp->snat_handler(skb, pp, cp, iph))
goto drop;
#ifdef CONFIG_IP_VS_IPV6
@@ -1125,7 +1115,7 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
int related;
int verdict = ip_vs_out_icmp_v6(skb, &related,
- hooknum);
+ hooknum, &iph);
if (related)
return verdict;
@@ -1160,10 +1150,10 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
/*
* Check if the packet belongs to an existing entry
*/
- cp = pp->conn_out_get(af, skb, &iph, iph.len, 0);
+ cp = pp->conn_out_get(af, skb, &iph, 0);
if (likely(cp))
- return handle_response(af, skb, pd, cp, iph.len);
+ return handle_response(af, skb, pd, cp, &iph);
if (sysctl_nat_icmp_send(net) &&
(pp->protocol == IPPROTO_TCP ||
pp->protocol == IPPROTO_UDP ||
@@ -1375,7 +1365,7 @@ ip_vs_in_icmp(struct sk_buff *skb, int *related, unsigned int hooknum)
/* The embedded headers contain source and dest in reverse order.
* For IPIP this is error for request, not for reply.
*/
- cp = pp->conn_in_get(AF_INET, skb, &ciph, offset, ipip ? 0 : 1);
+ cp = pp->conn_in_get(AF_INET, skb, &ciph, ipip ? 0 : 1);
if (!cp)
return NF_ACCEPT;
@@ -1444,7 +1434,7 @@ ignore_ipip:
ip_vs_in_stats(cp, skb);
if (IPPROTO_TCP == cih->protocol || IPPROTO_UDP == cih->protocol)
offset += 2 * sizeof(__u16);
- verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum);
+ verdict = ip_vs_icmp_xmit(skb, cp, pp, offset, hooknum, &ciph);
out:
__ip_vs_conn_put(cp);
@@ -1453,8 +1443,8 @@ out:
}
#ifdef CONFIG_IP_VS_IPV6
-static int
-ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
+static int ip_vs_in_icmp_v6(struct sk_buff *skb, int *related,
+ unsigned int hooknum, struct ip_vs_iphdr *iph)
{
struct net *net = NULL;
struct ipv6hdr _ip6h, *ip6h;
@@ -1465,10 +1455,6 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
struct ip_vs_proto_data *pd;
unsigned int offs_ciph, verdict;
- struct ip_vs_iphdr iph_stack;
- struct ip_vs_iphdr *iph = &iph_stack;
- ip_vs_fill_iph_skb(AF_INET6, skb, iph);
-
*related = 1;
ic = frag_safe_skb_hp(skb, iph->len, sizeof(_icmph), &_icmph, iph);
@@ -1522,7 +1508,7 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
/* The embedded headers contain source and dest in reverse order
* if not from localhost
*/
- cp = pp->conn_in_get(AF_INET6, skb, &ciph, ciph.len,
+ cp = pp->conn_in_get(AF_INET6, skb, &ciph,
(hooknum == NF_INET_LOCAL_OUT) ? 0 : 1);
if (!cp)
@@ -1540,7 +1526,7 @@ ip_vs_in_icmp_v6(struct sk_buff *skb, int *related, unsigned int hooknum)
IPPROTO_SCTP == ciph.protocol)
offs_ciph = ciph.len + (2 * sizeof(__u16));
- verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offs_ciph, hooknum);
+ verdict = ip_vs_icmp_xmit_v6(skb, cp, pp, offs_ciph, hooknum, &ciph);
__ip_vs_conn_put(cp);
@@ -1610,7 +1596,8 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
}
if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
int related;
- int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum);
+ int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum,
+ &iph);
if (related)
return verdict;
@@ -1633,8 +1620,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
/*
* Check if the packet belongs to an existing connection entry
*/
- cp = pp->conn_in_get(af, skb, &iph, iph.len, 0);
-
+ cp = pp->conn_in_get(af, skb, &iph, 0);
if (unlikely(!cp) && !iph.fragoffs) {
/* No (second) fragments need to enter here, as nf_defrag_ipv6
* replayed fragment zero will already have created the cp
@@ -1642,7 +1628,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
int v;
/* Schedule and create new connection entry into &cp */
- if (!pp->conn_schedule(af, skb, pd, &v, &cp))
+ if (!pp->conn_schedule(af, skb, pd, &v, &cp, &iph))
return v;
}
@@ -1680,7 +1666,7 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
ip_vs_in_stats(cp, skb);
ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pd);
if (cp->packet_xmit)
- ret = cp->packet_xmit(skb, cp, pp);
+ ret = cp->packet_xmit(skb, cp, pp, &iph);
/* do not touch skb anymore */
else {
IP_VS_DBG_RL("warning: packet_xmit is null");
@@ -1854,7 +1840,7 @@ ip_vs_forward_icmp_v6(unsigned int hooknum, struct sk_buff *skb,
if (!net_ipvs(net)->enable)
return NF_ACCEPT;
- return ip_vs_in_icmp_v6(skb, &r, hooknum);
+ return ip_vs_in_icmp_v6(skb, &r, hooknum, &iphdr);
}
#endif
diff --git a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
index 5b8eb8b..5de3dd3 100644
--- a/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_ah_esp.c
@@ -57,7 +57,7 @@ ah_esp_conn_fill_param_proto(struct net *net, int af,
static struct ip_vs_conn *
ah_esp_conn_in_get(int af, const struct sk_buff *skb,
- const struct ip_vs_iphdr *iph, unsigned int proto_off,
+ const struct ip_vs_iphdr *iph,
int inverse)
{
struct ip_vs_conn *cp;
@@ -85,9 +85,7 @@ ah_esp_conn_in_get(int af, const struct sk_buff *skb,
static struct ip_vs_conn *
ah_esp_conn_out_get(int af, const struct sk_buff *skb,
- const struct ip_vs_iphdr *iph,
- unsigned int proto_off,
- int inverse)
+ const struct ip_vs_iphdr *iph, int inverse)
{
struct ip_vs_conn *cp;
struct ip_vs_conn_param p;
@@ -110,7 +108,8 @@ ah_esp_conn_out_get(int af, const struct sk_buff *skb,
static int
ah_esp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
- int *verdict, struct ip_vs_conn **cpp)
+ int *verdict, struct ip_vs_conn **cpp,
+ struct ip_vs_iphdr *iph)
{
/*
* AH/ESP is only related traffic. Pass the packet to IP stack.
diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index b903db6..746048b 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -10,28 +10,26 @@
static int
sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
- int *verdict, struct ip_vs_conn **cpp)
+ int *verdict, struct ip_vs_conn **cpp,
+ struct ip_vs_iphdr *iph)
{
struct net *net;
struct ip_vs_service *svc;
sctp_chunkhdr_t _schunkh, *sch;
sctp_sctphdr_t *sh, _sctph;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(af, skb, &iph);
-
- sh = skb_header_pointer(skb, iph.len, sizeof(_sctph), &_sctph);
+ sh = skb_header_pointer(skb, iph->len, sizeof(_sctph), &_sctph);
if (sh == NULL)
return 0;
- sch = skb_header_pointer(skb, iph.len + sizeof(sctp_sctphdr_t),
+ sch = skb_header_pointer(skb, iph->len + sizeof(sctp_sctphdr_t),
sizeof(_schunkh), &_schunkh);
if (sch == NULL)
return 0;
net = skb_net(skb);
if ((sch->type == SCTP_CID_INIT) &&
- (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
- &iph.daddr, sh->dest))) {
+ (svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+ &iph->daddr, sh->dest))) {
int ignored;
if (ip_vs_todrop(net_ipvs(net))) {
@@ -47,10 +45,10 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
* Let the virtual server select a real server for the
* incoming connection, and create a connection entry.
*/
- *cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+ *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
if (!*cpp && ignored <= 0) {
if (!ignored)
- *verdict = ip_vs_leave(svc, skb, pd);
+ *verdict = ip_vs_leave(svc, skb, pd, iph);
else {
ip_vs_service_put(svc);
*verdict = NF_DROP;
@@ -64,20 +62,16 @@ sctp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
}
static int
-sctp_snat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+sctp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
sctp_sctphdr_t *sctph;
- unsigned int sctphoff;
+ unsigned int sctphoff = iph->len;
struct sk_buff *iter;
__be32 crc32;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- sctphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
@@ -110,20 +104,16 @@ sctp_snat_handler(struct sk_buff *skb,
}
static int
-sctp_dnat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+sctp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
sctp_sctphdr_t *sctph;
- unsigned int sctphoff;
+ unsigned int sctphoff = iph->len;
struct sk_buff *iter;
__be32 crc32;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- sctphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
diff --git a/net/netfilter/ipvs/ip_vs_proto_tcp.c b/net/netfilter/ipvs/ip_vs_proto_tcp.c
index 8a96069..9af653a 100644
--- a/net/netfilter/ipvs/ip_vs_proto_tcp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_tcp.c
@@ -33,16 +33,14 @@
static int
tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
- int *verdict, struct ip_vs_conn **cpp)
+ int *verdict, struct ip_vs_conn **cpp,
+ struct ip_vs_iphdr *iph)
{
struct net *net;
struct ip_vs_service *svc;
struct tcphdr _tcph, *th;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(af, skb, &iph);
-
- th = skb_header_pointer(skb, iph.len, sizeof(_tcph), &_tcph);
+ th = skb_header_pointer(skb, iph->len, sizeof(_tcph), &_tcph);
if (th == NULL) {
*verdict = NF_DROP;
return 0;
@@ -50,8 +48,8 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
net = skb_net(skb);
/* No !th->ack check to allow scheduling on SYN+ACK for Active FTP */
if (th->syn &&
- (svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
- &iph.daddr, th->dest))) {
+ (svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+ &iph->daddr, th->dest))) {
int ignored;
if (ip_vs_todrop(net_ipvs(net))) {
@@ -68,10 +66,10 @@ tcp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
* Let the virtual server select a real server for the
* incoming connection, and create a connection entry.
*/
- *cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+ *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
if (!*cpp && ignored <= 0) {
if (!ignored)
- *verdict = ip_vs_leave(svc, skb, pd);
+ *verdict = ip_vs_leave(svc, skb, pd, iph);
else {
ip_vs_service_put(svc);
*verdict = NF_DROP;
@@ -128,20 +126,16 @@ tcp_partial_csum_update(int af, struct tcphdr *tcph,
static int
-tcp_snat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+tcp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
struct tcphdr *tcph;
- unsigned int tcphoff;
+ unsigned int tcphoff = iph->len;
int oldlen;
int payload_csum = 0;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- tcphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
oldlen = skb->len - tcphoff;
@@ -210,20 +204,16 @@ tcp_snat_handler(struct sk_buff *skb,
static int
-tcp_dnat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+tcp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
struct tcphdr *tcph;
- unsigned int tcphoff;
+ unsigned int tcphoff = iph->len;
int oldlen;
int payload_csum = 0;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- tcphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
oldlen = skb->len - tcphoff;
diff --git a/net/netfilter/ipvs/ip_vs_proto_udp.c b/net/netfilter/ipvs/ip_vs_proto_udp.c
index d6f4eee..503a842 100644
--- a/net/netfilter/ipvs/ip_vs_proto_udp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_udp.c
@@ -30,23 +30,22 @@
static int
udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
- int *verdict, struct ip_vs_conn **cpp)
+ int *verdict, struct ip_vs_conn **cpp,
+ struct ip_vs_iphdr *iph)
{
struct net *net;
struct ip_vs_service *svc;
struct udphdr _udph, *uh;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(af, skb, &iph);
-
- uh = skb_header_pointer(skb, iph.len, sizeof(_udph), &_udph);
+ /* IPv6 fragments, only first fragment will hit this */
+ uh = skb_header_pointer(skb, iph->len, sizeof(_udph), &_udph);
if (uh == NULL) {
*verdict = NF_DROP;
return 0;
}
net = skb_net(skb);
- svc = ip_vs_service_get(net, af, skb->mark, iph.protocol,
- &iph.daddr, uh->dest);
+ svc = ip_vs_service_get(net, af, skb->mark, iph->protocol,
+ &iph->daddr, uh->dest);
if (svc) {
int ignored;
@@ -64,10 +63,10 @@ udp_conn_schedule(int af, struct sk_buff *skb, struct ip_vs_proto_data *pd,
* Let the virtual server select a real server for the
* incoming connection, and create a connection entry.
*/
- *cpp = ip_vs_schedule(svc, skb, pd, &ignored);
+ *cpp = ip_vs_schedule(svc, skb, pd, &ignored, iph);
if (!*cpp && ignored <= 0) {
if (!ignored)
- *verdict = ip_vs_leave(svc, skb, pd);
+ *verdict = ip_vs_leave(svc, skb, pd, iph);
else {
ip_vs_service_put(svc);
*verdict = NF_DROP;
@@ -125,20 +124,16 @@ udp_partial_csum_update(int af, struct udphdr *uhdr,
static int
-udp_snat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+udp_snat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
struct udphdr *udph;
- unsigned int udphoff;
+ unsigned int udphoff = iph->len;
int oldlen;
int payload_csum = 0;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- udphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
oldlen = skb->len - udphoff;
@@ -212,20 +207,16 @@ udp_snat_handler(struct sk_buff *skb,
static int
-udp_dnat_handler(struct sk_buff *skb,
- struct ip_vs_protocol *pp, struct ip_vs_conn *cp)
+udp_dnat_handler(struct sk_buff *skb, struct ip_vs_protocol *pp,
+ struct ip_vs_conn *cp, struct ip_vs_iphdr *iph)
{
struct udphdr *udph;
- unsigned int udphoff;
+ unsigned int udphoff = iph->len;
int oldlen;
int payload_csum = 0;
- struct ip_vs_iphdr iph;
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- udphoff = iph.len;
-
#ifdef CONFIG_IP_VS_IPV6
- if (cp->af == AF_INET6 && iph.fragoffs)
+ if (cp->af == AF_INET6 && iph->fragoffs)
return 1;
#endif
oldlen = skb->len - udphoff;
diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index e44933f..90122eb 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -424,7 +424,7 @@ do { \
*/
int
ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
/* we do not touch skb and do not need pskb ptr */
IP_VS_XMIT(NFPROTO_IPV4, skb, cp, 1);
@@ -438,7 +438,7 @@ ip_vs_null_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
*/
int
ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
struct rtable *rt; /* Route to the other host */
struct iphdr *iph = ip_hdr(skb);
@@ -493,17 +493,16 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
#ifdef CONFIG_IP_VS_IPV6
int
ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
{
struct rt6_info *rt; /* Route to the other host */
- struct ip_vs_iphdr iph;
int mtu;
EnterFunction(10);
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
- if (!(rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph.daddr.in6, NULL, 0,
- IP_VS_RT_MODE_NON_LOCAL)))
+ rt = __ip_vs_get_out_rt_v6(skb, NULL, &iph->daddr.in6, NULL, 0,
+ IP_VS_RT_MODE_NON_LOCAL);
+ if (!rt)
goto tx_error_icmp;
/* MTU checking */
@@ -515,7 +514,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
/* only send ICMP too big on first fragment */
- if (!iph.fragoffs)
+ if (!iph->fragoffs)
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
dst_release(&rt->dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -559,7 +558,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
*/
int
ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
struct rtable *rt; /* Route to the other host */
int mtu;
@@ -629,7 +628,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
goto tx_error_put;
/* mangle the packet */
- if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp))
+ if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp, ipvsh))
goto tx_error_put;
ip_hdr(skb)->daddr = cp->daddr.ip;
ip_send_check(ip_hdr(skb));
@@ -677,20 +676,18 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
#ifdef CONFIG_IP_VS_IPV6
int
ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
{
struct rt6_info *rt; /* Route to the other host */
int mtu;
int local;
- struct ip_vs_iphdr iph;
EnterFunction(10);
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
/* check if it is a connection of no-client-port */
- if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT && !iph.fragoffs)) {
+ if (unlikely(cp->flags & IP_VS_CONN_F_NO_CPORT && !iph->fragoffs)) {
__be16 _pt, *p;
- p = skb_header_pointer(skb, iph.len, sizeof(_pt), &_pt);
+ p = skb_header_pointer(skb, iph->len, sizeof(_pt), &_pt);
if (p == NULL)
goto tx_error;
ip_vs_conn_fill_cport(cp, *p);
@@ -739,7 +736,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
/* only send ICMP too big on first fragment */
- if (!iph.fragoffs)
+ if (!iph->fragoffs)
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL_PKT(0, AF_INET6, pp, skb, 0,
"ip_vs_nat_xmit_v6(): frag needed for");
@@ -754,7 +751,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
goto tx_error_put;
/* mangle the packet */
- if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp))
+ if (pp->dnat_handler && !pp->dnat_handler(skb, pp, cp, iph))
goto tx_error;
ipv6_hdr(skb)->daddr = cp->daddr.in6;
@@ -815,7 +812,7 @@ tx_error_put:
*/
int
ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
struct netns_ipvs *ipvs = net_ipvs(skb_net(skb));
struct rtable *rt; /* Route to the other host */
@@ -935,7 +932,7 @@ tx_error_put:
#ifdef CONFIG_IP_VS_IPV6
int
ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
struct rt6_info *rt; /* Route to the other host */
struct in6_addr saddr; /* Source for tunnel */
@@ -945,10 +942,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
unsigned int max_headroom; /* The extra header space needed */
int mtu;
int ret;
- struct ip_vs_iphdr ipvsh;
EnterFunction(10);
- ip_vs_fill_iph_skb(cp->af, skb, &ipvsh);
if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6,
&saddr, 1, (IP_VS_RT_MODE_LOCAL |
@@ -978,7 +973,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
/* only send ICMP too big on first fragment */
- if (!ipvsh.fragoffs)
+ if (!ipvsh->fragoffs)
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error_put;
@@ -1060,7 +1055,7 @@ tx_error_put:
*/
int
ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *ipvsh)
{
struct rtable *rt; /* Route to the other host */
struct iphdr *iph = ip_hdr(skb);
@@ -1121,14 +1116,12 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
#ifdef CONFIG_IP_VS_IPV6
int
ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp)
+ struct ip_vs_protocol *pp, struct ip_vs_iphdr *iph)
{
struct rt6_info *rt; /* Route to the other host */
int mtu;
- struct ip_vs_iphdr iph;
EnterFunction(10);
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
if (!(rt = __ip_vs_get_out_rt_v6(skb, cp->dest, &cp->daddr.in6, NULL,
0, (IP_VS_RT_MODE_LOCAL |
@@ -1148,7 +1141,7 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
/* only send ICMP too big on first fragment */
- if (!iph.fragoffs)
+ if (!iph->fragoffs)
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
dst_release(&rt->dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -1193,7 +1186,8 @@ tx_error:
*/
int
ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp, int offset, unsigned int hooknum)
+ struct ip_vs_protocol *pp, int offset, unsigned int hooknum,
+ struct ip_vs_iphdr *iph)
{
struct rtable *rt; /* Route to the other host */
int mtu;
@@ -1208,7 +1202,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
translate address/port back */
if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) {
if (cp->packet_xmit)
- rc = cp->packet_xmit(skb, cp, pp);
+ rc = cp->packet_xmit(skb, cp, pp, iph);
else
rc = NF_ACCEPT;
/* do not touch skb anymore */
@@ -1314,24 +1308,23 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,
#ifdef CONFIG_IP_VS_IPV6
int
ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
- struct ip_vs_protocol *pp, int offset, unsigned int hooknum)
+ struct ip_vs_protocol *pp, int offset, unsigned int hooknum,
+ struct ip_vs_iphdr *iph)
{
struct rt6_info *rt; /* Route to the other host */
int mtu;
int rc;
int local;
int rt_mode;
- struct ip_vs_iphdr iph;
EnterFunction(10);
- ip_vs_fill_iph_skb(cp->af, skb, &iph);
/* The ICMP packet for VS/TUN, VS/DR and LOCALNODE will be
forwarded directly here, because there is no need to
translate address/port back */
if (IP_VS_FWD_METHOD(cp) != IP_VS_CONN_F_MASQ) {
if (cp->packet_xmit)
- rc = cp->packet_xmit(skb, cp, pp);
+ rc = cp->packet_xmit(skb, cp, pp, iph);
else
rc = NF_ACCEPT;
/* do not touch skb anymore */
@@ -1388,7 +1381,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
skb->dev = net->loopback_dev;
}
/* only send ICMP too big on first fragment */
- if (!iph.fragoffs)
+ if (!iph->fragoffs)
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
goto tx_error_put;
diff --git a/net/netfilter/xt_ipvs.c b/net/netfilter/xt_ipvs.c
index 3f9b8cd..8d47c37 100644
--- a/net/netfilter/xt_ipvs.c
+++ b/net/netfilter/xt_ipvs.c
@@ -85,7 +85,7 @@ ipvs_mt(const struct sk_buff *skb, struct xt_action_param *par)
/*
* Check if the packet belongs to an existing entry
*/
- cp = pp->conn_out_get(family, skb, &iph, iph.len, 1 /* inverse */);
+ cp = pp->conn_out_get(family, skb, &iph, 1 /* inverse */);
if (unlikely(cp == NULL)) {
match = false;
goto out;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH V3 8/8] ipvs: SIP fragment handling
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (6 preceding siblings ...)
2012-09-11 12:38 ` [PATCH V3 7/8] ipvs: API change to avoid rescan of IPv6 exthdr Jesper Dangaard Brouer
@ 2012-09-11 12:39 ` Jesper Dangaard Brouer
2012-09-12 22:57 ` [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Julian Anastasov
8 siblings, 0 replies; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-11 12:39 UTC (permalink / raw)
To: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Julian Anastasov
Cc: Jesper Dangaard Brouer, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Use the nfct_reasm SKB if available.
Based on part of a patch from: Hans Schillstrom
I have left Hans'es comment in the patch (marked /HS)
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
V3:
- I have split out the SIP fragment handling into a seperate patch.
As I have not been able to test this part.
- Change the strange SKB swapping reasm = skb, reverse logic to minimize patch
net/netfilter/ipvs/ip_vs_pe_sip.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
index ee4e2e3..43acba6 100644
--- a/net/netfilter/ipvs/ip_vs_pe_sip.c
+++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
@@ -68,6 +68,7 @@ static int get_callid(const char *dptr, unsigned int dataoff,
static int
ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
{
+ struct sk_buff *reasm = skb_nfct_reasm(skb);
struct ip_vs_iphdr iph;
unsigned int dataoff, datalen, matchoff, matchlen;
const char *dptr;
@@ -78,13 +79,23 @@ ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
/* Only useful with UDP */
if (iph.protocol != IPPROTO_UDP)
return -EINVAL;
+ /*
+ * todo: IPv6 fragments:
+ * I think this only should be done for the first fragment. /HS
+ */
+ if (reasm) {
+ skb = reasm;
+ dataoff = iph.thoff_reasm + sizeof(struct udphdr);
+ } else
+ dataoff = iph.len + sizeof(struct udphdr);
- /* No Data ? */
- dataoff = iph.len + sizeof(struct udphdr);
if (dataoff >= skb->len)
return -EINVAL;
-
- if ((retc=skb_linearize(skb)) < 0)
+ /*
+ * todo: Check if this will mess-up the reasm skb !!! /HS
+ */
+ retc = skb_linearize(skb);
+ if (retc < 0)
return retc;
dptr = skb->data + dataoff;
datalen = skb->len - dataoff;
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
` (7 preceding siblings ...)
2012-09-11 12:39 ` [PATCH V3 8/8] ipvs: SIP fragment handling Jesper Dangaard Brouer
@ 2012-09-12 22:57 ` Julian Anastasov
2012-09-25 13:11 ` Jesper Dangaard Brouer
8 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2012-09-12 22:57 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Hello,
On Tue, 11 Sep 2012, Jesper Dangaard Brouer wrote:
> The following patchset implement IPv6 fragment handling for IPVS.
>
> This work is based upon patches from Hans Schillstrom. I have taken
> over the patchset, in close agreement with Hans, because he don't have
> (gotten allocated) time to complete his work.
>
> I have cleaned up the patchset significantly, and split the patchset
> up into eight patches.
>
> The first 4 patches, are ready to be merged
>
> Patch01: Trivial changes, use compressed IPv6 address in output
> Patch02: IPv6 extend ICMPv6 handling for future types
> Patch03: Use config macro IS_ENABLED()
> Patch04: Fix bug in IPVS IPv6 NAT mangling of ports inside ICMPv6 packets
>
> The next 4 patches, I consider V3 of the patches I have submitted
> earlier, where I have incorporated all of Julian's feedback. I have
> also tried to make the patches easier to review, by reorganizing the
> changes, to be more strictly split (exthdr vs. fragment handling).
>
> I have also removed the API changes, and moved those to patch07. This
> is done, (1) to make it easier to review the patches, and (2) to allow
> easier integration of Patricks idea and my RFC patch of caching exthdr
> info in skb->cb[]. Thus, we can get these patches applied (and later
> go back and apply the caching scheme easier).
>
> Patch05: Fix faulty IPv6 extension header handling in IPVS
> Patch06: Complete IPv6 fragment handling for IPVS
> Patch07: IPVS API change to avoid rescan of IPv6 exthdr
> Patch08: IPVS SIP fragment handling
>
> The SIP frag handling have been split into its own patch, as I have
> not been able to test this part my self.
>
> This patchset is based upon:
> Pablo's nf-next tree: git://1984.lsi.us.es/nf-next
> On top of commit 0edd94887d19ad73539477395c17ea0d6898947a
>
> ---
>
> Jesper Dangaard Brouer (8):
> ipvs: SIP fragment handling
> ipvs: API change to avoid rescan of IPv6 exthdr
> ipvs: Complete IPv6 fragment handling for IPVS
> ipvs: Fix faulty IPv6 extension header handling in IPVS
> ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets
> ipvs: Use config macro IS_ENABLED()
> ipvs: IPv6 extend ICMPv6 handling for future types
> ipvs: Trivial changes, use compressed IPv6 address in output
Some comments:
- About patch 4: ip_vs_icmp_xmit_v6 already calls skb_make_writable
before ip_vs_nat_icmp_v6, that is why we provide 'offset'.
- May be we can provide the offset of ICMPv6 header
from ip_vs_in_icmp_v6 to ip_vs_icmp_xmit_v6 as additional
argument (icmp_offset) and then to ip_vs_nat_icmp_v6. By this
way we can avoid the two ipv6_find_hdr calls if we also provide
the iph argument from ip_vs_icmp_xmit_v6 to ip_vs_nat_icmp_v6,
its ->len points to the ports. ip_vs_in_icmp_v6 provides
also protocol in this ciph, so may be we have everything.
- in ip_vs_in_icmp_v6 there must be 'offs_ciph = ciph.len;'
just before this line:
if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
The idea is that we linearize for writing the inner
IP header and optionally the 2 ports. That is why old
logic was 'offset += 2 * sizeof(__u16);'
- initially, ip_vs_fill_iph_skb fills iphdr->flags from
current fragment, later ip_vs_out_icmp_v6 uses the same
ipvsh when calling ipv6_find_hdr. Should we initialize
ipvsh->flags to 0 before calling ipv6_find_hdr because
it is I/O argument?
- in patch 5: in ip_vs_nat_icmp_v6 skb_make_writable can
move data to other addresses on linearization. Any pointers
like 'ciph' should be recalculated based on offsets. But
it does not matter because we should not call skb_make_writable
here.
I also see that we should not send ICMP
errors (FRAG NEEDED/TOO BIG) in response to large
ICMP error packets but it is not related to your changes,
it needs separate change to all transmitters.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS
2012-09-12 22:57 ` [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Julian Anastasov
@ 2012-09-25 13:11 ` Jesper Dangaard Brouer
2012-09-25 20:48 ` Julian Anastasov
0 siblings, 1 reply; 12+ messages in thread
From: Jesper Dangaard Brouer @ 2012-09-25 13:11 UTC (permalink / raw)
To: Julian Anastasov
Cc: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
On Thu, 2012-09-13 at 01:57 +0300, Julian Anastasov wrote:
> Hello,
>
> On Tue, 11 Sep 2012, Jesper Dangaard Brouer wrote:
>
> > The following patchset implement IPv6 fragment handling for IPVS.
> >
> > This work is based upon patches from Hans Schillstrom. I have taken
> > over the patchset, in close agreement with Hans, because he don't have
> > (gotten allocated) time to complete his work.
> >
> > I have cleaned up the patchset significantly, and split the patchset
> > up into eight patches.
> >
> > The first 4 patches, are ready to be merged
> >
> > Patch01: Trivial changes, use compressed IPv6 address in output
> > Patch02: IPv6 extend ICMPv6 handling for future types
> > Patch03: Use config macro IS_ENABLED()
> > Patch04: Fix bug in IPVS IPv6 NAT mangling of ports inside ICMPv6 packets
> >
> > The next 4 patches, I consider V3 of the patches I have submitted
> > earlier, where I have incorporated all of Julian's feedback. I have
> > also tried to make the patches easier to review, by reorganizing the
> > changes, to be more strictly split (exthdr vs. fragment handling).
> >
> > I have also removed the API changes, and moved those to patch07. This
> > is done, (1) to make it easier to review the patches, and (2) to allow
> > easier integration of Patricks idea and my RFC patch of caching exthdr
> > info in skb->cb[]. Thus, we can get these patches applied (and later
> > go back and apply the caching scheme easier).
> >
> > Patch05: Fix faulty IPv6 extension header handling in IPVS
> > Patch06: Complete IPv6 fragment handling for IPVS
> > Patch07: IPVS API change to avoid rescan of IPv6 exthdr
> > Patch08: IPVS SIP fragment handling
> >
> > The SIP frag handling have been split into its own patch, as I have
> > not been able to test this part my self.
> >
> > This patchset is based upon:
> > Pablo's nf-next tree: git://1984.lsi.us.es/nf-next
> > On top of commit 0edd94887d19ad73539477395c17ea0d6898947a
> >
> > ---
> >
> > Jesper Dangaard Brouer (8):
> > ipvs: SIP fragment handling
> > ipvs: API change to avoid rescan of IPv6 exthdr
> > ipvs: Complete IPv6 fragment handling for IPVS
> > ipvs: Fix faulty IPv6 extension header handling in IPVS
> > ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets
> > ipvs: Use config macro IS_ENABLED()
> > ipvs: IPv6 extend ICMPv6 handling for future types
> > ipvs: Trivial changes, use compressed IPv6 address in output
>
> Some comments:
>
> - About patch 4: ip_vs_icmp_xmit_v6 already calls skb_make_writable
> before ip_vs_nat_icmp_v6, that is why we provide 'offset'.
I see, that call path is correct, BUT I was talking about another call
path of ip_vs_nat_icmp_v6(), via handle_response_icmp() (which also
calls skb_make_writable). That call path is triggered, if the
real-server, have shutdown its service and send back an ICMPv6 packet.
Hmm, testing it again, I cannot trigger this issue. Perhaps I was
confusing my self and were using my test script that added IPv6 exthdrs
to the packet. Adding print statements to the code, also show the
correct offset now.
I'm dropping this patch-4, and I'll adjust/fix patch-5 ("ipvs: fix
faulty IPv6 extension header handling in IPVS") accordingly. And I'll
double check patch-5, that exthdr have been accounted for (in the offset
used by skb_make_writable() before calling ip_vs_nat_icmp_v6()).
> - May be we can provide the offset of ICMPv6 header
> from ip_vs_in_icmp_v6 to ip_vs_icmp_xmit_v6 as additional
> argument (icmp_offset) and then to ip_vs_nat_icmp_v6. By this
> way we can avoid the two ipv6_find_hdr calls if we also provide
> the iph argument from ip_vs_icmp_xmit_v6 to ip_vs_nat_icmp_v6,
> its ->len points to the ports. ip_vs_in_icmp_v6 provides
> also protocol in this ciph, so may be we have everything.
Is this comment for the API patch-7 ("ipvs: API change to avoid rescan
of IPv6 exthdr") ?
The API patch is going to save 19 calls to ipv6_find_hdr ().
> - in ip_vs_in_icmp_v6 there must be 'offs_ciph = ciph.len;'
> just before this line:
>
> if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
>
It would be a lot easier for me, if you commented directly on the
patches.
I can see that 'offs_ciph = ciph.len;' is set earlier in this patch, but
that value is primarily used by IP_VS_DBG_PKT. And offs_ciph, needs to
be updated, again, with the value of ciph.len after the call to
ipv6_find_hdr(). So, yes you are right ;-)
I'll rename offs_ciph to "writable" to emphasize what we are using this
value for.
> The idea is that we linearize for writing the inner
> IP header and optionally the 2 ports. That is why old
> logic was 'offset += 2 * sizeof(__u16);'
The port logic was kept. But I'll make it more clear whats happening,
and keep the "+=" coding style.
> - initially, ip_vs_fill_iph_skb fills iphdr->flags from
> current fragment, later ip_vs_out_icmp_v6 uses the same
> ipvsh when calling ipv6_find_hdr. Should we initialize
> ipvsh->flags to 0 before calling ipv6_find_hdr because
> it is I/O argument?
As we don't use the flag, after this point, we can just give
ipv6_find_hdr() a NULL value instead.
But I must give you, that it's a little confusing the way we reuse the
ipvsh variable (in ip_vs_out_icmp_v6()). Think, this needs to be
rewritten to use a separate variable, like in ip_vs_in_icmp_v6().
> - in patch 5: in ip_vs_nat_icmp_v6 skb_make_writable can
> move data to other addresses on linearization. Any pointers
> like 'ciph' should be recalculated based on offsets. But
> it does not matter because we should not call skb_make_writable
> here.
Yes, as mentioned earlier, I'll fix up patch-5 and remove the
skb_make_writable call.
> I also see that we should not send ICMP
> errors (FRAG NEEDED/TOO BIG) in response to large
> ICMP error packets but it is not related to your changes,
> it needs separate change to all transmitters.
Yes, its unrelated, lets fix that in another patchset.
I'll hopefully soon have a patchset ready with these changes/updates...
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS
2012-09-25 13:11 ` Jesper Dangaard Brouer
@ 2012-09-25 20:48 ` Julian Anastasov
0 siblings, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2012-09-25 20:48 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Hans Schillstrom, Hans Schillstrom, netdev, Patrick McHardy,
Pablo Neira Ayuso, lvs-devel, Thomas Graf, Wensong Zhang,
netfilter-devel, Simon Horman
Hello,
On Tue, 25 Sep 2012, Jesper Dangaard Brouer wrote:
> > Some comments:
> >
> > - About patch 4: ip_vs_icmp_xmit_v6 already calls skb_make_writable
> > before ip_vs_nat_icmp_v6, that is why we provide 'offset'.
>
> I see, that call path is correct, BUT I was talking about another call
> path of ip_vs_nat_icmp_v6(), via handle_response_icmp() (which also
> calls skb_make_writable). That call path is triggered, if the
> real-server, have shutdown its service and send back an ICMPv6 packet.
Yes, currently, both ip_vs_nat_icmp_v6 callers use
skb_make_writable before calling it.
> Hmm, testing it again, I cannot trigger this issue. Perhaps I was
> confusing my self and were using my test script that added IPv6 exthdrs
> to the packet. Adding print statements to the code, also show the
> correct offset now.
> I'm dropping this patch-4, and I'll adjust/fix patch-5 ("ipvs: fix
> faulty IPv6 extension header handling in IPVS") accordingly. And I'll
> double check patch-5, that exthdr have been accounted for (in the offset
> used by skb_make_writable() before calling ip_vs_nat_icmp_v6()).
Yes, patch-4 is not needed.
> > - May be we can provide the offset of ICMPv6 header
> > from ip_vs_in_icmp_v6 to ip_vs_icmp_xmit_v6 as additional
> > argument (icmp_offset) and then to ip_vs_nat_icmp_v6. By this
> > way we can avoid the two ipv6_find_hdr calls if we also provide
> > the iph argument from ip_vs_icmp_xmit_v6 to ip_vs_nat_icmp_v6,
> > its ->len points to the ports. ip_vs_in_icmp_v6 provides
> > also protocol in this ciph, so may be we have everything.
>
> Is this comment for the API patch-7 ("ipvs: API change to avoid rescan
> of IPv6 exthdr") ?
No, this comment is for patch-5 where 2 ipv6_find_hdr
calls are added to ip_vs_nat_icmp_v6. But we can change it
later in followup patch as an optimization.
> The API patch is going to save 19 calls to ipv6_find_hdr ().
>
>
> > - in ip_vs_in_icmp_v6 there must be 'offs_ciph = ciph.len;'
> > just before this line:
> >
> > if (IPPROTO_TCP == ciph.protocol || IPPROTO_UDP == ciph.protocol ||
> >
>
> It would be a lot easier for me, if you commented directly on the
> patches.
Sorry, this is for patch-5, its purpose is for
skb_make_writable in ip_vs_icmp_xmit_v6, not for debug.
> I can see that 'offs_ciph = ciph.len;' is set earlier in this patch, but
> that value is primarily used by IP_VS_DBG_PKT. And offs_ciph, needs to
> be updated, again, with the value of ciph.len after the call to
> ipv6_find_hdr(). So, yes you are right ;-)
>
> I'll rename offs_ciph to "writable" to emphasize what we are using this
> value for.
Good idea, this is its purpose.
> > The idea is that we linearize for writing the inner
> > IP header and optionally the 2 ports. That is why old
> > logic was 'offset += 2 * sizeof(__u16);'
>
> The port logic was kept. But I'll make it more clear whats happening,
> and keep the "+=" coding style.
Yes, it was in this way at 2 places before your changes,
one in handle_response_icmp and another in ip_vs_in_icmp_v6.
> > - initially, ip_vs_fill_iph_skb fills iphdr->flags from
> > current fragment, later ip_vs_out_icmp_v6 uses the same
> > ipvsh when calling ipv6_find_hdr. Should we initialize
> > ipvsh->flags to 0 before calling ipv6_find_hdr because
> > it is I/O argument?
For the record, this is patch-7
> As we don't use the flag, after this point, we can just give
> ipv6_find_hdr() a NULL value instead.
Agreed, NULL for flags looks fine.
> But I must give you, that it's a little confusing the way we reuse the
> ipvsh variable (in ip_vs_out_icmp_v6()). Think, this needs to be
> rewritten to use a separate variable, like in ip_vs_in_icmp_v6().
The initialization is risky but saves stack. So,
it is up to you to decide whether we need local ipvsh_stack
var as before patch-7.
Regards
--
Julian Anastasov <ja@ssi.bg>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-09-25 20:44 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-11 12:36 [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 1/8] ipvs: Trivial changes, use compressed IPv6 address in output Jesper Dangaard Brouer
2012-09-11 12:36 ` [PATCH V3 2/8] ipvs: IPv6 extend ICMPv6 handling for future types Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 3/8] ipvs: Use config macro IS_ENABLED() Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 4/8] ipvs: Fix bug in IPv6 NAT mangling of ports inside ICMPv6 packets Jesper Dangaard Brouer
2012-09-11 12:37 ` [PATCH V3 5/8] ipvs: Fix faulty IPv6 extension header handling in IPVS Jesper Dangaard Brouer
2012-09-11 12:38 ` [PATCH V3 6/8] ipvs: Complete IPv6 fragment handling for IPVS Jesper Dangaard Brouer
2012-09-11 12:38 ` [PATCH V3 7/8] ipvs: API change to avoid rescan of IPv6 exthdr Jesper Dangaard Brouer
2012-09-11 12:39 ` [PATCH V3 8/8] ipvs: SIP fragment handling Jesper Dangaard Brouer
2012-09-12 22:57 ` [PATCH V3 0/8] ipvs: IPv6 fragment handling for IPVS Julian Anastasov
2012-09-25 13:11 ` Jesper Dangaard Brouer
2012-09-25 20:48 ` Julian Anastasov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).