* [Patch net-next v8 00/11] vxlan: add ipv6 support
@ 2013-05-17 0:21 Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang
` (10 more replies)
0 siblings, 11 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev
From: Cong Wang <amwang@redhat.com>
v8: fix the bug when bindv6only=1
    fix more compile errors when IPV6=m
    complete the rest missing features for IPv6
v7: respect disable_ipv6 flag
    back to ipv4 only when ipv6 is not supported
v6: use a stub for IPv6 mcast functions
    split a few more long lines
    rebased on the latest net-next
v5: make David happy on the names of the fields
    fix my mistake during rebasing the patches
    drop the scope_id patch, because it is broken
    export in6addr_loopback
    fix a udp checksum bug
    rebased on the latest net-next
v4: rename ->sin to ->va_sin
    rename ->sin6 to ->va_sin6
    rename ->family to ->va_sa
    support ll addr
    fix more ugly #ifdef
    rebased on the latest net-next
v3: fix many coding style issues
    fix some ugly #ifdef
    rename vxlan_ip to vxlan_addr
    rename ->proto to ->family
    rename ->ip4/->ip6 to ->sin/->sin6
v2: fix some compile error when !CONFIG_IPV6
    improve some code based on Stephen's comments
    use sockaddr suggested by David

Cong Wang (11):
  vxlan: defer vxlan init as late as possible
  ipv6: make ip6_dst_hoplimit() static inline
  ipv6: move ip6_local_out into core kernel
  ipv6: export a stub for IPv6 symbols used by vxlan
  ipv6: export in6addr_loopback to modules
  vxlan: add ipv6 support
  vxlan: respect disable_ipv6 sysctl
  vxlan: add ipv6 route short circuit support
  vxlan: add ipv6 proxy support
  vxlan: respect scope_id for ll addr
  ipv6: Add generic UDP Tunnel segmentation

drivers/net/vxlan.c | 829 ++++++++++++++++++++++++++++++++++--------
include/net/addrconf.h | 18 +
include/net/ip6_route.h | 23 +-
include/net/ndisc.h | 5 +
include/uapi/linux/if_link.h | 2 +
net/ipv6/addrconf.c | 9 -
net/ipv6/addrconf_core.c | 13 +
net/ipv6/af_inet6.c | 12 +
net/ipv6/ip6_offload.c | 4 +-
net/ipv6/ip6_output.c | 25 --
net/ipv6/ndisc.c | 8 +-
net/ipv6/output_core.c | 26 ++
net/ipv6/route.c | 19 -
net/ipv6/udp_offload.c | 153 ++++++---
14 files changed, 893 insertions(+), 253 deletions(-)
--
1.7.7.6
^ permalink raw reply [flat|nested] 23+ messages in thread

* [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
` (9 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

When vxlan is compiled as builtin, its init code runs before IPv6 init;
this could cause problems once a later patch creates an IPv6 socket.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index ba81f3c..c1258c6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1662,7 +1662,7 @@ out2:
 out1:
 	return rc;
 }
-module_init(vxlan_init_module);
+late_initcall(vxlan_init_module);
 
 static void __exit vxlan_cleanup_module(void)
 {
-- 
1.7.7.6

* [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 13:02 ` Sergei Shtylyov
2013-05-17 21:13 ` David Miller
2013-05-17 0:21 ` [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel Cong Wang
` (8 subsequent siblings)
10 siblings, 2 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

It will be used by vxlan module, so move it from ipv6 module
to core kernel. I think it is small enough to be inlined.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/ip6_route.h | 23 +++++++++++++++++++++--
 net/ipv6/route.c | 19 -------------------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 260f83f..7e9192e 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -21,6 +21,7 @@ struct route_info {
 #include <net/flow.h>
 #include <net/ip6_fib.h>
 #include <net/sock.h>
+#include <net/addrconf.h>
 #include <linux/ip.h>
 #include <linux/ipv6.h>
 #include <linux/route.h>
@@ -112,8 +113,6 @@ extern struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev,
 					   const struct in6_addr *addr,
 					   bool anycast);
 
-extern int ip6_dst_hoplimit(struct dst_entry *dst);
-
 /*
  *	support functions for ND
  *
@@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
 	return dest;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline int ip6_dst_hoplimit(struct dst_entry *dst)
+{
+	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
+	if (hoplimit == 0) {
+		struct net_device *dev = dst->dev;
+		struct inet6_dev *idev;
+
+		rcu_read_lock();
+		idev = __in6_dev_get(dev);
+		if (idev)
+			hoplimit = idev->cnf.hop_limit;
+		else
+			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
+		rcu_read_unlock();
+	}
+	return hoplimit;
+}
+#endif
+
 #endif
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index ad0aa6b..0d9c531 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1310,25 +1310,6 @@ out:
 	return entries > rt_max_size;
 }
 
-int ip6_dst_hoplimit(struct dst_entry *dst)
-{
-	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
-	if (hoplimit == 0) {
-		struct net_device *dev = dst->dev;
-		struct inet6_dev *idev;
-
-		rcu_read_lock();
-		idev = __in6_dev_get(dev);
-		if (idev)
-			hoplimit = idev->cnf.hop_limit;
-		else
-			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
-		rcu_read_unlock();
-	}
-	return hoplimit;
-}
-EXPORT_SYMBOL(ip6_dst_hoplimit);
-
 /*
  *
  */
-- 
1.7.7.6

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
@ 2013-05-17 13:02 ` Sergei Shtylyov
2013-05-17 21:13 ` David Miller
1 sibling, 0 replies; 23+ messages in thread
From: Sergei Shtylyov @ 2013-05-17 13:02 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, David S. Miller

Hello.

On 17-05-2013 4:21, Cong Wang wrote:

> From: Cong Wang <amwang@redhat.com>

> It will be used by vxlan module, so move it from ipv6 module
> to core kernel. I think it is small enough to be inlined.

> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>
> ---
> include/net/ip6_route.h | 23 +++++++++++++++++++++--
> net/ipv6/route.c | 19 -------------------
> 2 files changed, 21 insertions(+), 21 deletions(-)

> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 260f83f..7e9192e 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
[...]
> @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
> 	return dest;
> }
>
> +#if IS_ENABLED(CONFIG_IPV6)
> +static inline int ip6_dst_hoplimit(struct dst_entry *dst)
> +{
> +	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);

Empty line wouldn't hurt here, after the declaration, like below...

> +	if (hoplimit == 0) {
> +		struct net_device *dev = dst->dev;
> +		struct inet6_dev *idev;
> +
> +		rcu_read_lock();
> +		idev = __in6_dev_get(dev);
> +		if (idev)
> +			hoplimit = idev->cnf.hop_limit;
> +		else
> +			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
> +		rcu_read_unlock();
> +	}
> +	return hoplimit;
> +}
> +#endif
> +

WBR, Sergei

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
2013-05-17 13:02 ` Sergei Shtylyov
@ 2013-05-17 21:13 ` David Miller
2013-05-22 4:54 ` Cong Wang
1 sibling, 1 reply; 23+ messages in thread
From: David Miller @ 2013-05-17 21:13 UTC (permalink / raw)
To: amwang; +Cc: netdev

From: Cong Wang <amwang@redhat.com>
Date: Fri, 17 May 2013 08:21:30 +0800

> @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
> 	return dest;
> }
>
> +#if IS_ENABLED(CONFIG_IPV6)
> +static inline int ip6_dst_hoplimit(struct dst_entry *dst)
> +{
> +	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
> +	if (hoplimit == 0) {
> +		struct net_device *dev = dst->dev;
> +		struct inet6_dev *idev;
> +
> +		rcu_read_lock();
> +		idev = __in6_dev_get(dev);
> +		if (idev)
> +			hoplimit = idev->cnf.hop_limit;
> +		else
> +			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
> +		rcu_read_unlock();
> +	}
> +	return hoplimit;
> +}
> +#endif

Create a dummy stub version in an #else branch here, so that you have
to ifdef less in vxlan.c

In fact I think you can avoid nearly every ifdef in vxlan.c if you apply
this technique throughout your changes.

Please do that and resubmit this series.

Thanks.

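For readers following along, the "#else stub" technique requested in the message above can be sketched in plain user-space C. Everything here is hypothetical: DEMO_HAVE_IPV6 stands in for IS_ENABLED(CONFIG_IPV6), and struct demo_dst, demo_dst_hoplimit() and demo_pick_ttl() are illustration names, not kernel symbols. The stub's return value of 0 is likewise just a placeholder, not what any later revision of the series does.

```c
/* Hypothetical build switch standing in for IS_ENABLED(CONFIG_IPV6). */
#ifndef DEMO_HAVE_IPV6
#define DEMO_HAVE_IPV6 0
#endif

#define DEMO_DEFAULT_HOPLIMIT 64

struct demo_dst {
	int hoplimit_metric;	/* 0 means "not set on this route" */
};

#if DEMO_HAVE_IPV6
/* Real helper: fall back to a default when the route carries no metric. */
static inline int demo_dst_hoplimit(const struct demo_dst *dst)
{
	int hoplimit = dst->hoplimit_metric;

	if (hoplimit == 0)
		hoplimit = DEMO_DEFAULT_HOPLIMIT;
	return hoplimit;
}
#else
/* Dummy stub: same signature, so callers need no #ifdef of their own. */
static inline int demo_dst_hoplimit(const struct demo_dst *dst)
{
	(void)dst;
	return 0;
}
#endif

/* Caller code is byte-for-byte identical in both configurations. */
static int demo_pick_ttl(const struct demo_dst *dst, int configured_ttl)
{
	return configured_ttl ? configured_ttl : demo_dst_hoplimit(dst);
}
```

The conditional compilation lives in one place next to the helper, and demo_pick_ttl() stays free of preprocessor branches whichever way the switch is set.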
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-17 21:13 ` David Miller
@ 2013-05-22 4:54 ` Cong Wang
2013-05-22 7:14 ` David Miller
0 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-05-22 4:54 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Fri, 2013-05-17 at 14:13 -0700, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Fri, 17 May 2013 08:21:30 +0800
>
> > @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr
> > 	return dest;
> > }
> >
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +static inline int ip6_dst_hoplimit(struct dst_entry *dst)
> > +{
> > +	int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT);
> > +	if (hoplimit == 0) {
> > +		struct net_device *dev = dst->dev;
> > +		struct inet6_dev *idev;
> > +
> > +		rcu_read_lock();
> > +		idev = __in6_dev_get(dev);
> > +		if (idev)
> > +			hoplimit = idev->cnf.hop_limit;
> > +		else
> > +			hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit;
> > +		rcu_read_unlock();
> > +	}
> > +	return hoplimit;
> > +}
> > +#endif
>
> Create a dummy stub version in an #else branch here, so that you have
> to ifdef less in vxlan.c

The reason why we need #if IS_ENABLED(CONFIG_IPV6) here is that
dev_net(dev)->ipv6 is defined only in that case, not just for its caller
in vxlan. Nor do I think anyone will seriously call ip6_dst_hoplimit() in
the !CONFIG_IPV6 case, since its name is obvious.

>
> In fact I think you can avoid nearly every ifdef in vxlan.c if you apply
> this technique throughout your changes.
>
> Please do that and resubmit this series.
>

Actually that is exactly what I _did_ in v1 or RFC, IIRC, it is David
Stevens who prefers to use #ifdef inside these functions, so I changed
it based on his suggestion.

I myself don't have any strong opinion here, either is okay, I just
don't like changing it again and again. :)

Thanks.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 4:54 ` Cong Wang
@ 2013-05-22 7:14 ` David Miller
2013-05-22 10:28 ` Cong Wang
0 siblings, 1 reply; 23+ messages in thread
From: David Miller @ 2013-05-22 7:14 UTC (permalink / raw)
To: amwang; +Cc: netdev

From: Cong Wang <amwang@redhat.com>
Date: Wed, 22 May 2013 12:54:13 +0800

> Actually that is exactly what I _did_ in v1 or RFC, IIRC, it is David
> Stevens who prefers to use #ifdef inside these functions, so I changed
> it based on his suggestion.
>
> I myself don't have any strong opinion here, either is okay, I just
> don't like changing it again and again. :)

The driver looks like complete shit with all the ifdefs in there,
this isn't the BSD kernel.

I do not want to see them there at all.

You can abstract everything behind helper functions in a header
file, keep the mess there.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 7:14 ` David Miller
@ 2013-05-22 10:28 ` Cong Wang
2013-05-22 15:50 ` Mike Rapoport
0 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-05-22 10:28 UTC (permalink / raw)
To: David Miller; +Cc: netdev

On Wed, 2013-05-22 at 00:14 -0700, David Miller wrote:
> From: Cong Wang <amwang@redhat.com>
> Date: Wed, 22 May 2013 12:54:13 +0800
>
> > Actually that is exactly what I _did_ in v1 or RFC, IIRC, it is David
> > Stevens who prefers to use #ifdef inside these functions, so I changed
> > it based on his suggestion.
> >
> > I myself don't have any strong opinion here, either is okay, I just
> > don't like changing it again and again. :)
>
> The driver looks like complete shit with all the ifdefs in there,
> this isn't the BSD kernel.
>
> I do not want to seem them there at all.
>
> You can abstract everything behind helper functions in a header
> file, keep the mess there.

Alright, I will change all such functions in vxlan.c back to what you
are suggesting.

For example, change

static inline
bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
{
#if IS_ENABLED(CONFIG_IPV6)
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
	else
#endif
		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}

to

#if IS_ENABLED(CONFIG_IPV6)
static inline
bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
	else
		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}
#else
static inline
bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
{
	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}
#endif

just in case I misunderstand you.

Thanks.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 10:28 ` Cong Wang
@ 2013-05-22 15:50 ` Mike Rapoport
2013-05-22 16:03 ` Cong Wang
0 siblings, 1 reply; 23+ messages in thread
From: Mike Rapoport @ 2013-05-22 15:50 UTC (permalink / raw)
To: Cong Wang; +Cc: David Miller, netdev

On Wed, May 22, 2013 at 1:28 PM, Cong Wang <amwang@redhat.com> wrote:
> On Wed, 2013-05-22 at 00:14 -0700, David Miller wrote:
>> From: Cong Wang <amwang@redhat.com>
>> Date: Wed, 22 May 2013 12:54:13 +0800
>>
>
> Alright, I will change all such functions in vxlan.c back to what you
> are suggesting.
>
> For example, change
>
> static inline
> bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr
> *b)
> {
> #if IS_ENABLED(CONFIG_IPV6)
>         if (a->sa.sa_family != b->sa.sa_family)
>                 return false;
>         if (a->sa.sa_family == AF_INET6)
>                 return ipv6_addr_equal(&a->sin6.sin6_addr,
> &b->sin6.sin6_addr);
>         else
> #endif
>                 return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
> }
>
> to
>
> #if IS_ENABLED(CONFIG_IPV6)
> static inline
> bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr
> *b)
> {
>         if (a->sa.sa_family != b->sa.sa_family)
>                 return false;
>         if (a->sa.sa_family == AF_INET6)
>                 return ipv6_addr_equal(&a->sin6.sin6_addr,
> &b->sin6.sin6_addr);
>         else
>                 return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
> }
> #else
> static inline
> bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr
> *b)
> {
>         return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
> }
> #endif

I think you can just drop #ifdefs in 90% of the cases rather than
create two versions of code for IPv4 and IPv6....

> just in case I misunderstand you.
>
> Thanks.
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Sincerely yours,
Mike.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 15:50 ` Mike Rapoport
@ 2013-05-22 16:03 ` Cong Wang
2013-05-22 16:10 ` Mike Rapoport
0 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-05-22 16:03 UTC (permalink / raw)
To: Mike Rapoport; +Cc: David Miller, netdev

----- Original Message -----
>
> I think you can just drop #ifdefs in 90% of the cases rather than
> create two versions of code for IPv4 and IPv6....
>

I know we can use memcmp(), but comparing 16+ bytes even for IPv4 is not
a good idea, also we have to zalloc() every instance of union vxlan_addr.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 16:03 ` Cong Wang
@ 2013-05-22 16:10 ` Mike Rapoport
2013-05-24 5:10 ` Cong Wang
2013-05-24 5:15 ` Cong Wang
0 siblings, 2 replies; 23+ messages in thread
From: Mike Rapoport @ 2013-05-22 16:10 UTC (permalink / raw)
To: Cong Wang; +Cc: David Miller, netdev

On Wed, May 22, 2013 at 12:03:23PM -0400, Cong Wang wrote:
>
>
> ----- Original Message -----
> >
> > I think you can just drop #ifdefs in 90% of the cases rather than
> > create two versions of code for IPv4 and IPv6....
> >
>
> I know we can use memcmp(), but comparing 16+ bytes even for IPv4 is not
> a good idea, also we have to zalloc() every instance of union vxlan_addr.

I've lost you here... Why not just:

static inline
bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
	else
		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}

--
Sincerely yours,
Mike.

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 16:10 ` Mike Rapoport
@ 2013-05-24 5:10 ` Cong Wang
2013-05-24 5:15 ` Cong Wang
1 sibling, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-24 5:10 UTC (permalink / raw)
To: Mike Rapoport; +Cc: David Miller, netdev

On Wed, 2013-05-22 at 19:10 +0300, Mike Rapoport wrote:
> I've lost you here... Why not just:
>
> static inline
> bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
> {
> 	if (a->sa.sa_family != b->sa.sa_family)
> 		return false;
> 	if (a->sa.sa_family == AF_INET6)
> 		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
> 	else
> 		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
> }

I see your point now, but for !CONFIG_IPV6, the first two 'if's are
obviously useless. Is GCC smart enough to know ->sa.sa_family == AF_INET
is always true in such a case? I doubt it...

* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline
2013-05-22 16:10 ` Mike Rapoport
2013-05-24 5:10 ` Cong Wang
@ 2013-05-24 5:15 ` Cong Wang
1 sibling, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-24 5:15 UTC (permalink / raw)
To: Mike Rapoport; +Cc: David Miller, netdev

On Wed, 2013-05-22 at 19:10 +0300, Mike Rapoport wrote:
> I've lost you here... Why not just:
>
> static inline
> bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
> {
> 	if (a->sa.sa_family != b->sa.sa_family)
> 		return false;
> 	if (a->sa.sa_family == AF_INET6)
> 		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
> 	else
> 		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
> }

I see your point now, but for !CONFIG_IPV6, the first two 'if's are
obviously useless. Is GCC smart enough to know ->sa.sa_family == AF_INET
is always true in such a case? I doubt it...

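The two comparison strategies debated in this sub-thread can be tried out in user space. The sketch below is an illustration only: union demo_addr mirrors the layout of union vxlan_addr from the patch, but demo_addr_equal(), demo_addr_equal_memcmp() and demo_set_v4() are hypothetical names, and memcmp() on the in6_addr stands in for the kernel's ipv6_addr_equal(). It shows why a whole-union memcmp() is only safe when every instance was zeroed first, which is the zalloc() concern raised earlier; it says nothing about whether GCC can drop the family checks when IPv6 is compiled out.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* User-space stand-in for the union introduced by the patch. */
union demo_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

/* Family-aware comparison: looks only at bytes that belong to the address. */
static bool demo_addr_equal(const union demo_addr *a, const union demo_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}

/* Whole-union memcmp(): also compares port, padding and unused IPv6 bytes. */
static bool demo_addr_equal_memcmp(const union demo_addr *a,
				   const union demo_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

/* Fill in an IPv4 address; zero_first selects whether the union is cleared. */
static void demo_set_v4(union demo_addr *addr, const char *text, bool zero_first)
{
	if (zero_first)
		memset(addr, 0, sizeof(*addr));
	addr->sa.sa_family = AF_INET;
	inet_pton(AF_INET, text, &addr->sin.sin_addr);
}
```

With two unions that hold the same IPv4 address but different leftover bytes, demo_addr_equal() reports a match while demo_addr_equal_memcmp() does not; once both are zeroed before being filled in, the two functions agree.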
* [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
` (7 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

It will be used by vxlan module too.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_output.c | 25 -------------------------
 net/ipv6/output_core.c | 26 ++++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index d2eedf1..316895e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -56,31 +56,6 @@
 #include <net/checksum.h>
 #include <linux/mroute6.h>
 
-int __ip6_local_out(struct sk_buff *skb)
-{
-	int len;
-
-	len = skb->len - sizeof(struct ipv6hdr);
-	if (len > IPV6_MAXPLEN)
-		len = 0;
-	ipv6_hdr(skb)->payload_len = htons(len);
-
-	return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL,
-		       skb_dst(skb)->dev, dst_output);
-}
-
-int ip6_local_out(struct sk_buff *skb)
-{
-	int err;
-
-	err = __ip6_local_out(skb);
-	if (likely(err == 1))
-		err = dst_output(skb);
-
-	return err;
-}
-EXPORT_SYMBOL_GPL(ip6_local_out);
-
 static int ip6_finish_output2(struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c
index c2e73e6..030d03f 100644
--- a/net/ipv6/output_core.c
+++ b/net/ipv6/output_core.c
@@ -74,3 +74,29 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
 	return offset;
 }
 EXPORT_SYMBOL(ip6_find_1stfragopt);
+
+int __ip6_local_out(struct sk_buff *skb)
+{
+	int len;
+
+	len = skb->len - sizeof(struct ipv6hdr);
+	if (len > IPV6_MAXPLEN)
+		len = 0;
+	ipv6_hdr(skb)->payload_len = htons(len);
+
+	return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL,
+		       skb_dst(skb)->dev, dst_output);
+}
+EXPORT_SYMBOL_GPL(__ip6_local_out);
+
+int ip6_local_out(struct sk_buff *skb)
+{
+	int err;
+
+	err = __ip6_local_out(skb);
+	if (likely(err == 1))
+		err = dst_output(skb);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(ip6_local_out);
-- 
1.7.7.6

* [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (2 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules Cong Wang
` (6 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev
Cc: Ben Hutchings, Bjørn Mork, Stephen Hemminger, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

In case IPv6 is compiled as a module, introduce a stub for
ipv6_sock_mc_join, ipv6_sock_mc_drop, etc. It will be used by
the vxlan module. This is an ugly but easy solution for now.

Suggested-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Bjørn Mork <bjorn@mork.no>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 include/net/addrconf.h | 14 ++++++++++++++
 net/ipv6/addrconf_core.c | 3 +++
 net/ipv6/af_inet6.c | 11 +++++++++++
 3 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 84a6440..d09d42c 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -140,6 +140,20 @@ extern bool inet6_mc_check(struct sock *sk,
 			   const struct in6_addr *mc_addr,
 			   const struct in6_addr *src_addr);
 
+/* A stub used by vxlan module. This is ugly, ideally these
+ * symbols should be built into the core kernel.
+ */
+struct ipv6_stub {
+	int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex,
+				 const struct in6_addr *addr);
+	int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex,
+				 const struct in6_addr *addr);
+	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
+			       struct flowi6 *fl6);
+	void (*udpv6_encap_enable)(void);
+};
+extern const struct ipv6_stub *ipv6_stub __read_mostly;
+
 extern int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr);
 extern int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr);
 extern int ipv6_dev_mc_dec(struct net_device *dev, const struct in6_addr *addr);
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 7210456..3807a79 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -97,3 +97,6 @@ int inet6addr_notifier_call_chain(unsigned long val, void *v)
 	return atomic_notifier_call_chain(&inet6addr_chain, val, v);
 }
 EXPORT_SYMBOL(inet6addr_notifier_call_chain);
+
+const struct ipv6_stub *ipv6_stub __read_mostly;
+EXPORT_SYMBOL_GPL(ipv6_stub);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index ab5c7ad..58de055 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -808,6 +808,13 @@ static struct pernet_operations inet6_net_ops = {
 	.exit = inet6_net_exit,
 };
 
+static const struct ipv6_stub ipv6_stub_impl = {
+	.ipv6_sock_mc_join = ipv6_sock_mc_join,
+	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
+	.ipv6_dst_lookup = ip6_dst_lookup,
+	.udpv6_encap_enable = udpv6_encap_enable,
+};
+
 static int __init inet6_init(void)
 {
 	struct list_head *r;
@@ -879,6 +886,9 @@ static int __init inet6_init(void)
 	err = igmp6_init();
 	if (err)
 		goto igmp_fail;
+
+	ipv6_stub = &ipv6_stub_impl;
+
 	err = ipv6_netfilter_init();
 	if (err)
 		goto netfilter_fail;
@@ -1027,6 +1037,7 @@ static void __exit inet6_exit(void)
 	raw6_proc_exit();
 #endif
 	ipv6_netfilter_fini();
+	ipv6_stub = NULL;
 	igmp6_cleanup();
 	ndisc_cleanup();
 	ip6_mr_cleanup();
-- 
1.7.7.6

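The function-pointer table in this patch follows a common pattern: an always-linked pointer that an optional provider fills in on init and clears on exit. Below is a minimal user-space sketch of that pattern. All names (struct demo_stub, provider_init(), consumer_join() and so on) are hypothetical, and the NULL check in the consumer is an assumption made for the illustration; the hunks above only show the provider side, not how vxlan guards its calls.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table, loosely modeled on struct ipv6_stub. */
struct demo_stub {
	int (*mc_join)(int ifindex);
	int (*mc_drop)(int ifindex);
};

/* Lives in always-present code; stays NULL until the provider registers. */
static const struct demo_stub *demo_stub;

/* State owned by the optional provider: last joined ifindex, 0 if none. */
static int demo_joined;

static int provider_mc_join(int ifindex)
{
	demo_joined = ifindex;
	return 0;
}

static int provider_mc_drop(int ifindex)
{
	if (demo_joined != ifindex)
		return -ENOENT;
	demo_joined = 0;
	return 0;
}

static const struct demo_stub provider_impl = {
	.mc_join = provider_mc_join,
	.mc_drop = provider_mc_drop,
};

/* Provider load/unload, mirroring the assignments in inet6_init()/inet6_exit(). */
static void provider_init(void) { demo_stub = &provider_impl; }
static void provider_exit(void) { demo_stub = NULL; }

/* Consumer: reaches the provider only through the pointer (assumed guard). */
static int consumer_join(int ifindex)
{
	const struct demo_stub *stub = demo_stub;

	if (!stub)
		return -EAFNOSUPPORT;
	return stub->mc_join(ifindex);
}
```

The consumer has no link-time dependency on the provider's functions, only on the pointer, which is what lets a built-in user coexist with a modular provider.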
* [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (3 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 06/11] vxlan: add ipv6 support Cong Wang
` (5 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: Mike Rapoport, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

It is needed by vxlan module. Noticed by Mike.

Cc: Mike Rapoport <mike.rapoport@ravellosystems.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/addrconf.c | 9 ---------
 net/ipv6/addrconf_core.c | 10 ++++++++++
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d1ab6ab..650a109 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -238,15 +238,6 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
 	.accept_dad = 1,
 };
 
-/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
-const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
-const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
-const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
-const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
-const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
-const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
-const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
-
 /* Check if a valid qdisc is available */
 static inline bool addrconf_qdisc_ok(const struct net_device *dev)
 {
diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c
index 3807a79..cb991dd 100644
--- a/net/ipv6/addrconf_core.c
+++ b/net/ipv6/addrconf_core.c
@@ -100,3 +100,13 @@ EXPORT_SYMBOL(inet6addr_notifier_call_chain);
 
 const struct ipv6_stub *ipv6_stub __read_mostly;
 EXPORT_SYMBOL_GPL(ipv6_stub);
+
+/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */
+const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT;
+EXPORT_SYMBOL(in6addr_loopback);
+const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT;
+const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT;
+const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT;
+const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT;
+const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT;
-- 
1.7.7.6

* [Patch net-next v8 06/11] vxlan: add ipv6 support
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (4 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
` (4 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David Stevens, Stephen Hemminger, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

This patch adds IPv6 support to vxlan device, as the new version
RFC already mentions it:

http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03

Cc: David Stevens <dlstevens@us.ibm.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c | 698 +++++++++++++++++++++++++++++++++---------
 include/uapi/linux/if_link.h | 2 +
 2 files changed, 560 insertions(+), 140 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index c1258c6..46c59a6 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -6,9 +6,6 @@
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
- *
- * TODO
- *  - IPv6 (not in RFC)
  */
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
@@ -41,6 +38,11 @@
 #include <net/inet_ecn.h>
 #include <net/net_namespace.h>
 #include <net/netns/generic.h>
+#if IS_ENABLED(CONFIG_IPV6)
+#include <net/addrconf.h>
+#include <net/ip6_route.h>
+#include <net/ip6_tunnel.h>
+#endif
 
 #define VXLAN_VERSION	"0.1"
 
@@ -55,6 +57,8 @@
 #define VXLAN_VID_MASK	(VXLAN_N_VID - 1)
 /* IP header + UDP + VXLAN + Ethernet header */
 #define VXLAN_HEADROOM (20 + 8 + 8 + 14)
+/* IPv6 header + UDP + VXLAN + Ethernet header */
+#define VXLAN6_HEADROOM (40 + 8 + 8 + 14)
 
 #define VXLAN_FLAGS 0x08000000	/* struct vxlanhdr.vx_flags required value. */
 
@@ -76,16 +80,27 @@ static bool log_ecn_error = true;
 module_param(log_ecn_error, bool, 0644);
 MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
 
+#if IS_ENABLED(CONFIG_IPV6)
+static bool ipv6_disabled = false;
+#endif
+
 /* per-net private data for this module */
 static unsigned int vxlan_net_id;
 struct vxlan_net {
 	struct socket	  *sock;	/* UDP encap socket */
+	struct socket	  *sock6;
 	struct hlist_head vni_list[VNI_HASH_SIZE];
 };
 
+union vxlan_addr {
+	struct sockaddr_in sin;
+	struct sockaddr_in6 sin6;
+	struct sockaddr sa;
+};
+
 struct vxlan_rdst {
 	struct rcu_head		 rcu;
-	__be32			 remote_ip;
+	union vxlan_addr	 remote_ip;
 	__be16			 remote_port;
 	u32			 remote_vni;
 	u32			 remote_ifindex;
@@ -109,7 +124,7 @@ struct vxlan_dev {
 	struct hlist_node hlist;
 	struct net_device *dev;
 	struct vxlan_rdst default_dst;	/* default destination */
-	__be32		  saddr;	/* source address */
+	union vxlan_addr  saddr;	/* source address */
 	__be16		  dst_port;
 	__u16		  port_min;	/* source port range */
 	__u16		  port_max;
@@ -132,6 +147,69 @@ struct vxlan_dev {
 #define VXLAN_F_L2MISS	0x08
 #define VXLAN_F_L3MISS	0x10
 
+static inline
+bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (a->sa.sa_family != b->sa.sa_family)
+		return false;
+	if (a->sa.sa_family == AF_INET6)
+		return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr);
+	else
+#endif
+		return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
+}
+
+static inline bool vxlan_addr_any(const union vxlan_addr *ipa)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (ipa->sa.sa_family == AF_INET6)
+		return ipv6_addr_any(&ipa->sin6.sin6_addr);
+	else
+#endif
+		return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY);
+}
+
+static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (ipa->sa.sa_family == AF_INET6)
+		return ipv6_addr_is_multicast(&ipa->sin6.sin6_addr);
+	else
+#endif
+		return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr));
+}
+
+static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla)
+{
+	if (nla_len(nla) >= sizeof(struct in6_addr)) {
+#if IS_ENABLED(CONFIG_IPV6)
+		nla_memcpy(&ip->sin6.sin6_addr, nla, sizeof(struct in6_addr));
+		ip->sa.sa_family = AF_INET6;
+		return 0;
+#else
+		return -EAFNOSUPPORT;
+#endif
+	} else if (nla_len(nla) >= sizeof(__be32)) {
+		ip->sin.sin_addr.s_addr = nla_get_be32(nla);
+		ip->sa.sa_family = AF_INET;
+		return 0;
+	} else {
+		return -EAFNOSUPPORT;
+	}
+}
+
+static int vxlan_nla_put_addr(struct sk_buff *skb, int attr,
+			      const union vxlan_addr *ip)
+{
+#if IS_ENABLED(CONFIG_IPV6)
+	if (ip->sa.sa_family == AF_INET6)
+		return nla_put(skb, attr, sizeof(struct in6_addr), &ip->sin6.sin6_addr);
+	else
+#endif
+		return nla_put_be32(skb, attr, ip->sin.sin_addr.s_addr);
+}
+
 /* salt for hash table */
 static u32 vxlan_salt __read_mostly;
 
@@ -178,7 +256,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 
 	if (type == RTM_GETNEIGH) {
 		ndm->ndm_family	= AF_INET;
-		send_ip = rdst->remote_ip != htonl(INADDR_ANY);
+		send_ip = !vxlan_addr_any(&rdst->remote_ip);
 		send_eth = !is_zero_ether_addr(fdb->eth_addr);
 	} else
 		ndm->ndm_family	= AF_BRIDGE;
@@ -190,7 +268,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
 	if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, &fdb->eth_addr))
 		goto nla_put_failure;
 
-	if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip))
+	if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
 		goto nla_put_failure;
 
 	if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
@@ -222,7 +300,7 @@ static inline size_t vxlan_nlmsg_size(void)
 {
 	return NLMSG_ALIGN(sizeof(struct ndmsg))
 		+ nla_total_size(ETH_ALEN) /* NDA_LLADDR */
-		+ nla_total_size(sizeof(__be32)) /* NDA_DST */
+		+ nla_total_size(sizeof(struct in6_addr)) /* NDA_DST */
 		+ nla_total_size(sizeof(__be16)) /* NDA_PORT */
 		+ nla_total_size(sizeof(__be32)) /* NDA_VNI */
 		+ nla_total_size(sizeof(__u32)) /* NDA_IFINDEX */
@@ -255,14 +333,14 @@ errout:
 		rtnl_set_sk_err(net, RTNLGRP_NEIGH, err);
 }
 
-static void vxlan_ip_miss(struct net_device *dev, __be32 ipa)
+static void vxlan_ip_miss(struct net_device *dev, union vxlan_addr *ipa)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct vxlan_fdb f;
 
 	memset(&f, 0, sizeof f);
 	f.state = NUD_STALE;
-	f.remote.remote_ip = ipa; /* goes to NDA_DST */
+	f.remote.remote_ip = *ipa; /* goes to NDA_DST */
 	f.remote.remote_vni = VXLAN_N_VID;
 
 	vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH);
@@ -317,14 +395,14 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan,
 }
 
 /* Add/update destinations for multicast */
-static int vxlan_fdb_append(struct vxlan_fdb *f,
-			    __be32 ip, __be16 port, __u32 vni, __u32 ifindex)
+static int vxlan_fdb_append(struct vxlan_fdb *f, union vxlan_addr *ip,
+			    __be16 port, __u32 vni, __u32 ifindex)
 {
 	struct vxlan_rdst *rd_prev, *rd;
 
 	rd_prev = NULL;
 	for (rd = &f->remote; rd; rd = rd->remote_next) {
-		if (rd->remote_ip == ip &&
+		if (vxlan_addr_equal(&rd->remote_ip, ip) &&
 		    rd->remote_port == port &&
 		    rd->remote_vni == vni &&
 		    rd->remote_ifindex == ifindex)
@@ -334,7 +412,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,
 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
 		return -ENOBUFS;
-	rd->remote_ip = ip;
+	rd->remote_ip = *ip;
 	rd->remote_port = port;
 	rd->remote_vni =
vni; rd->remote_ifindex = ifindex; @@ -345,7 +423,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f, /* Add new entry to forwarding table -- assumes lock held */ static int vxlan_fdb_create(struct vxlan_dev *vxlan, - const u8 *mac, __be32 ip, + const u8 *mac, union vxlan_addr *ip, __u16 state, __u16 flags, __be16 port, __u32 vni, __u32 ifindex, __u8 ndm_flags) @@ -385,13 +463,20 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan, if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax) return -ENOSPC; - netdev_dbg(vxlan->dev, "add %pM -> %pI4\n", mac, &ip); +#if IS_ENABLED(CONFIG_IPV6) + if (ip->sa.sa_family == AF_INET6) + netdev_dbg(vxlan->dev, "add %pM -> %pI6\n", mac, + &ip->sin6.sin6_addr); + else +#endif + netdev_dbg(vxlan->dev, "add %pM -> %pI4\n", mac, + &ip->sin.sin_addr.s_addr); f = kmalloc(sizeof(*f), GFP_ATOMIC); if (!f) return -ENOMEM; notify = 1; - f->remote.remote_ip = ip; + f->remote.remote_ip = *ip; f->remote.remote_port = port; f->remote.remote_vni = vni; f->remote.remote_ifindex = ifindex; @@ -444,7 +529,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], { struct vxlan_dev *vxlan = netdev_priv(dev); struct net *net = dev_net(vxlan->dev); - __be32 ip; + union vxlan_addr ip; __be16 port; u32 vni, ifindex; int err; @@ -458,10 +543,9 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], if (tb[NDA_DST] == NULL) return -EINVAL; - if (nla_len(tb[NDA_DST]) != sizeof(__be32)) - return -EAFNOSUPPORT; - - ip = nla_get_be32(tb[NDA_DST]); + err = vxlan_nla_get_addr(&ip, tb[NDA_DST]); + if (err) + return err; if (tb[NDA_PORT]) { if (nla_len(tb[NDA_PORT]) != sizeof(__be16)) @@ -491,7 +575,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], ifindex = 0; spin_lock_bh(&vxlan->hash_lock); - err = vxlan_fdb_create(vxlan, addr, ip, ndm->ndm_state, flags, + err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags, port, vni, ifindex, ndm->ndm_flags); spin_unlock_bh(&vxlan->hash_lock); @@ -555,7 +639,7 @@ 
skip: * and Tunnel endpoint. */ static void vxlan_snoop(struct net_device *dev, - __be32 src_ip, const u8 *src_mac) + union vxlan_addr *src_ip, const u8 *src_mac) { struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_fdb *f; @@ -564,15 +648,25 @@ static void vxlan_snoop(struct net_device *dev, f = vxlan_find_mac(vxlan, src_mac); if (likely(f)) { f->used = jiffies; - if (likely(f->remote.remote_ip == src_ip)) + if (likely(vxlan_addr_equal(&f->remote.remote_ip, src_ip))) return; - if (net_ratelimit()) - netdev_info(dev, - "%pM migrated from %pI4 to %pI4\n", - src_mac, &f->remote.remote_ip, &src_ip); + if (net_ratelimit()) { +#if IS_ENABLED(CONFIG_IPV6) + if (src_ip->sa.sa_family == AF_INET6) + netdev_info(dev, + "%pM migrated from %pI6 to %pI6\n", + src_mac, &f->remote.remote_ip.sin6.sin6_addr, + &src_ip->sin6.sin6_addr); + else +#endif + netdev_info(dev, + "%pM migrated from %pI4 to %pI4\n", + src_mac, &f->remote.remote_ip.sin.sin_addr.s_addr, + &src_ip->sin.sin_addr.s_addr); + } - f->remote.remote_ip = src_ip; + f->remote.remote_ip = *src_ip; f->updated = jiffies; } else { /* learned new entry */ @@ -603,7 +697,8 @@ static bool vxlan_group_used(struct vxlan_net *vn, if (!netif_running(vxlan->dev)) continue; - if (vxlan->default_dst.remote_ip == this->default_dst.remote_ip) + if (vxlan_addr_equal(&vxlan->default_dst.remote_ip, + &this->default_dst.remote_ip)) return true; } @@ -616,11 +711,12 @@ static int vxlan_join_group(struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); struct sock *sk = vn->sock->sk; + union vxlan_addr *ip = &vxlan->default_dst.remote_ip; struct ip_mreqn mreq = { - .imr_multiaddr.s_addr = vxlan->default_dst.remote_ip, + .imr_multiaddr.s_addr = ip->sin.sin_addr.s_addr, .imr_ifindex = vxlan->default_dst.remote_ifindex, }; - int err; + int err = 0; /* Already a member of group */ if (vxlan_group_used(vn, vxlan)) @@ -628,8 +724,17 @@ static int 
vxlan_join_group(struct net_device *dev) /* Need to drop RTNL to call multicast join */ rtnl_unlock(); - lock_sock(sk); - err = ip_mc_join_group(sk, &mreq); + if (ip->sa.sa_family == AF_INET) { + lock_sock(sk); + err = ip_mc_join_group(sk, &mreq); +#if IS_ENABLED(CONFIG_IPV6) + } else { + sk = vn->sock6->sk; + lock_sock(sk); + err = ipv6_stub->ipv6_sock_mc_join(sk, vxlan->default_dst.remote_ifindex, + &ip->sin6.sin6_addr); +#endif + } release_sock(sk); rtnl_lock(); @@ -644,8 +749,9 @@ static int vxlan_leave_group(struct net_device *dev) struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); int err = 0; struct sock *sk = vn->sock->sk; + union vxlan_addr *ip = &vxlan->default_dst.remote_ip; struct ip_mreqn mreq = { - .imr_multiaddr.s_addr = vxlan->default_dst.remote_ip, + .imr_multiaddr.s_addr = ip->sin.sin_addr.s_addr, .imr_ifindex = vxlan->default_dst.remote_ifindex, }; @@ -655,8 +761,17 @@ static int vxlan_leave_group(struct net_device *dev) /* Need to drop RTNL to call multicast leave */ rtnl_unlock(); - lock_sock(sk); - err = ip_mc_leave_group(sk, &mreq); + if (ip->sa.sa_family == AF_INET) { + lock_sock(sk); + err = ip_mc_leave_group(sk, &mreq); +#if IS_ENABLED(CONFIG_IPV6) + } else { + sk = vn->sock6->sk; + lock_sock(sk); + err = ipv6_stub->ipv6_sock_mc_drop(sk, vxlan->default_dst.remote_ifindex, + &ip->sin6.sin6_addr); +#endif + } release_sock(sk); rtnl_lock(); @@ -666,12 +781,16 @@ static int vxlan_leave_group(struct net_device *dev) /* Callback from net/ipv4/udp.c to receive packets */ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) { - struct iphdr *oip; + struct iphdr *oip = NULL; +#if IS_ENABLED(CONFIG_IPV6) + struct ipv6hdr *oip6 = NULL; +#endif struct vxlanhdr *vxh; struct vxlan_dev *vxlan; struct pcpu_tstats *stats; + union vxlan_addr src_ip; __u32 vni; - int err; + int err = 0; /* pop off outer UDP header */ __skb_pull(skb, sizeof(struct udphdr)); @@ -708,7 +827,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, 
struct sk_buff *skb) skb_reset_mac_header(skb); /* Re-examine inner Ethernet packet */ - oip = ip_hdr(skb); + if (skb->protocol == htons(ETH_P_IP)) + oip = ip_hdr(skb); +#if IS_ENABLED(CONFIG_IPV6) + if (skb->protocol == htons(ETH_P_IPV6)) + oip6 = ipv6_hdr(skb); +#endif + skb->protocol = eth_type_trans(skb, vxlan->dev); /* Ignore packet loops (and multicast echo) */ @@ -716,8 +841,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) vxlan->dev->dev_addr) == 0) goto drop; - if (vxlan->flags & VXLAN_F_LEARN) - vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source); + if (vxlan->flags & VXLAN_F_LEARN) { + if (oip) { + src_ip.sin.sin_addr.s_addr = oip->saddr; + src_ip.sa.sa_family = AF_INET; + } +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) { + src_ip.sin6.sin6_addr = oip6->saddr; + src_ip.sa.sa_family = AF_INET6; + } +#endif + vxlan_snoop(skb->dev, &src_ip, eth_hdr(skb)->h_source); + } __skb_tunnel_rx(skb, vxlan->dev); skb_reset_network_header(skb); @@ -733,11 +869,24 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) skb->encapsulation = 0; - err = IP_ECN_decapsulate(oip, skb); +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) + err = IP6_ECN_decapsulate(oip6, skb); +#endif + if (oip) + err = IP_ECN_decapsulate(oip, skb); + if (unlikely(err)) { - if (log_ecn_error) - net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n", - &oip->saddr, oip->tos); + if (log_ecn_error) { +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) + net_info_ratelimited("non-ECT from %pI6\n", + &oip6->saddr); +#endif + if (oip) + net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n", + &oip->saddr, oip->tos); + } if (err > 1) { ++vxlan->dev->stats.rx_frame_errors; ++vxlan->dev->stats.rx_errors; @@ -772,6 +921,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb) u8 *arpptr, *sha; __be32 sip, tip; struct neighbour *n; + union vxlan_addr ipa; if (dev->flags & IFF_NOARP) goto out; @@ -813,7 +963,7 @@ static int arp_reduce(struct net_device *dev, struct 
sk_buff *skb) } f = vxlan_find_mac(vxlan, n->ha); - if (f && f->remote.remote_ip == htonl(INADDR_ANY)) { + if (f && vxlan_addr_any(&f->remote.remote_ip)) { /* bridge-local neighbor */ neigh_release(n); goto out; @@ -831,8 +981,11 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb) if (netif_rx_ni(reply) == NET_RX_DROP) dev->stats.rx_dropped++; - } else if (vxlan->flags & VXLAN_F_L3MISS) - vxlan_ip_miss(dev, tip); + } else if (vxlan->flags & VXLAN_F_L3MISS) { + ipa.sin.sin_addr.s_addr = tip; + ipa.sa.sa_family = AF_INET; + vxlan_ip_miss(dev, &ipa); + } out: consume_skb(skb); return NETDEV_TX_OK; @@ -854,6 +1007,14 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; pip = ip_hdr(skb); n = neigh_lookup(&arp_tbl, &pip->daddr, dev); + if (!n && vxlan->flags & VXLAN_F_L3MISS) { + union vxlan_addr ipa; + ipa.sin.sin_addr.s_addr = pip->daddr; + ipa.sa.sa_family = AF_INET; + vxlan_ip_miss(dev, &ipa); + return false; + } + break; default: return false; @@ -870,8 +1031,8 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) } neigh_release(n); return diff; - } else if (vxlan->flags & VXLAN_F_L3MISS) - vxlan_ip_miss(dev, pip->daddr); + } + return false; } @@ -881,10 +1042,11 @@ static void vxlan_sock_free(struct sk_buff *skb) } /* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb) +static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb, + bool ipv6) { struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); - struct sock *sk = vn->sock->sk; + struct sock *sk = ipv6 ? 
vn->sock6->sk : vn->sock->sk; skb_orphan(skb); sock_hold(sk); @@ -930,15 +1092,26 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, { struct pcpu_tstats *tx_stats = this_cpu_ptr(src_vxlan->dev->tstats); struct pcpu_tstats *rx_stats = this_cpu_ptr(dst_vxlan->dev->tstats); + union vxlan_addr loopback; skb->pkt_type = PACKET_HOST; skb->encapsulation = 0; skb->dev = dst_vxlan->dev; __skb_pull(skb, skb_network_offset(skb)); + if (dst_vxlan->default_dst.remote_ip.sa.sa_family == AF_INET) { + loopback.sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + loopback.sa.sa_family = AF_INET; + } +#if IS_ENABLED(CONFIG_IPV6) + else { + loopback.sin6.sin6_addr = in6addr_loopback; + loopback.sa.sa_family = AF_INET6; + } +#endif + if (dst_vxlan->flags & VXLAN_F_LEARN) - vxlan_snoop(skb->dev, htonl(INADDR_LOOPBACK), - eth_hdr(skb)->h_source); + vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source); u64_stats_update_begin(&tx_stats->syncp); tx_stats->tx_packets++; @@ -960,22 +1133,29 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, { struct vxlan_dev *vxlan = netdev_priv(dev); struct rtable *rt; - const struct iphdr *old_iph; + const struct iphdr *old_iph = NULL; struct iphdr *iph; struct vxlanhdr *vxh; struct udphdr *uh; struct flowi4 fl4; - __be32 dst; - __be16 src_port, dst_port; +#if IS_ENABLED(CONFIG_IPV6) + struct flowi6 fl6; + struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); + struct sock *sk = vn->sock6->sk; + struct ipv6hdr *ip6h; +#endif + const union vxlan_addr *dst; + struct dst_entry *ndst = NULL; + __be16 src_port = 0, dst_port; u32 vni; __be16 df = 0; __u8 tos, ttl; dst_port = rdst->remote_port ? 
rdst->remote_port : vxlan->dst_port; vni = rdst->remote_vni; - dst = rdst->remote_ip; + dst = &rdst->remote_ip; - if (!dst) { + if (vxlan_addr_any(dst)) { if (did_rsc) { /* short-circuited back to local bridge */ vxlan_encap_bypass(skb, vxlan, vxlan); @@ -989,60 +1169,119 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, skb->encapsulation = 1; } - /* Need space for new headers (invalidates iph ptr) */ - if (skb_cow_head(skb, VXLAN_HEADROOM)) - goto drop; + ttl = vxlan->ttl; + tos = vxlan->tos; + if (dst->sa.sa_family == AF_INET) { + /* Need space for new headers (invalidates iph ptr) */ + if (skb_cow_head(skb, VXLAN_HEADROOM)) + goto drop; - old_iph = ip_hdr(skb); + old_iph = ip_hdr(skb); + if (!ttl && IN_MULTICAST(ntohl(dst->sin.sin_addr.s_addr))) + ttl = 1; - ttl = vxlan->ttl; - if (!ttl && IN_MULTICAST(ntohl(dst))) - ttl = 1; + if (tos == 1) + tos = ip_tunnel_get_dsfield(old_iph, skb); - tos = vxlan->tos; - if (tos == 1) - tos = ip_tunnel_get_dsfield(old_iph, skb); - - src_port = vxlan_src_port(vxlan, skb); - - memset(&fl4, 0, sizeof(fl4)); - fl4.flowi4_oif = rdst->remote_ifindex; - fl4.flowi4_tos = RT_TOS(tos); - fl4.daddr = dst; - fl4.saddr = vxlan->saddr; - - rt = ip_route_output_key(dev_net(dev), &fl4); - if (IS_ERR(rt)) { - netdev_dbg(dev, "no route to %pI4\n", &dst); - dev->stats.tx_carrier_errors++; - goto tx_error; - } + src_port = vxlan_src_port(vxlan, skb); - if (rt->dst.dev == dev) { - netdev_dbg(dev, "circular route to %pI4\n", &dst); - ip_rt_put(rt); - dev->stats.collisions++; - goto tx_error; - } + memset(&fl4, 0, sizeof(fl4)); + fl4.flowi4_oif = rdst->remote_ifindex; + fl4.flowi4_tos = RT_TOS(tos); + fl4.daddr = dst->sin.sin_addr.s_addr; + fl4.saddr = vxlan->saddr.sin.sin_addr.s_addr; + + rt = ip_route_output_key(dev_net(dev), &fl4); + if (IS_ERR(rt)) { + netdev_dbg(dev, "no route to %pI4\n", + &dst->sin.sin_addr.s_addr); + dev->stats.tx_carrier_errors++; + goto tx_error; + } + + if (rt->dst.dev == dev) { + 
netdev_dbg(dev, "circular route to %pI4\n", + &dst->sin.sin_addr.s_addr); + ip_rt_put(rt); + dev->stats.collisions++; + goto tx_error; + } + + /* Bypass encapsulation if the destination is local */ + if (rt->rt_flags & RTCF_LOCAL && + !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { + struct vxlan_dev *dst_vxlan; + + ip_rt_put(rt); + dst_vxlan = vxlan_find_vni(dev_net(dev), vni); + if (!dst_vxlan) + goto tx_error; + vxlan_encap_bypass(skb, vxlan, dst_vxlan); + return NETDEV_TX_OK; + } + + ndst = &rt->dst; +#if IS_ENABLED(CONFIG_IPV6) + } else { + const struct ipv6hdr *old_iph6; + u32 flags; + + /* Need space for new headers (invalidates ipv6h ptr) */ + if (skb_cow_head(skb, VXLAN6_HEADROOM)) + goto drop; + + old_iph6 = ipv6_hdr(skb); + if (!ttl && ipv6_addr_is_multicast(&dst->sin6.sin6_addr)) + ttl = 1; + + if (tos == 1) + tos = ipv6_get_dsfield(old_iph6); + + src_port = vxlan_src_port(vxlan, skb); - /* Bypass encapsulation if the destination is local */ - if (rt->rt_flags & RTCF_LOCAL && - !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { - struct vxlan_dev *dst_vxlan; + memset(&fl6, 0, sizeof(fl6)); + fl6.flowi6_oif = rdst->remote_ifindex; + fl6.flowi6_tos = RT_TOS(tos); + fl6.daddr = dst->sin6.sin6_addr; + fl6.saddr = vxlan->saddr.sin6.sin6_addr; + fl6.flowi6_proto = skb->protocol; - ip_rt_put(rt); - dst_vxlan = vxlan_find_vni(dev_net(dev), vni); - if (!dst_vxlan) + if (ipv6_stub->ipv6_dst_lookup(sk, &ndst, &fl6)) { + netdev_dbg(dev, "no route to %pI6\n", + &dst->sin6.sin6_addr); + dev->stats.tx_carrier_errors++; goto tx_error; - vxlan_encap_bypass(skb, vxlan, dst_vxlan); - return NETDEV_TX_OK; + } + + if (ndst->dev == dev) { + netdev_dbg(dev, "circular route to %pI6\n", + &dst->sin6.sin6_addr); + dst_release(ndst); + dev->stats.collisions++; + goto tx_error; + } + + /* Bypass encapsulation if the destination is local */ + flags = ((struct rt6_info *)ndst)->rt6i_flags; + if (flags & RTF_LOCAL && + !(flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { + 
struct vxlan_dev *dst_vxlan; + + dst_release(ndst); + dst_vxlan = vxlan_find_vni(dev_net(dev), vni); + if (!dst_vxlan) + goto tx_error; + vxlan_encap_bypass(skb, vxlan, dst_vxlan); + return NETDEV_TX_OK; + } +#endif } memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED | IPSKB_REROUTED); skb_dst_drop(skb); - skb_dst_set(skb, &rt->dst); + skb_dst_set(skb, ndst); vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh)); vxh->vx_flags = htonl(VXLAN_FLAGS); @@ -1058,27 +1297,65 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, uh->len = htons(skb->len); uh->check = 0; - __skb_push(skb, sizeof(*iph)); - skb_reset_network_header(skb); - iph = ip_hdr(skb); - iph->version = 4; - iph->ihl = sizeof(struct iphdr) >> 2; - iph->frag_off = df; - iph->protocol = IPPROTO_UDP; - iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb); - iph->daddr = dst; - iph->saddr = fl4.saddr; - iph->ttl = ttl ? : ip4_dst_hoplimit(&rt->dst); - tunnel_ip_select_ident(skb, old_iph, &rt->dst); - - nf_reset(skb); + if (dst->sa.sa_family == AF_INET) { + __skb_push(skb, sizeof(*iph)); + skb_reset_network_header(skb); + iph = ip_hdr(skb); + iph->version = 4; + iph->ihl = sizeof(struct iphdr) >> 2; + iph->frag_off = df; + iph->protocol = IPPROTO_UDP; + iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb); + iph->daddr = dst->sin.sin_addr.s_addr; + iph->saddr = fl4.saddr; + iph->ttl = ttl ? 
: ip4_dst_hoplimit(ndst); + tunnel_ip_select_ident(skb, old_iph, ndst); + + vxlan_set_owner(dev, skb, false); +#if IS_ENABLED(CONFIG_IPV6) + } else { + if (!skb_is_gso(skb) && !(ndst->dev->features & NETIF_F_IPV6_CSUM)) { + __wsum csum = skb_checksum(skb, 0, skb->len, 0); + skb->ip_summed = CHECKSUM_UNNECESSARY; + uh->check = csum_ipv6_magic(&fl6.saddr, &fl6.daddr, skb->len, + IPPROTO_UDP, csum); + if (uh->check == 0) + uh->check = CSUM_MANGLED_0; + } else { + skb->ip_summed = CHECKSUM_PARTIAL; + skb->csum_start = skb_transport_header(skb) - skb->head; + skb->csum_offset = offsetof(struct udphdr, check); + uh->check = ~csum_ipv6_magic(&fl6.saddr, &fl6.daddr, + skb->len, IPPROTO_UDP, 0); + } - vxlan_set_owner(dev, skb); + __skb_push(skb, sizeof(*ip6h)); + skb_reset_network_header(skb); + ip6h = ipv6_hdr(skb); + ip6h->version = 6; + ip6h->priority = 0; + ip6h->flow_lbl[0] = 0; + ip6h->flow_lbl[1] = 0; + ip6h->flow_lbl[2] = 0; + ip6h->payload_len = htons(skb->len); + ip6h->nexthdr = IPPROTO_UDP; + ip6h->hop_limit = ttl ? 
: ip6_dst_hoplimit(ndst); + ip6h->daddr = fl6.daddr; + ip6h->saddr = fl6.saddr; + + vxlan_set_owner(dev, skb, true); +#endif + } if (handle_offloads(skb)) goto drop; - iptunnel_xmit(skb, dev); +#if IS_ENABLED(CONFIG_IPV6) + if (dst->sa.sa_family == AF_INET6) + ip6tunnel_xmit(skb, dev); + else +#endif + iptunnel_xmit(skb, dev); return NETDEV_TX_OK; drop: @@ -1126,7 +1403,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) if (f == NULL) { rdst0 = &vxlan->default_dst; - if (rdst0->remote_ip == htonl(INADDR_ANY) && + if (vxlan_addr_any(&rdst0->remote_ip) && (vxlan->flags & VXLAN_F_L2MISS) && !is_multicast_ether_addr(eth->h_dest)) vxlan_fdb_miss(vxlan, eth->h_dest); @@ -1204,7 +1481,7 @@ static int vxlan_open(struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); int err; - if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) { + if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) { err = vxlan_join_group(dev); if (err) return err; @@ -1238,7 +1515,7 @@ static int vxlan_stop(struct net_device *dev) { struct vxlan_dev *vxlan = netdev_priv(dev); - if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) + if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) vxlan_leave_group(dev); del_timer_sync(&vxlan->age_timer); @@ -1288,7 +1565,12 @@ static void vxlan_setup(struct net_device *dev) eth_hw_addr_random(dev); ether_setup(dev); - dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM; +#if IS_ENABLED(CONFIG_IPV6) + if (vxlan->default_dst.remote_ip.sa.sa_family == AF_INET6) + dev->hard_header_len = ETH_HLEN + VXLAN6_HEADROOM; + else +#endif + dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM; dev->netdev_ops = &vxlan_netdev_ops; dev->destructor = vxlan_free; @@ -1326,8 +1608,10 @@ static void vxlan_setup(struct net_device *dev) static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = { [IFLA_VXLAN_ID] = { .type = NLA_U32 }, [IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) }, + [IFLA_VXLAN_GROUP6] = { .len = 
sizeof(struct in6_addr) }, [IFLA_VXLAN_LINK] = { .type = NLA_U32 }, [IFLA_VXLAN_LOCAL] = { .len = FIELD_SIZEOF(struct iphdr, saddr) }, + [IFLA_VXLAN_LOCAL6] = { .len = sizeof(struct in6_addr) }, [IFLA_VXLAN_TOS] = { .type = NLA_U8 }, [IFLA_VXLAN_TTL] = { .type = NLA_U8 }, [IFLA_VXLAN_LEARNING] = { .type = NLA_U8 }, @@ -1408,11 +1692,37 @@ static int vxlan_newlink(struct net *net, struct net_device *dev, } dst->remote_vni = vni; - if (data[IFLA_VXLAN_GROUP]) - dst->remote_ip = nla_get_be32(data[IFLA_VXLAN_GROUP]); + if (data[IFLA_VXLAN_GROUP]) { + dst->remote_ip.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_GROUP]); + dst->remote_ip.sa.sa_family = AF_INET; + } else if (data[IFLA_VXLAN_GROUP6]) { +#if IS_ENABLED(CONFIG_IPV6) + if (ipv6_disabled) + return -EPFNOSUPPORT; + + nla_memcpy(&dst->remote_ip.sin6.sin6_addr, data[IFLA_VXLAN_GROUP6], + sizeof(struct in6_addr)); + dst->remote_ip.sa.sa_family = AF_INET6; +#else + return -EPFNOSUPPORT; +#endif + } - if (data[IFLA_VXLAN_LOCAL]) - vxlan->saddr = nla_get_be32(data[IFLA_VXLAN_LOCAL]); + if (data[IFLA_VXLAN_LOCAL]) { + vxlan->saddr.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_LOCAL]); + vxlan->saddr.sa.sa_family = AF_INET; + } else if (data[IFLA_VXLAN_LOCAL6]) { +#if IS_ENABLED(CONFIG_IPV6) + if (ipv6_disabled) + return -EPFNOSUPPORT; + + nla_memcpy(&vxlan->saddr.sin6.sin6_addr, data[IFLA_VXLAN_LOCAL6], + sizeof(struct in6_addr)); + vxlan->saddr.sa.sa_family = AF_INET6; +#else + return -EPFNOSUPPORT; +#endif + } if (data[IFLA_VXLAN_LINK] && (dst->remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]))) { @@ -1493,9 +1803,9 @@ static size_t vxlan_get_size(const struct net_device *dev) { return nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_ID */ - nla_total_size(sizeof(__be32)) +/* IFLA_VXLAN_GROUP */ + nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_GROUP{6} */ nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_LINK */ - nla_total_size(sizeof(__be32))+ /* IFLA_VXLAN_LOCAL */ + nla_total_size(sizeof(struct 
in6_addr)) + /* IFLA_VXLAN_LOCAL{6} */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TTL */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TOS */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_LEARNING */ @@ -1522,14 +1832,36 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev) if (nla_put_u32(skb, IFLA_VXLAN_ID, dst->remote_vni)) goto nla_put_failure; - if (dst->remote_ip && nla_put_be32(skb, IFLA_VXLAN_GROUP, dst->remote_ip)) - goto nla_put_failure; + if (!vxlan_addr_any(&dst->remote_ip)) { + if (dst->remote_ip.sa.sa_family == AF_INET) { + if (nla_put_be32(skb, IFLA_VXLAN_GROUP, + dst->remote_ip.sin.sin_addr.s_addr)) + goto nla_put_failure; + } else { +#if IS_ENABLED(CONFIG_IPV6) + if (nla_put(skb, IFLA_VXLAN_GROUP6, sizeof(struct in6_addr), + &dst->remote_ip.sin6.sin6_addr)) + goto nla_put_failure; +#endif + } + } if (dst->remote_ifindex && nla_put_u32(skb, IFLA_VXLAN_LINK, dst->remote_ifindex)) goto nla_put_failure; - if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr)) - goto nla_put_failure; + if (!vxlan_addr_any(&vxlan->saddr)) { + if (vxlan->saddr.sa.sa_family == AF_INET) { + if (nla_put_be32(skb, IFLA_VXLAN_LOCAL, + vxlan->saddr.sin.sin_addr.s_addr)) + goto nla_put_failure; + } else { +#if IS_ENABLED(CONFIG_IPV6) + if (nla_put(skb, IFLA_VXLAN_LOCAL6, sizeof(struct in6_addr), + &vxlan->saddr.sin6.sin6_addr)) + goto nla_put_failure; +#endif + } + } if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) || nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) || @@ -1569,18 +1901,17 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = { .fill_info = vxlan_fill_info, }; -static __net_init int vxlan_init_net(struct net *net) +static __net_init int create_v4_sock(struct net *net) { - struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct sock *sk; + struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct sockaddr_in vxlan_addr = { .sin_family = AF_INET, + .sin_port = htons(vxlan_port), .sin_addr.s_addr = 
htonl(INADDR_ANY), }; int rc; - unsigned h; - /* Create UDP socket for encapsulation receive. */ rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vn->sock); if (rc < 0) { pr_debug("UDP socket create failed\n"); @@ -1590,10 +1921,8 @@ static __net_init int vxlan_init_net(struct net *net) sk = vn->sock->sk; sk_change_net(sk, net); - vxlan_addr.sin_port = htons(vxlan_port); - - rc = kernel_bind(vn->sock, (struct sockaddr *) &vxlan_addr, - sizeof(vxlan_addr)); + rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr, + sizeof(struct sockaddr_in)); if (rc < 0) { pr_debug("bind for UDP socket %pI4:%u (%d)\n", &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc); @@ -1604,11 +1933,94 @@ static __net_init int vxlan_init_net(struct net *net) /* Disable multicast loopback */ inet_sk(sk)->mc_loop = 0; + /* Mark socket as an encapsulation socket. */ + udp_sk(sk)->encap_type = 1; + udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv; + return 0; +} + +/* Create UDP socket for encapsulation receive. AF_INET6 socket + * could be used for both IPv4 and IPv6 communications, but + * users may set bindv6only=1. 
+ */ +#if IS_ENABLED(CONFIG_IPV6) +static __net_init int create_v6_sock(struct net *net) +{ + struct sock *sk; + struct vxlan_net *vn = net_generic(net, vxlan_net_id); + struct sockaddr_in6 vxlan_addr = { + .sin6_family = AF_INET6, + .sin6_port = htons(vxlan_port), + }; + int rc, val = 1; + + rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &vn->sock6); + if (rc < 0) + return rc; + + /* Put in proper namespace */ + sk = vn->sock6->sk; + sk_change_net(sk, net); + + kernel_setsockopt(vn->sock6, SOL_IPV6, IPV6_V6ONLY, + (char *)&val, sizeof(val)); + rc = kernel_bind(vn->sock6, (struct sockaddr *)&vxlan_addr, + sizeof(struct sockaddr_in6)); + if (rc < 0) { + pr_debug("bind for UDP socket %pI6:%u (%d)\n", + &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc); + sk_release_kernel(sk); + vn->sock6 = NULL; + return rc; + } + + /* At this point, IPv6 module should have been loaded in + * sock_create_kern(). + */ + BUG_ON(!ipv6_stub); + + /* Disable multicast loopback */ + inet_sk(sk)->mc_loop = 0; /* Mark socket as an encapsulation socket. 
*/ udp_sk(sk)->encap_type = 1; udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv; + + return 0; +} + +static __net_init int create_sock(struct net *net) +{ + int rc; + rc = create_v6_sock(net); + if (rc < 0) { + pr_info("UDP IPv6 socket create failed, disable IPv6\n"); + ipv6_disabled = true; + } + + return create_v4_sock(net); +} +#else +static __net_init int create_sock(struct net *net) +{ + return create_v4_sock(net); +} +#endif + +static __net_init int vxlan_init_net(struct net *net) +{ + struct vxlan_net *vn = net_generic(net, vxlan_net_id); + int rc; + unsigned h; + + rc = create_sock(net); + if (rc < 0) + return rc; + udp_encap_enable(); +#if IS_ENABLED(CONFIG_IPV6) + ipv6_stub->udpv6_encap_enable(); +#endif for (h = 0; h < VNI_HASH_SIZE; ++h) INIT_HLIST_HEAD(&vn->vni_list[h]); @@ -1632,6 +2044,12 @@ static __net_exit void vxlan_exit_net(struct net *net) sk_release_kernel(vn->sock->sk); vn->sock = NULL; } +#if IS_ENABLED(CONFIG_IPV6) + if (vn->sock6) { + sk_release_kernel(vn->sock6->sk); + vn->sock6 = NULL; + } +#endif } static struct pernet_operations vxlan_net_ops = { diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index b05823c..f7bed18 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -311,6 +311,8 @@ enum { IFLA_VXLAN_L2MISS, IFLA_VXLAN_L3MISS, IFLA_VXLAN_PORT, /* destination port */ + IFLA_VXLAN_GROUP6, + IFLA_VXLAN_LOCAL6, __IFLA_VXLAN_MAX }; #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1) -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
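Editorial sketch, not part of the posted patch: the central idea of
patch 06 is to carry every tunnel endpoint in a sockaddr union and to
branch on sa_family wherever the old code compared or copied a bare
__be32. The userspace approximation below illustrates that idea under
stated assumptions: tunnel_addr, tunnel_addr_equal() and
tunnel_addr_from_bytes() are hypothetical names, memcmp() stands in for
the kernel's ipv6_addr_equal(), and a plain byte buffer stands in for
the netlink attribute that vxlan_nla_get_addr() parses.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Userspace stand-in for the patch's union vxlan_addr (hypothetical name). */
union tunnel_addr {
	struct sockaddr_in  sin;
	struct sockaddr_in6 sin6;
	struct sockaddr     sa;	/* sa_family overlays both variants */
};

/* Two endpoints match only if the family and the address bytes both match. */
static bool tunnel_addr_equal(const union tunnel_addr *a,
			      const union tunnel_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}

/*
 * Choose the family from the payload length, in the same order as the
 * patch: 16 bytes or more is treated as IPv6, 4 bytes or more as IPv4.
 * Returns 0 on success and -1 for a payload that is too short (the
 * patch itself returns -EAFNOSUPPORT there).
 */
static int tunnel_addr_from_bytes(union tunnel_addr *out,
				  const void *buf, size_t len)
{
	memset(out, 0, sizeof(*out));
	if (len >= sizeof(struct in6_addr)) {
		memcpy(&out->sin6.sin6_addr, buf, sizeof(struct in6_addr));
		out->sa.sa_family = AF_INET6;
		return 0;
	}
	if (len >= sizeof(struct in_addr)) {
		memcpy(&out->sin.sin_addr, buf, sizeof(struct in_addr));
		out->sa.sa_family = AF_INET;
		return 0;
	}
	return -1;
}
```

Checking the family first matters because the IPv4 address field
overlaps the start of the IPv6 variant inside the union, so comparing
raw bytes without looking at sa_family could report a false match.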
* [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (5 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 06/11] vxlan: add ipv6 support Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 13:07 ` Sergei Shtylyov
2013-05-17 0:21 ` [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support Cong Wang
` (3 subsequent siblings)
10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

When disable_ipv6 is set, we should not allow IPv6 vxlan
device created on top of it.

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c | 14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 46c59a6..1ee79e0 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1681,6 +1681,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	struct vxlan_rdst *dst = &vxlan->default_dst;
 	__u32 vni;
 	int err;
+	bool use_ipv6 = false;
 
 	if (!data[IFLA_VXLAN_ID])
 		return -EINVAL;
@@ -1703,6 +1704,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		nla_memcpy(&dst->remote_ip.sin6.sin6_addr, data[IFLA_VXLAN_GROUP6],
 			   sizeof(struct in6_addr));
 		dst->remote_ip.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
 #else
 		return -EPFNOSUPPORT;
 #endif
@@ -1719,6 +1721,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		nla_memcpy(&vxlan->saddr.sin6.sin6_addr, data[IFLA_VXLAN_LOCAL6],
 			   sizeof(struct in6_addr));
 		vxlan->saddr.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
 #else
 		return -EPFNOSUPPORT;
 #endif
@@ -1734,6 +1737,17 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 			return -ENODEV;
 		}
 
+#if IS_ENABLED(CONFIG_IPV6)
+		if (use_ipv6) {
+			struct inet6_dev *idev = in6_dev_get(lowerdev);
+			if (idev && idev->cnf.disable_ipv6) {
+				pr_info("IPv6 is disabled via sysctl\n");
+				return -EPERM;
+			}
+		}
+#else
+		BUG_ON(use_ipv6);
+#endif
 		if (!tb[IFLA_MTU])
 			dev->mtu = lowerdev->mtu - VXLAN_HEADROOM;
 
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl
2013-05-17 0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
@ 2013-05-17 13:07 ` Sergei Shtylyov
0 siblings, 0 replies; 23+ messages in thread
From: Sergei Shtylyov @ 2013-05-17 13:07 UTC (permalink / raw)
To: Cong Wang; +Cc: netdev, David Miller

On 17-05-2013 4:21, Cong Wang wrote:

> From: Cong Wang <amwang@redhat.com>

> When disable_ipv6 is set, we should not allow IPv6 vxlan
> device created on top of it.

> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>
> ---
>  drivers/net/vxlan.c |   14 ++++++++++++++
>  1 files changed, 14 insertions(+), 0 deletions(-)

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 46c59a6..1ee79e0 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
[...]
> @@ -1734,6 +1737,17 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>  			return -ENODEV;
>  		}
>
> +#if IS_ENABLED(CONFIG_IPV6)

   Why not:

	if (IS_ENABLED(CONFIG_IPV6))

#if's in the function body are frowned upon.

> +		if (use_ipv6) {
> +			struct inet6_dev *idev = in6_dev_get(lowerdev);

   Empty line wouldn't hurt here, after declaration...

> +			if (idev && idev->cnf.disable_ipv6) {
> +				pr_info("IPv6 is disabled via sysctl\n");
> +				return -EPERM;
> +			}
> +		}
> +#else
> +		BUG_ON(use_ipv6);
> +#endif

WBR, Sergei

^ permalink raw reply [flat|nested] 23+ messages in thread
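The review comment above contrasts a preprocessor `#if` inside a function body with an ordinary `if` on a compile-time constant. The following is a minimal user-space sketch of that second style, not the code from the patch: `CONFIG_DEMO_IPV6`, `DEMO_IPV6_ENABLED`, `struct demo_inet6_dev` and `demo_check_lowerdev()` are hypothetical names, and the error codes are chosen for illustration only. It also leaves out reference counting; in the kernel, `in6_dev_get()` takes a reference that is normally dropped with `in6_dev_put()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Kconfig option; remove it to model IPv6 being off. */
#define CONFIG_DEMO_IPV6 1

/* Turn "is the option defined" into a 0/1 constant usable in a plain C `if`. */
#ifdef CONFIG_DEMO_IPV6
#define DEMO_IPV6_ENABLED 1
#else
#define DEMO_IPV6_ENABLED 0
#endif

/* Hypothetical miniature of the per-device IPv6 configuration. */
struct demo_inet6_dev {
	bool disable_ipv6;
};

/*
 * Returns 0 if an IPv6 tunnel may be created on top of the lower device,
 * or a negative errno value otherwise. idev may be NULL when the lower
 * device carries no IPv6 state.
 */
static int demo_check_lowerdev(bool use_ipv6, const struct demo_inet6_dev *idev)
{
	if (!use_ipv6)
		return 0;

	if (DEMO_IPV6_ENABLED) {
		/* Parsed and type-checked in every configuration; the
		 * compiler can drop it as dead code when the constant is 0. */
		if (idev && idev->disable_ipv6)
			return -EPERM;
		return 0;
	}

	/* IPv6 was requested but is not built in. */
	return -EAFNOSUPPORT;
}
```

One consequence of this style is that the guarded body has to compile even when the option is off, so every type and function it names must be declared in both configurations.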
* [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (6 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 09/11] vxlan: add ipv6 proxy support Cong Wang
` (2 subsequent siblings)
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, David Stevens, Cong Wang

From: Cong Wang <amwang@redhat.com>

Route short circuit currently has only an IPv4 part; this patch
adds the IPv6 part.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1ee79e0..04fd499 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -995,7 +995,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct neighbour *n;
-	struct iphdr *pip;
 
 	if (is_multicast_ether_addr(eth_hdr(skb)->h_dest))
 		return false;
@@ -1003,6 +1002,9 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	n = NULL;
 	switch (ntohs(eth_hdr(skb)->h_proto)) {
 	case ETH_P_IP:
+	{
+		struct iphdr *pip;
+
 		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 			return false;
 		pip = ip_hdr(skb);
@@ -1016,6 +1018,27 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		break;
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	case ETH_P_IPV6:
+	{
+		struct ipv6hdr *pip6;
+
+		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			return false;
+		pip6 = ipv6_hdr(skb);
+		n = neigh_lookup(&nd_tbl, &pip6->daddr, dev);
+		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+			union vxlan_addr ipa;
+			ipa.sin6.sin6_addr = pip6->daddr;
+			ipa.sa.sa_family = AF_INET6;
+			vxlan_ip_miss(dev, &ipa);
+			return false;
+		}
+
+		break;
+	}
+#endif
 	default:
 		return false;
 	}
@@ -1394,7 +1417,8 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	did_rsc = false;
 
 	if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
-	    ntohs(eth->h_proto) == ETH_P_IP) {
+	    (ntohs(eth->h_proto) == ETH_P_IP ||
+	     ntohs(eth->h_proto) == ETH_P_IPV6)) {
 		did_rsc = route_shortcircuit(dev, skb);
 		if (did_rsc)
 			f = vxlan_find_mac(vxlan, eth->h_dest);
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
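The patch above dispatches on the inner frame's EtherType, checks that the full network header is present, and then takes the destination address as the neighbour lookup key. A rough user-space sketch of that dispatch on a raw byte buffer follows. All `demo_` and `DEMO_` names are hypothetical, the length checks only play the role that `pskb_may_pull()` plays in the kernel, and no neighbour table is modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_HLEN	14
#define DEMO_ETH_P_IP	0x0800
#define DEMO_ETH_P_IPV6	0x86DD
#define DEMO_IPV4_HLEN	20	/* minimal IPv4 header */
#define DEMO_IPV6_HLEN	40	/* fixed IPv6 header */

/* Lookup key extracted from the inner frame (hypothetical type). */
struct demo_l3_key {
	int family;		/* 4 or 6 */
	uint8_t addr[16];	/* IPv4 uses only the first 4 bytes */
};

/*
 * Extract the L3 destination address of an Ethernet frame.
 * Returns false for short frames and for EtherTypes other than IPv4/IPv6.
 */
static bool demo_extract_dst(const uint8_t *frame, size_t len,
			     struct demo_l3_key *key)
{
	const uint8_t *l3;
	uint16_t proto;

	if (len < DEMO_ETH_HLEN)
		return false;

	proto = (uint16_t)((frame[12] << 8) | frame[13]);
	l3 = frame + DEMO_ETH_HLEN;
	len -= DEMO_ETH_HLEN;

	switch (proto) {
	case DEMO_ETH_P_IP:
		if (len < DEMO_IPV4_HLEN)
			return false;
		key->family = 4;
		memset(key->addr, 0, sizeof(key->addr));
		memcpy(key->addr, l3 + 16, 4);	/* IPv4 daddr offset */
		return true;
	case DEMO_ETH_P_IPV6:
		if (len < DEMO_IPV6_HLEN)
			return false;
		key->family = 6;
		memcpy(key->addr, l3 + 24, 16);	/* IPv6 daddr offset */
		return true;
	default:
		return false;
	}
}
```

Carrying the address family next to the address bytes is the same idea as the `union vxlan_addr` used in the series, where `sa.sa_family` tells the consumer which member is valid.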
* [Patch net-next v8 09/11] vxlan: add ipv6 proxy support
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (7 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, David Stevens, Cong Wang

From: Cong Wang <amwang@redhat.com>

This patch adds the IPv6 version of "arp_reduce". It needs
ndisc_send_na(), which is therefore made non-static and exposed
through ipv6_stub.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c    |   81 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/net/addrconf.h |    4 ++
 include/net/ndisc.h    |    5 +++
 net/ipv6/af_inet6.c    |    1 +
 net/ipv6/ndisc.c       |    8 ++--
 5 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 04fd499..f4d46bf 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -991,6 +991,76 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct neighbour *n;
+	union vxlan_addr ipa;
+	const struct ipv6hdr *iphdr;
+	const struct in6_addr *saddr, *daddr;
+	struct nd_msg *msg;
+	struct inet6_dev *in6_dev = NULL;
+
+	in6_dev = in6_dev_get(dev);
+	if (!in6_dev)
+		goto consume;
+
+	if (skb->len < sizeof(struct ipv6hdr) + sizeof(struct nd_msg) ||
+	    !pskb_may_pull(skb, skb->len))
+		goto out;
+
+	iphdr = ipv6_hdr(skb);
+	saddr = &iphdr->saddr;
+	daddr = &iphdr->daddr;
+
+	if (iphdr->nexthdr != IPPROTO_ICMPV6)
+		goto out;
+
+	if (ipv6_addr_loopback(daddr) ||
+	    ipv6_addr_is_multicast(daddr))
+		goto out;
+
+	msg = (struct nd_msg *)skb_transport_header(skb);
+	if (msg->icmph.icmp6_code != 0 ||
+	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
+		goto out;
+
+	n = neigh_lookup(&nd_tbl, daddr, dev);
+
+	if (n) {
+		struct vxlan_fdb *f;
+
+		if (!(n->nud_state & NUD_CONNECTED)) {
+			neigh_release(n);
+			goto out;
+		}
+
+		f = vxlan_find_mac(vxlan, n->ha);
+		if (f && vxlan_addr_any(&f->remote.remote_ip)) {
+			/* bridge-local neighbor */
+			neigh_release(n);
+			goto out;
+		}
+
+		ipv6_stub->ndisc_send_na(dev, n, saddr, &msg->target,
+					 !!in6_dev->cnf.forwarding,
+					 true, false, false);
+		neigh_release(n);
+	} else if (vxlan->flags & VXLAN_F_L3MISS) {
+		ipa.sin6.sin6_addr = *daddr;
+		ipa.sa.sa_family = AF_INET6;
+		vxlan_ip_miss(dev, &ipa);
+	}
+
+out:
+	in6_dev_put(in6_dev);
+consume:
+	consume_skb(skb);
+	return NETDEV_TX_OK;
+}
+#endif
+
 static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -1410,8 +1480,15 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 	eth = eth_hdr(skb);
 
-	if ((vxlan->flags & VXLAN_F_PROXY) && ntohs(eth->h_proto) == ETH_P_ARP)
-		return arp_reduce(dev, skb);
+	if ((vxlan->flags & VXLAN_F_PROXY)) {
+		if (ntohs(eth->h_proto) == ETH_P_ARP)
+			return arp_reduce(dev, skb);
+#if IS_ENABLED(CONFIG_IPV6)
+		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6)
+			return neigh_reduce(dev, skb);
+#endif
+	}
 
 	f = vxlan_find_mac(vxlan, eth->h_dest);
 	did_rsc = false;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index d09d42c..34bccff 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -151,6 +151,10 @@ struct ipv6_stub {
 	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
 				struct flowi6 *fl6);
 	void (*udpv6_encap_enable)(void);
+	void (*ndisc_send_na)(struct net_device *dev, struct neighbour *neigh,
+			      const struct in6_addr *daddr,
+			      const struct in6_addr *solicited_addr,
+			      bool router, bool solicited, bool override, bool inc_opt);
 };
 
 extern const struct ipv6_stub *ipv6_stub __read_mostly;
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 745bf74..ec2da56 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -204,6 +204,11 @@ extern void ndisc_send_ns(struct net_device *dev,
 extern void ndisc_send_rs(struct net_device *dev,
 			  const struct in6_addr *saddr,
 			  const struct in6_addr *daddr);
+extern void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+			  const struct in6_addr *daddr,
+			  const struct in6_addr *solicited_addr,
+			  bool router, bool solicited, bool override,
+			  bool inc_opt);
 
 extern void ndisc_send_redirect(struct sk_buff *skb,
 				const struct in6_addr *target);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 58de055..d80fe10 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -813,6 +813,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
 	.ipv6_dst_lookup = ip6_dst_lookup,
 	.udpv6_encap_enable = udpv6_encap_enable,
+	.ndisc_send_na = ndisc_send_na,
 };
 
 static int __init inet6_init(void)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2712ab2..a17e4d0 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -462,10 +462,10 @@ static void ndisc_send_skb(struct sk_buff *skb,
 	rcu_read_unlock();
 }
 
-static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
-			  const struct in6_addr *daddr,
-			  const struct in6_addr *solicited_addr,
-			  bool router, bool solicited, bool override, bool inc_opt)
+void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+		   const struct in6_addr *daddr,
+		   const struct in6_addr *solicited_addr,
+		   bool router, bool solicited, bool override, bool inc_opt)
 {
 	struct sk_buff *skb;
 	struct in6_addr tmpaddr;
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
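Before neigh_reduce() answers on behalf of a neighbour, it filters the packet: long enough for an IPv6 header plus a neighbour discovery message, next header ICMPv6, destination neither loopback nor multicast, ICMPv6 type Neighbour Solicitation with code 0. Below is a user-space sketch of only that filter, operating on a raw IPv6 packet. The `demo_` names and constants are hypothetical and the neighbour lookup and reply are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPV6_HLEN		40	/* fixed IPv6 header */
#define DEMO_NEXTHDR_ICMPV6	58
#define DEMO_ND_NS_TYPE		135	/* Neighbour Solicitation, RFC 4861 */
#define DEMO_ND_MSG_MIN		24	/* ICMPv6 header (8) + target (16) */

/*
 * Returns true if pkt (starting at the IPv6 header) passes the same
 * checks the patch applies before looking up the neighbour.
 */
static bool demo_is_candidate_ns(const uint8_t *pkt, size_t len)
{
	static const uint8_t loopback[16] = { [15] = 1 };	/* ::1 */
	const uint8_t *daddr, *icmp;

	if (len < DEMO_IPV6_HLEN + DEMO_ND_MSG_MIN)
		return false;
	if (pkt[6] != DEMO_NEXTHDR_ICMPV6)	/* next header field */
		return false;

	daddr = pkt + 24;
	if (daddr[0] == 0xff)			/* ff00::/8 is multicast */
		return false;
	if (memcmp(daddr, loopback, sizeof(loopback)) == 0)
		return false;

	icmp = pkt + DEMO_IPV6_HLEN;
	return icmp[0] == DEMO_ND_NS_TYPE && icmp[1] == 0;	/* type, code */
}
```

One observation on the filter as posted: under RFC 4861, solicitations used for address resolution are normally sent to the solicited-node multicast address, so a check on the IPv6 destination would match only unicast solicitations. The thread does not say whether that is intended; keying on the target field of the ND message would be the alternative.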
* [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (8 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 09/11] vxlan: add ipv6 proxy support Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev; +Cc: David Stevens, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

As pointed out by David, we should take care of the scope id for
link-local addresses and use it for the route lookup.

Cc: David Stevens <dlstevens@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f4d46bf..68ebfa4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1237,7 +1237,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct sock *sk = vn->sock6->sk;
 	struct ipv6hdr *ip6h;
 #endif
-	const union vxlan_addr *dst;
+	union vxlan_addr *dst;
 	struct dst_entry *ndst = NULL;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
@@ -1332,6 +1332,12 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		src_port = vxlan_src_port(vxlan, skb);
 
+		if (ipv6_addr_type(&dst->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+			dst->sin6.sin6_scope_id = ipv6_iface_scope_id(&dst->sin6.sin6_addr,
+								      rdst->remote_ifindex);
+			rdst->remote_ifindex = dst->sin6.sin6_scope_id;
+		}
+
 		memset(&fl6, 0, sizeof(fl6));
 		fl6.flowi6_oif = rdst->remote_ifindex;
 		fl6.flowi6_tos = RT_TOS(tos);
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
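A link-local address (fe80::/10) is only meaningful together with the interface it belongs to, which is what the scope id carries. As a hedged user-space illustration of the rule in the patch above, using the standard sockets API rather than the kernel helpers: `demo_scope_id()` and `demo_fill_dst()` are hypothetical names, and the kernel's `ipv6_iface_scope_id()` may cover additional cases (for example some multicast scopes) that this sketch ignores.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Scope id to use for dst: the interface index for link-local, else 0. */
static uint32_t demo_scope_id(const struct in6_addr *dst, uint32_t ifindex)
{
	return IN6_IS_ADDR_LINKLOCAL(dst) ? ifindex : 0;
}

/*
 * Fill a sockaddr_in6 from a textual address and an outgoing interface.
 * Returns 0 on success, -1 if text is not a valid IPv6 address.
 */
static int demo_fill_dst(struct sockaddr_in6 *sa, const char *text,
			 uint32_t ifindex)
{
	memset(sa, 0, sizeof(*sa));
	sa->sin6_family = AF_INET6;
	if (inet_pton(AF_INET6, text, &sa->sin6_addr) != 1)
		return -1;
	sa->sin6_scope_id = demo_scope_id(&sa->sin6_addr, ifindex);
	return 0;
}
```

For a global address the scope id stays 0 and route selection is left to the routing table; for a link-local one the interface index pins the lookup to that link, which corresponds to the patch feeding the value into `fl6.flowi6_oif`.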
* [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
` (9 preceding siblings ...)
2013-05-17 0:21 ` [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
To: netdev
Cc: Jesse Gross, Pravin B Shelar, Stephen Hemminger, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
(tunneling: Add generic Tunnel segmentation)

This patch adds generic tunneling offloading support for
IPv6-UDP based tunnels.

This can be used by tunneling protocols like VXLAN.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_offload.c |    4 +-
 net/ipv6/udp_offload.c |  153 +++++++++++++++++++++++++++++++++---------------
 2 files changed, 108 insertions(+), 49 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 71b766e..87fbf2e 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -91,6 +91,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	unsigned int unfrag_ip6hlen;
 	u8 *prevhdr;
 	int offset = 0;
+	bool tunnel;
 
 	if (unlikely(skb_shinfo(skb)->gso_type &
 		     ~(SKB_GSO_UDP |
@@ -105,6 +106,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
 
+	tunnel = skb->encapsulation;
 	ipv6h = ipv6_hdr(skb);
 	__skb_pull(skb, sizeof(*ipv6h));
 	segs = ERR_PTR(-EPROTONOSUPPORT);
@@ -125,7 +127,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		ipv6h = ipv6_hdr(skb);
 		ipv6h->payload_len = htons(skb->len - skb->mac_len -
 					   sizeof(*ipv6h));
-		if (proto == IPPROTO_UDP) {
+		if (!tunnel && proto == IPPROTO_UDP) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
 			fptr = (struct frag_hdr *)(skb_network_header(skb) +
 				unfrag_ip6hlen);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 3bb3a89..2c3fa3b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -21,26 +21,79 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
 	const struct ipv6hdr *ipv6h;
 	struct udphdr *uh;
 
-	/* UDP Tunnel offload on ipv6 is not yet supported. */
-	if (skb->encapsulation)
-		return -EINVAL;
-
 	if (!pskb_may_pull(skb, sizeof(*uh)))
 		return -EINVAL;
 
-	ipv6h = ipv6_hdr(skb);
-	uh = udp_hdr(skb);
+	if (likely(!skb->encapsulation)) {
+		ipv6h = ipv6_hdr(skb);
+		uh = udp_hdr(skb);
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
+					     IPPROTO_UDP, 0);
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		skb->ip_summed = CHECKSUM_PARTIAL;
+	}
 
-	uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
-				     IPPROTO_UDP, 0);
-	skb->csum_start = skb_transport_header(skb) - skb->head;
-	skb->csum_offset = offsetof(struct udphdr, check);
-	skb->ip_summed = CHECKSUM_PARTIAL;
 	return 0;
 }
 
+static struct sk_buff *skb_udp6_tunnel_segment(struct sk_buff *skb,
+					       netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	int mac_len = skb->mac_len;
+	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+	int outer_hlen;
+	netdev_features_t enc_features;
+
+	if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+		goto out;
+
+	skb->encapsulation = 0;
+	__skb_pull(skb, tnl_hlen);
+	skb_reset_mac_header(skb);
+	skb_set_network_header(skb, skb_inner_network_offset(skb));
+	skb->mac_len = skb_inner_network_offset(skb);
+
+	/* segment inner packet. */
+	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+	segs = skb_mac_gso_segment(skb, enc_features);
+	if (!segs || IS_ERR(segs))
+		goto out;
+
+	outer_hlen = skb_tnl_header_len(skb);
+	skb = segs;
+	do {
+		struct udphdr *uh;
+		struct ipv6hdr *ipv6h;
+		int udp_offset = outer_hlen - tnl_hlen;
+		u32 len;
+
+		skb->mac_len = mac_len;
+
+		skb_push(skb, outer_hlen);
+		skb_reset_mac_header(skb);
+		skb_set_network_header(skb, mac_len);
+		skb_set_transport_header(skb, udp_offset);
+		uh = udp_hdr(skb);
+		uh->len = htons(skb->len - udp_offset);
+		ipv6h = ipv6_hdr(skb);
+		len = skb->len - udp_offset;
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+					     len, IPPROTO_UDP, 0);
+		uh->check = csum_fold(skb_checksum(skb, udp_offset, len, 0));
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+		skb->ip_summed = CHECKSUM_NONE;
+	} while ((skb = skb->next));
+out:
+	return segs;
+}
+
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
-	netdev_features_t features)
+					 netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
@@ -73,43 +126,47 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
-	/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
-	 * do checksum of UDP packets sent as multiple IP fragments.
-	 */
-	offset = skb_checksum_start_offset(skb);
-	csum = skb_checksum(skb, offset, skb->len - offset, 0);
-	offset += skb->csum_offset;
-	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
-	skb->ip_summed = CHECKSUM_NONE;
-
-	/* Check if there is enough headroom to insert fragment header. */
-	if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
-	    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
-		goto out;
+	if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
+		segs = skb_udp6_tunnel_segment(skb, features);
+	else {
+		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+		 * do checksum of UDP packets sent as multiple IP fragments.
+		 */
+		offset = skb_checksum_start_offset(skb);
+		csum = skb_checksum(skb, offset, skb->len - offset, 0);
+		offset += skb->csum_offset;
+		*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+		skb->ip_summed = CHECKSUM_NONE;
+
+		/* Check if there is enough headroom to insert fragment header. */
+		if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
+		    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
+			goto out;
 
-	/* Find the unfragmentable header and shift it left by frag_hdr_sz
-	 * bytes to insert fragment header.
-	 */
-	unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	nexthdr = *prevhdr;
-	*prevhdr = NEXTHDR_FRAGMENT;
-	unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
-		     unfrag_ip6hlen;
-	mac_start = skb_mac_header(skb);
-	memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
-
-	skb->mac_header -= frag_hdr_sz;
-	skb->network_header -= frag_hdr_sz;
-
-	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
-	fptr->nexthdr = nexthdr;
-	fptr->reserved = 0;
-	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
-
-	/* Fragment the skb. ipv6 header and the remaining fields of the
-	 * fragment header are updated in ipv6_gso_segment()
-	 */
-	segs = skb_segment(skb, features);
+		/* Find the unfragmentable header and shift it left by frag_hdr_sz
+		 * bytes to insert fragment header.
+		 */
+		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		nexthdr = *prevhdr;
+		*prevhdr = NEXTHDR_FRAGMENT;
+		unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
+			     unfrag_ip6hlen;
+		mac_start = skb_mac_header(skb);
+		memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
+
+		skb->mac_header -= frag_hdr_sz;
+		skb->network_header -= frag_hdr_sz;
+
+		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+		fptr->nexthdr = nexthdr;
+		fptr->reserved = 0;
+		ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+
+		/* Fragment the skb. ipv6 header and the remaining fields of the
+		 * fragment header are updated in ipv6_gso_segment()
+		 */
+		segs = skb_segment(skb, features);
+	}
 
 out:
 	return segs;
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
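For each segment, the tunnel path above recomputes the outer UDP checksum over the IPv6 pseudo-header plus the UDP datagram, and replaces a computed value of 0 with `CSUM_MANGLED_0` (0xffff), since a zero checksum field is not a valid UDP-over-IPv6 checksum. The arithmetic can be sketched in portable C as follows. This is an illustration of the checksum definition, not the kernel's optimised `csum_ipv6_magic()`/`skb_checksum()` path, and all `demo_` names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes, read as big-endian 16-bit words, to a running sum. */
static uint32_t demo_sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;	/* odd byte, zero padded */
	return sum;
}

/* Fold the carries back in and return the one's complement. */
static uint16_t demo_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * UDP checksum over the IPv6 pseudo-header (source, destination,
 * upper-layer length, next header 17) and the UDP datagram. The caller
 * passes the datagram with its checksum field set to zero. The result is
 * in host byte order; a computed 0 is returned as 0xffff.
 */
static uint16_t demo_udp6_csum(const uint8_t src[16], const uint8_t dst[16],
			       const uint8_t *udp, uint16_t udp_len)
{
	const uint8_t pseudo[8] = {
		0, 0, (uint8_t)(udp_len >> 8), (uint8_t)udp_len,
		0, 0, 0, 17
	};
	uint32_t sum = 0;
	uint16_t csum;

	sum = demo_sum_bytes(sum, src, 16);
	sum = demo_sum_bytes(sum, dst, 16);
	sum = demo_sum_bytes(sum, pseudo, sizeof(pseudo));
	sum = demo_sum_bytes(sum, udp, udp_len);
	csum = demo_fold(sum);

	return csum ? csum : 0xffff;
}
```

A receiver can verify a datagram by running the same sum with the checksum field left in place: a correct packet folds to zero before the final substitution. The two consecutive assignments to `uh->check` in the patch appear to use a related trick, first seeding the field with the complemented pseudo-header sum so that the following checksum over the UDP segment covers both parts.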
end of thread, other threads:[~2013-05-24 5:15 UTC | newest]

Thread overview: 23+ messages
2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang
2013-05-17 13:02 ` Sergei Shtylyov
2013-05-17 21:13 ` David Miller
2013-05-22 4:54 ` Cong Wang
2013-05-22 7:14 ` David Miller
2013-05-22 10:28 ` Cong Wang
2013-05-22 15:50 ` Mike Rapoport
2013-05-22 16:03 ` Cong Wang
2013-05-22 16:10 ` Mike Rapoport
2013-05-24 5:10 ` Cong Wang
2013-05-24 5:15 ` Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 06/11] vxlan: add ipv6 support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
2013-05-17 13:07 ` Sergei Shtylyov
2013-05-17 0:21 ` [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 09/11] vxlan: add ipv6 proxy support Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr Cong Wang
2013-05-17 0:21 ` [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang