* [Patch net-next v8 00/11] vxlan: add ipv6 support @ 2013-05-17 0:21 Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang ` (10 more replies) 0 siblings, 11 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev From: Cong Wang <amwang@redhat.com> v8: fix the bug when bindv6only=1 fix more compile errors when IPV6=m complete the rest missing features for IPv6 v7: respect disable_ipv6 flag back to ipv4 only when ipv6 is not supported v6: use a stub for IPv6 mcast functions split a few more long lines rebased on the latest net-next v5: make David happy on the names of the fields fix my mistake during rebasing the patches drop the scope_id patch, because it is broken export in6addr_loopback fix a udp checksum bug rebased on the latest net-next v4: rename ->sin to ->va_sin rename ->sin6 to ->va_sin6 rename ->family to ->va_sa support ll addr fix more ugly #ifdef rebased on the latest net-next v3: fix many coding style issues fix some ugly #ifdef rename vxlan_ip to vxlan_addr rename ->proto to ->family rename ->ip4/->ip6 to ->sin/->sin6 v2: fix some compile error when !CONFIG_IPV6 improve some code based on Stephen's comments use sockaddr suggested by David Cong Wang (11): vxlan: defer vxlan init as late as possible ipv6: make ip6_dst_hoplimit() static inline ipv6: move ip6_local_out into core kernel ipv6: export a stub for IPv6 symbols used by vxlan ipv6: export in6addr_loopback to modules vxlan: add ipv6 support vxlan: respect disable_ipv6 sysctl vxlan: add ipv6 route short circuit support vxlan: add ipv6 proxy support vxlan: respect scope_id for ll addr ipv6: Add generic UDP Tunnel segmentation drivers/net/vxlan.c | 829 ++++++++++++++++++++++++++++++++++-------- include/net/addrconf.h | 18 + include/net/ip6_route.h | 23 +- include/net/ndisc.h | 5 + include/uapi/linux/if_link.h | 2 + net/ipv6/addrconf.c | 9 - net/ipv6/addrconf_core.c | 13 + net/ipv6/af_inet6.c | 12 + 
net/ipv6/ip6_offload.c | 4 +- net/ipv6/ip6_output.c | 25 -- net/ipv6/ndisc.c | 8 +- net/ipv6/output_core.c | 26 ++ net/ipv6/route.c | 19 - net/ipv6/udp_offload.c | 153 ++++++--- 14 files changed, 893 insertions(+), 253 deletions(-) -- 1.7.7.6 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang ` (9 subsequent siblings) 10 siblings, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev; +Cc: Stephen Hemminger, David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> When vxlan is built in, its init code runs before IPv6 init; this could cause problems once a later patch creates an IPv6 socket. Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- drivers/net/vxlan.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index ba81f3c..c1258c6 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -1662,7 +1662,7 @@ out2: out1: return rc; } -module_init(vxlan_init_module); +late_initcall(vxlan_init_module); static void __exit vxlan_cleanup_module(void) { -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 13:02 ` Sergei Shtylyov 2013-05-17 21:13 ` David Miller 2013-05-17 0:21 ` [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel Cong Wang ` (8 subsequent siblings) 10 siblings, 2 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> It will be used by vxlan module, so move it from ipv6 module to core kernel. I think it is small enough to be inlined. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- include/net/ip6_route.h | 23 +++++++++++++++++++++-- net/ipv6/route.c | 19 ------------------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 260f83f..7e9192e 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -21,6 +21,7 @@ struct route_info { #include <net/flow.h> #include <net/ip6_fib.h> #include <net/sock.h> +#include <net/addrconf.h> #include <linux/ip.h> #include <linux/ipv6.h> #include <linux/route.h> @@ -112,8 +113,6 @@ extern struct rt6_info *addrconf_dst_alloc(struct inet6_dev *idev, const struct in6_addr *addr, bool anycast); -extern int ip6_dst_hoplimit(struct dst_entry *dst); - /* * support functions for ND * @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr return dest; } +#if IS_ENABLED(CONFIG_IPV6) +static inline int ip6_dst_hoplimit(struct dst_entry *dst) +{ + int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); + if (hoplimit == 0) { + struct net_device *dev = dst->dev; + struct inet6_dev *idev; + + rcu_read_lock(); + idev = __in6_dev_get(dev); + if (idev) + hoplimit 
= idev->cnf.hop_limit; + else + hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit; + rcu_read_unlock(); + } + return hoplimit; +} +#endif + #endif diff --git a/net/ipv6/route.c b/net/ipv6/route.c index ad0aa6b..0d9c531 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1310,25 +1310,6 @@ out: return entries > rt_max_size; } -int ip6_dst_hoplimit(struct dst_entry *dst) -{ - int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); - if (hoplimit == 0) { - struct net_device *dev = dst->dev; - struct inet6_dev *idev; - - rcu_read_lock(); - idev = __in6_dev_get(dev); - if (idev) - hoplimit = idev->cnf.hop_limit; - else - hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit; - rcu_read_unlock(); - } - return hoplimit; -} -EXPORT_SYMBOL(ip6_dst_hoplimit); - /* * */ -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
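For readers following the patch above, the fallback order in ip6_dst_hoplimit() (route metric first, then the per-device setting, then the namespace-wide default) can be modelled in plain user-space C. This is only a sketch: the model_* structures and model_dst_hoplimit() are hypothetical stand-ins, not kernel types, and the RCU locking of the real function is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for inet6_dev, net_device and dst_entry. */
struct model_idev {
	int hop_limit;                 /* per-device hop limit */
};

struct model_dev {
	const struct model_idev *idev; /* NULL when the device has no IPv6 state */
	int net_default_hop_limit;     /* models devconf_all->hop_limit */
};

struct model_dst {
	int metric_hoplimit;           /* models the RTAX_HOPLIMIT metric, 0 = unset */
	const struct model_dev *dev;
};

/* Same decision order as the quoted function, minus locking. */
static inline int model_dst_hoplimit(const struct model_dst *dst)
{
	int hoplimit = dst->metric_hoplimit;

	if (hoplimit == 0) {
		const struct model_idev *idev = dst->dev->idev;

		if (idev)
			hoplimit = idev->hop_limit;
		else
			hoplimit = dst->dev->net_default_hop_limit;
	}
	return hoplimit;
}
```

A route with an explicit metric wins; otherwise the device value is used, and the namespace default only applies when the device carries no IPv6 configuration.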
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang @ 2013-05-17 13:02 ` Sergei Shtylyov 2013-05-17 21:13 ` David Miller 1 sibling, 0 replies; 23+ messages in thread From: Sergei Shtylyov @ 2013-05-17 13:02 UTC (permalink / raw) To: Cong Wang; +Cc: netdev, David S. Miller Hello. On 17-05-2013 4:21, Cong Wang wrote: > From: Cong Wang <amwang@redhat.com> > It will be used by vxlan module, so move it from ipv6 module > to core kernel. I think it is small enough to be inlined. > Cc: David S. Miller <davem@davemloft.net> > Signed-off-by: Cong Wang <amwang@redhat.com> > --- > include/net/ip6_route.h | 23 +++++++++++++++++++++-- > net/ipv6/route.c | 19 ------------------- > 2 files changed, 21 insertions(+), 21 deletions(-) > diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h > index 260f83f..7e9192e 100644 > --- a/include/net/ip6_route.h > +++ b/include/net/ip6_route.h [...] > @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr > return dest; > } > > +#if IS_ENABLED(CONFIG_IPV6) > +static inline int ip6_dst_hoplimit(struct dst_entry *dst) > +{ > + int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); Empty line wouldn't hurt here, after the declaration, like below... > + if (hoplimit == 0) { > + struct net_device *dev = dst->dev; > + struct inet6_dev *idev; > + > + rcu_read_lock(); > + idev = __in6_dev_get(dev); > + if (idev) > + hoplimit = idev->cnf.hop_limit; > + else > + hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit; > + rcu_read_unlock(); > + } > + return hoplimit; > +} > +#endif > + WBR, Sergei ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang 2013-05-17 13:02 ` Sergei Shtylyov @ 2013-05-17 21:13 ` David Miller 2013-05-22 4:54 ` Cong Wang 1 sibling, 1 reply; 23+ messages in thread From: David Miller @ 2013-05-17 21:13 UTC (permalink / raw) To: amwang; +Cc: netdev From: Cong Wang <amwang@redhat.com> Date: Fri, 17 May 2013 08:21:30 +0800 > @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr > return dest; > } > > +#if IS_ENABLED(CONFIG_IPV6) > +static inline int ip6_dst_hoplimit(struct dst_entry *dst) > +{ > + int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); > + if (hoplimit == 0) { > + struct net_device *dev = dst->dev; > + struct inet6_dev *idev; > + > + rcu_read_lock(); > + idev = __in6_dev_get(dev); > + if (idev) > + hoplimit = idev->cnf.hop_limit; > + else > + hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit; > + rcu_read_unlock(); > + } > + return hoplimit; > +} > +#endif Create a dummy stub version in an #else branch here, so that you have to ifdef less in vxlan.c In fact I think you can avoid nearly every ifdef in vxlan.c if you apply this technique throughout your changes. Please do that and resubmit this series. Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
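The technique requested above (a dummy stub in an #else branch, so that callers carry no #ifdef) can be sketched in plain user-space C. HAVE_FEATURE_V6, feature_v6_hoplimit() and pick_ttl() are hypothetical names used only for illustration; HAVE_FEATURE_V6 stands in for IS_ENABLED(CONFIG_IPV6), and the stub's return value of 0 is an arbitrary choice, not something taken from the thread.

```c
#include <stdbool.h>

/* "Header" part: the only place that knows about the build switch. */
#ifdef HAVE_FEATURE_V6
static inline int feature_v6_hoplimit(int route_metric, int dev_default)
{
	/* Real implementation: prefer the route metric, else the default. */
	return route_metric ? route_metric : dev_default;
}
#else
/* Dummy stub: same signature, so callers compile unchanged. */
static inline int feature_v6_hoplimit(int route_metric, int dev_default)
{
	(void)route_metric;
	(void)dev_default;
	return 0;
}
#endif

/* "Driver" part: no conditional compilation at the call site. */
static int pick_ttl(bool is_v6, int route_metric, int dev_default, int v4_ttl)
{
	if (is_v6)
		return feature_v6_hoplimit(route_metric, dev_default);
	return v4_ttl;
}
```

The conditional compilation stays in one place, which appears to be the point of the request: the driver body reads the same whether or not the feature is built.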
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-17 21:13 ` David Miller @ 2013-05-22 4:54 ` Cong Wang 2013-05-22 7:14 ` David Miller 0 siblings, 1 reply; 23+ messages in thread From: Cong Wang @ 2013-05-22 4:54 UTC (permalink / raw) To: David Miller; +Cc: netdev On Fri, 2013-05-17 at 14:13 -0700, David Miller wrote: > From: Cong Wang <amwang@redhat.com> > Date: Fri, 17 May 2013 08:21:30 +0800 > > > @@ -201,4 +200,24 @@ static inline struct in6_addr *rt6_nexthop(struct rt6_info *rt, struct in6_addr > > return dest; > > } > > > > +#if IS_ENABLED(CONFIG_IPV6) > > +static inline int ip6_dst_hoplimit(struct dst_entry *dst) > > +{ > > + int hoplimit = dst_metric_raw(dst, RTAX_HOPLIMIT); > > + if (hoplimit == 0) { > > + struct net_device *dev = dst->dev; > > + struct inet6_dev *idev; > > + > > + rcu_read_lock(); > > + idev = __in6_dev_get(dev); > > + if (idev) > > + hoplimit = idev->cnf.hop_limit; > > + else > > + hoplimit = dev_net(dev)->ipv6.devconf_all->hop_limit; > > + rcu_read_unlock(); > > + } > > + return hoplimit; > > +} > > +#endif > > Create a dummy stub version in an #else branch here, so that you have > to ifdef less in vxlan.c The reason why we need #if IS_ENABLED(CONFIG_IPV6) here is that dev_net(dev)->ipv6 is defined only in that case; it is not just for the caller in vxlan. Nor do I think anyone will seriously call ip6_dst_hoplimit() in the !CONFIG_IPV6 case, since its name is obvious. > > In fact I think you can avoid nearly every ifdef in vxlan.c if you apply > this technique throughout your changes. > > Please do that and resubmit this series. > Actually that is exactly what I _did_ in v1 or RFC, IIRC; it is David Stevens who prefers to use #ifdef inside these functions, so I changed it based on his suggestion. I myself don't have any strong opinion here, either is okay, I just don't like changing it again and again. :) Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 4:54 ` Cong Wang @ 2013-05-22 7:14 ` David Miller 2013-05-22 10:28 ` Cong Wang 0 siblings, 1 reply; 23+ messages in thread From: David Miller @ 2013-05-22 7:14 UTC (permalink / raw) To: amwang; +Cc: netdev From: Cong Wang <amwang@redhat.com> Date: Wed, 22 May 2013 12:54:13 +0800 > Actually that is exactly what I _did_ in v1 or RFC, IIRC, it is David > Stevens who prefers to use #ifdef inside these functions, so I changed > it based on his suggestion. > > I myself don't have any strong opinion here, either is okay, I just > don't like changing it again and again. :) The driver looks like complete shit with all the ifdefs in there, this isn't the BSD kernel. I do not want to see them there at all. You can abstract everything behind helper functions in a header file and keep the mess there. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 7:14 ` David Miller @ 2013-05-22 10:28 ` Cong Wang 2013-05-22 15:50 ` Mike Rapoport 0 siblings, 1 reply; 23+ messages in thread From: Cong Wang @ 2013-05-22 10:28 UTC (permalink / raw) To: David Miller; +Cc: netdev On Wed, 2013-05-22 at 00:14 -0700, David Miller wrote: > From: Cong Wang <amwang@redhat.com> > Date: Wed, 22 May 2013 12:54:13 +0800 > > > Actually that is exactly what I _did_ in v1 or RFC, IIRC, it is David > > Stevens who prefers to use #ifdef inside these functions, so I changed > > it based on his suggestion. > > > > I myself don't have any strong opinion here, either is okay, I just > > don't like changing it again and again. :) > > The driver looks like complete shit with all the ifdefs in there, > this isn't the BSD kernel. > > I do not want to seem them there at all. > > You can abstract everything behind helper functions in a header > file, keep the mess there. Alright, I will change all such functions in vxlan.c back to what you are suggesting. For example, change static inline bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) { #if IS_ENABLED(CONFIG_IPV6) if (a->sa.sa_family != b->sa.sa_family) return false; if (a->sa.sa_family == AF_INET6) return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr); else #endif return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; } to #if IS_ENABLED(CONFIG_IPV6) static inline bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) { if (a->sa.sa_family != b->sa.sa_family) return false; if (a->sa.sa_family == AF_INET6) return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr); else return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; } #else static inline bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) { return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; } #endif just in case I misunderstand you. Thanks. 
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 10:28 ` Cong Wang @ 2013-05-22 15:50 ` Mike Rapoport 2013-05-22 16:03 ` Cong Wang 0 siblings, 1 reply; 23+ messages in thread From: Mike Rapoport @ 2013-05-22 15:50 UTC (permalink / raw) To: Cong Wang; +Cc: David Miller, netdev On Wed, May 22, 2013 at 1:28 PM, Cong Wang <amwang@redhat.com> wrote: > On Wed, 2013-05-22 at 00:14 -0700, David Miller wrote: >> From: Cong Wang <amwang@redhat.com> >> Date: Wed, 22 May 2013 12:54:13 +0800 >> > > Alright, I will change all such functions in vxlan.c back to what you > are suggesting. > > For example, change > > static inline > bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr > *b) > { > #if IS_ENABLED(CONFIG_IPV6) > if (a->sa.sa_family != b->sa.sa_family) > return false; > if (a->sa.sa_family == AF_INET6) > return ipv6_addr_equal(&a->sin6.sin6_addr, > &b->sin6.sin6_addr); > else > #endif > return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; > } > > to > > #if IS_ENABLED(CONFIG_IPV6) > static inline > bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr > *b) > { > if (a->sa.sa_family != b->sa.sa_family) > return false; > if (a->sa.sa_family == AF_INET6) > return ipv6_addr_equal(&a->sin6.sin6_addr, > &b->sin6.sin6_addr); > else > return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; > } > #else > static inline > bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr > *b) > { > return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; > } > #endif I think you can just drop #ifdefs in 90% of the cases rather than create two versions of code for IPv4 and IPv6.... > just in case I misunderstand you. > > Thanks. > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Sincerely yours, Mike. 
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 15:50 ` Mike Rapoport @ 2013-05-22 16:03 ` Cong Wang 2013-05-22 16:10 ` Mike Rapoport 0 siblings, 1 reply; 23+ messages in thread From: Cong Wang @ 2013-05-22 16:03 UTC (permalink / raw) To: Mike Rapoport; +Cc: David Miller, netdev ----- Original Message ----- > > I think you can just drop #ifdefs in 90% of the cases rather than > create two versions of code for IPv4 and IPv6.... > I know we can use memcmp(), but comparing 16+ bytes even for IPv4 is not a good idea, also we have to zalloc() every instance of union vxlan_addr. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 16:03 ` Cong Wang @ 2013-05-22 16:10 ` Mike Rapoport 2013-05-24 5:10 ` Cong Wang 2013-05-24 5:15 ` Cong Wang 0 siblings, 2 replies; 23+ messages in thread From: Mike Rapoport @ 2013-05-22 16:10 UTC (permalink / raw) To: Cong Wang; +Cc: David Miller, netdev On Wed, May 22, 2013 at 12:03:23PM -0400, Cong Wang wrote: > > > ----- Original Message ----- > > > > I think you can just drop #ifdefs in 90% of the cases rather than > > create two versions of code for IPv4 and IPv6.... > > > > I know we can use memcmp(), but comparing 16+ bytes even for IPv4 is not > a good idea, also we have to zalloc() every instance of union vxlan_addr. I've lost you here... Why not just: static inline bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) { if (a->sa.sa_family != b->sa.sa_family) return false; if (a->sa.sa_family == AF_INET6) return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr); else return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; } -- Sincerely yours, Mike. ^ permalink raw reply [flat|nested] 23+ messages in thread
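The #ifdef-free comparison proposed above can be tried outside the kernel. The following is a minimal user-space sketch, assuming a POSIX socket API; union demo_addr and demo_addr_equal() are hypothetical names mirroring union vxlan_addr and vxlan_addr_equal(), and memcmp() on the 16-byte address stands in for the kernel-only ipv6_addr_equal().

```c
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* All three members start with the address family, so sa.sa_family
 * can be read regardless of which member was filled in.
 */
union demo_addr {
	struct sockaddr_in  sin;
	struct sockaddr_in6 sin6;
	struct sockaddr     sa;
};

static inline bool demo_addr_equal(const union demo_addr *a,
				   const union demo_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	/* IPv4: a single 32-bit compare, no 16-byte memcmp needed. */
	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}
```

Only the address bytes that belong to the stored family are compared, so the union does not need to be zeroed for the comparison itself to be correct.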
* Re: [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline 2013-05-22 16:10 ` Mike Rapoport @ 2013-05-24 5:10 ` Cong Wang 2013-05-24 5:15 ` Cong Wang 1 sibling, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-24 5:10 UTC (permalink / raw) To: Mike Rapoport; +Cc: David Miller, netdev On Wed, 2013-05-22 at 19:10 +0300, Mike Rapoport wrote: > I've lost you here... Why not just: > > static inline > bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) > { > if (a->sa.sa_family != b->sa.sa_family) > return false; > if (a->sa.sa_family == AF_INET6) > return ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr); > else > return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; > } I see your point now, but for !CONFIG_IPV6, the first two 'if's are obviously useless. Is GCC smart enough to know ->sa.sa_family == AF_INET is always true in that case? I doubt it... ^ permalink raw reply [flat|nested] 23+ messages in thread
* [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 01/11] vxlan: defer vxlan init as late as possible Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 02/11] ipv6: make ip6_dst_hoplimit() static inline Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang ` (7 subsequent siblings) 10 siblings, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev; +Cc: David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> It will be used by vxlan module too. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- net/ipv6/ip6_output.c | 25 ------------------------- net/ipv6/output_core.c | 26 ++++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 25 deletions(-) diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d2eedf1..316895e 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -56,31 +56,6 @@ #include <net/checksum.h> #include <linux/mroute6.h> -int __ip6_local_out(struct sk_buff *skb) -{ - int len; - - len = skb->len - sizeof(struct ipv6hdr); - if (len > IPV6_MAXPLEN) - len = 0; - ipv6_hdr(skb)->payload_len = htons(len); - - return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL, - skb_dst(skb)->dev, dst_output); -} - -int ip6_local_out(struct sk_buff *skb) -{ - int err; - - err = __ip6_local_out(skb); - if (likely(err == 1)) - err = dst_output(skb); - - return err; -} -EXPORT_SYMBOL_GPL(ip6_local_out); - static int ip6_finish_output2(struct sk_buff *skb) { struct dst_entry *dst = skb_dst(skb); diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c index c2e73e6..030d03f 100644 --- a/net/ipv6/output_core.c +++ b/net/ipv6/output_core.c @@ -74,3 +74,29 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr) return offset; } 
EXPORT_SYMBOL(ip6_find_1stfragopt); + +int __ip6_local_out(struct sk_buff *skb) +{ + int len; + + len = skb->len - sizeof(struct ipv6hdr); + if (len > IPV6_MAXPLEN) + len = 0; + ipv6_hdr(skb)->payload_len = htons(len); + + return nf_hook(NFPROTO_IPV6, NF_INET_LOCAL_OUT, skb, NULL, + skb_dst(skb)->dev, dst_output); +} +EXPORT_SYMBOL_GPL(__ip6_local_out); + +int ip6_local_out(struct sk_buff *skb) +{ + int err; + + err = __ip6_local_out(skb); + if (likely(err == 1)) + err = dst_output(skb); + + return err; +} +EXPORT_SYMBOL_GPL(ip6_local_out); -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
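The payload-length handling in __ip6_local_out() above is small enough to model on its own. The sketch below uses hypothetical names (demo_ipv6_payload_len(), DEMO_IPV6_HDR_LEN, DEMO_IPV6_MAXPLEN) and returns the value in host byte order; the kernel code additionally applies htons() before storing it. Reading the zero as the jumbogram convention is an assumption, the patch itself does not say so.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPV6_HDR_LEN 40u    /* fixed IPv6 header size, as in VXLAN6_HEADROOM */
#define DEMO_IPV6_MAXPLEN 65535u /* largest value the 16-bit field can carry */

/* Caller must pass a length that already includes the 40-byte header. */
static inline uint16_t demo_ipv6_payload_len(size_t skb_len)
{
	size_t len = skb_len - DEMO_IPV6_HDR_LEN;

	/* Too large for the 16-bit field: store 0, as the quoted code does. */
	if (len > DEMO_IPV6_MAXPLEN)
		len = 0;
	return (uint16_t)len;
}
```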
* [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang ` (2 preceding siblings ...) 2013-05-17 0:21 ` [Patch net-next v8 03/11] ipv6: move ip6_local_out into core kernel Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules Cong Wang ` (6 subsequent siblings) 10 siblings, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev Cc: Ben Hutchings, Bjørn Mork, Stephen Hemminger, David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join, ipv6_sock_mc_drop, etc. It will be used by the vxlan module. This is an ugly but easy solution for now. Suggested-by: Ben Hutchings <bhutchings@solarflare.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Bjørn Mork <bjorn@mork.no> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- include/net/addrconf.h | 14 ++++++++++++++ net/ipv6/addrconf_core.c | 3 +++ net/ipv6/af_inet6.c | 11 +++++++++++ 3 files changed, 28 insertions(+), 0 deletions(-) diff --git a/include/net/addrconf.h b/include/net/addrconf.h index 84a6440..d09d42c 100644 --- a/include/net/addrconf.h +++ b/include/net/addrconf.h @@ -140,6 +140,20 @@ extern bool inet6_mc_check(struct sock *sk, const struct in6_addr *mc_addr, const struct in6_addr *src_addr); +/* A stub used by vxlan module. This is ugly, ideally these + * symbols should be built into the core kernel. 
+ */ +struct ipv6_stub { + int (*ipv6_sock_mc_join)(struct sock *sk, int ifindex, + const struct in6_addr *addr); + int (*ipv6_sock_mc_drop)(struct sock *sk, int ifindex, + const struct in6_addr *addr); + int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst, + struct flowi6 *fl6); + void (*udpv6_encap_enable)(void); +}; +extern const struct ipv6_stub *ipv6_stub __read_mostly; + extern int ipv6_dev_mc_inc(struct net_device *dev, const struct in6_addr *addr); extern int __ipv6_dev_mc_dec(struct inet6_dev *idev, const struct in6_addr *addr); extern int ipv6_dev_mc_dec(struct net_device *dev, const struct in6_addr *addr); diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c index 7210456..3807a79 100644 --- a/net/ipv6/addrconf_core.c +++ b/net/ipv6/addrconf_core.c @@ -97,3 +97,6 @@ int inet6addr_notifier_call_chain(unsigned long val, void *v) return atomic_notifier_call_chain(&inet6addr_chain, val, v); } EXPORT_SYMBOL(inet6addr_notifier_call_chain); + +const struct ipv6_stub *ipv6_stub __read_mostly; +EXPORT_SYMBOL_GPL(ipv6_stub); diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index ab5c7ad..58de055 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -808,6 +808,13 @@ static struct pernet_operations inet6_net_ops = { .exit = inet6_net_exit, }; +static const struct ipv6_stub ipv6_stub_impl = { + .ipv6_sock_mc_join = ipv6_sock_mc_join, + .ipv6_sock_mc_drop = ipv6_sock_mc_drop, + .ipv6_dst_lookup = ip6_dst_lookup, + .udpv6_encap_enable = udpv6_encap_enable, +}; + static int __init inet6_init(void) { struct list_head *r; @@ -879,6 +886,9 @@ static int __init inet6_init(void) err = igmp6_init(); if (err) goto igmp_fail; + + ipv6_stub = &ipv6_stub_impl; + err = ipv6_netfilter_init(); if (err) goto netfilter_fail; @@ -1027,6 +1037,7 @@ static void __exit inet6_exit(void) raw6_proc_exit(); #endif ipv6_netfilter_fini(); + ipv6_stub = NULL; igmp6_cleanup(); ndisc_cleanup(); ip6_mr_cleanup(); -- 1.7.7.6 ^ permalink raw reply related 
[flat|nested] 23+ messages in thread
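The ipv6_stub mechanism in the patch above is a table of function pointers that lives in always-built code and is filled in when the optional provider initialises. A minimal user-space sketch follows; every demo_* and real_* name is hypothetical, and the NULL check in the consumer (with its -95 error value) is an assumption added for the example, since the consumer side is not part of the quoted patch.

```c
#include <stddef.h>

/* Operations the optional provider offers to other code. */
struct demo_v6_ops {
	int (*mc_join)(int ifindex);
	int (*mc_drop)(int ifindex);
};

/* Lives in the always-present "core"; NULL until the provider registers. */
static const struct demo_v6_ops *demo_v6_stub;

/* Provider side: the real implementations and their table. */
static int real_mc_join(int ifindex) { return ifindex > 0 ? 0 : -1; }
static int real_mc_drop(int ifindex) { return ifindex > 0 ? 0 : -1; }

static const struct demo_v6_ops demo_v6_impl = {
	.mc_join = real_mc_join,
	.mc_drop = real_mc_drop,
};

/* Mirrors "ipv6_stub = &ipv6_stub_impl" in inet6_init(). */
static void demo_provider_init(void) { demo_v6_stub = &demo_v6_impl; }

/* Mirrors "ipv6_stub = NULL" in inet6_exit(). */
static void demo_provider_exit(void) { demo_v6_stub = NULL; }

#define DEMO_ENOTSUP (-95) /* illustrative error value for "provider absent" */

/* Consumer side: calls through the table, never the symbols directly. */
static int demo_consumer_join(int ifindex)
{
	if (!demo_v6_stub)
		return DEMO_ENOTSUP;
	return demo_v6_stub->mc_join(ifindex);
}
```

The consumer links only against the pointer, so it can be built in even when the provider is a module that loads later.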
* [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang ` (3 preceding siblings ...) 2013-05-17 0:21 ` [Patch net-next v8 04/11] ipv6: export a stub for IPv6 symbols used by vxlan Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 06/11] vxlan: add ipv6 support Cong Wang ` (5 subsequent siblings) 10 siblings, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev; +Cc: Mike Rapoport, David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> It is needed by vxlan module. Noticed by Mike. Cc: Mike Rapoport <mike.rapoport@ravellosystems.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- net/ipv6/addrconf.c | 9 --------- net/ipv6/addrconf_core.c | 10 ++++++++++ 2 files changed, 10 insertions(+), 9 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index d1ab6ab..650a109 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -238,15 +238,6 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .accept_dad = 1, }; -/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ -const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; -const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; -const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT; -const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT; -const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT; -const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT; -const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT; - /* Check if a valid qdisc is available */ static inline bool addrconf_qdisc_ok(const struct net_device *dev) { diff --git a/net/ipv6/addrconf_core.c b/net/ipv6/addrconf_core.c index 
3807a79..cb991dd 100644 --- a/net/ipv6/addrconf_core.c +++ b/net/ipv6/addrconf_core.c @@ -100,3 +100,13 @@ EXPORT_SYMBOL(inet6addr_notifier_call_chain); const struct ipv6_stub *ipv6_stub __read_mostly; EXPORT_SYMBOL_GPL(ipv6_stub); + +/* IPv6 Wildcard Address and Loopback Address defined by RFC2553 */ +const struct in6_addr in6addr_loopback = IN6ADDR_LOOPBACK_INIT; +EXPORT_SYMBOL(in6addr_loopback); +const struct in6_addr in6addr_any = IN6ADDR_ANY_INIT; +const struct in6_addr in6addr_linklocal_allnodes = IN6ADDR_LINKLOCAL_ALLNODES_INIT; +const struct in6_addr in6addr_linklocal_allrouters = IN6ADDR_LINKLOCAL_ALLROUTERS_INIT; +const struct in6_addr in6addr_interfacelocal_allnodes = IN6ADDR_INTERFACELOCAL_ALLNODES_INIT; +const struct in6_addr in6addr_interfacelocal_allrouters = IN6ADDR_INTERFACELOCAL_ALLROUTERS_INIT; +const struct in6_addr in6addr_sitelocal_allrouters = IN6ADDR_SITELOCAL_ALLROUTERS_INIT; -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
* [Patch net-next v8 06/11] vxlan: add ipv6 support 2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang ` (4 preceding siblings ...) 2013-05-17 0:21 ` [Patch net-next v8 05/11] ipv6: export in6addr_loopback to modules Cong Wang @ 2013-05-17 0:21 ` Cong Wang 2013-05-17 0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang ` (4 subsequent siblings) 10 siblings, 0 replies; 23+ messages in thread From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw) To: netdev; +Cc: David Stevens, Stephen Hemminger, David S. Miller, Cong Wang From: Cong Wang <amwang@redhat.com> This patch adds IPv6 support to vxlan device, as the new version RFC already mentions it: http://tools.ietf.org/html/draft-mahalingam-dutt-dcops-vxlan-03 Cc: David Stevens <dlstevens@us.ibm.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> --- drivers/net/vxlan.c | 698 +++++++++++++++++++++++++++++++++--------- include/uapi/linux/if_link.h | 2 + 2 files changed, 560 insertions(+), 140 deletions(-) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index c1258c6..46c59a6 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -6,9 +6,6 @@ * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2 as * published by the Free Software Foundation. 
- * - * TODO - * - IPv6 (not in RFC) */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt @@ -41,6 +38,11 @@ #include <net/inet_ecn.h> #include <net/net_namespace.h> #include <net/netns/generic.h> +#if IS_ENABLED(CONFIG_IPV6) +#include <net/addrconf.h> +#include <net/ip6_route.h> +#include <net/ip6_tunnel.h> +#endif #define VXLAN_VERSION "0.1" @@ -55,6 +57,8 @@ #define VXLAN_VID_MASK (VXLAN_N_VID - 1) /* IP header + UDP + VXLAN + Ethernet header */ #define VXLAN_HEADROOM (20 + 8 + 8 + 14) +/* IPv6 header + UDP + VXLAN + Ethernet header */ +#define VXLAN6_HEADROOM (40 + 8 + 8 + 14) #define VXLAN_FLAGS 0x08000000 /* struct vxlanhdr.vx_flags required value. */ @@ -76,16 +80,27 @@ static bool log_ecn_error = true; module_param(log_ecn_error, bool, 0644); MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN"); +#if IS_ENABLED(CONFIG_IPV6) +static bool ipv6_disabled = false; +#endif + /* per-net private data for this module */ static unsigned int vxlan_net_id; struct vxlan_net { struct socket *sock; /* UDP encap socket */ + struct socket *sock6; struct hlist_head vni_list[VNI_HASH_SIZE]; }; +union vxlan_addr { + struct sockaddr_in sin; + struct sockaddr_in6 sin6; + struct sockaddr sa; +}; + struct vxlan_rdst { struct rcu_head rcu; - __be32 remote_ip; + union vxlan_addr remote_ip; __be16 remote_port; u32 remote_vni; u32 remote_ifindex; @@ -109,7 +124,7 @@ struct vxlan_dev { struct hlist_node hlist; struct net_device *dev; struct vxlan_rdst default_dst; /* default destination */ - __be32 saddr; /* source address */ + union vxlan_addr saddr; /* source address */ __be16 dst_port; __u16 port_min; /* source port range */ __u16 port_max; @@ -132,6 +147,69 @@ struct vxlan_dev { #define VXLAN_F_L2MISS 0x08 #define VXLAN_F_L3MISS 0x10 +static inline +bool vxlan_addr_equal(const union vxlan_addr *a, const union vxlan_addr *b) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (a->sa.sa_family != b->sa.sa_family) + return false; + if (a->sa.sa_family == AF_INET6) + return 
ipv6_addr_equal(&a->sin6.sin6_addr, &b->sin6.sin6_addr); + else +#endif + return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr; +} + +static inline bool vxlan_addr_any(const union vxlan_addr *ipa) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (ipa->sa.sa_family == AF_INET6) + return ipv6_addr_any(&ipa->sin6.sin6_addr); + else +#endif + return ipa->sin.sin_addr.s_addr == htonl(INADDR_ANY); +} + +static inline bool vxlan_addr_multicast(const union vxlan_addr *ipa) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (ipa->sa.sa_family == AF_INET6) + return ipv6_addr_is_multicast(&ipa->sin6.sin6_addr); + else +#endif + return IN_MULTICAST(ntohl(ipa->sin.sin_addr.s_addr)); +} + +static int vxlan_nla_get_addr(union vxlan_addr *ip, struct nlattr *nla) +{ + if (nla_len(nla) >= sizeof(struct in6_addr)) { +#if IS_ENABLED(CONFIG_IPV6) + nla_memcpy(&ip->sin6.sin6_addr, nla, sizeof(struct in6_addr)); + ip->sa.sa_family = AF_INET6; + return 0; +#else + return -EAFNOSUPPORT; +#endif + } else if (nla_len(nla) >= sizeof(__be32)) { + ip->sin.sin_addr.s_addr = nla_get_be32(nla); + ip->sa.sa_family = AF_INET; + return 0; + } else { + return -EAFNOSUPPORT; + } +} + +static int vxlan_nla_put_addr(struct sk_buff *skb, int attr, + const union vxlan_addr *ip) +{ +#if IS_ENABLED(CONFIG_IPV6) + if (ip->sa.sa_family == AF_INET6) + return nla_put(skb, attr, sizeof(struct in6_addr), &ip->sin6.sin6_addr); + else +#endif + return nla_put_be32(skb, attr, ip->sin.sin_addr.s_addr); +} + /* salt for hash table */ static u32 vxlan_salt __read_mostly; @@ -178,7 +256,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan, if (type == RTM_GETNEIGH) { ndm->ndm_family = AF_INET; - send_ip = rdst->remote_ip != htonl(INADDR_ANY); + send_ip = !vxlan_addr_any(&rdst->remote_ip); send_eth = !is_zero_ether_addr(fdb->eth_addr); } else ndm->ndm_family = AF_BRIDGE; @@ -190,7 +268,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan, if (send_eth && nla_put(skb, NDA_LLADDR, ETH_ALEN, 
&fdb->eth_addr)) goto nla_put_failure; - if (send_ip && nla_put_be32(skb, NDA_DST, rdst->remote_ip)) + if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip)) goto nla_put_failure; if (rdst->remote_port && rdst->remote_port != vxlan->dst_port && @@ -222,7 +300,7 @@ static inline size_t vxlan_nlmsg_size(void) { return NLMSG_ALIGN(sizeof(struct ndmsg)) + nla_total_size(ETH_ALEN) /* NDA_LLADDR */ - + nla_total_size(sizeof(__be32)) /* NDA_DST */ + + nla_total_size(sizeof(struct in6_addr)) /* NDA_DST */ + nla_total_size(sizeof(__be16)) /* NDA_PORT */ + nla_total_size(sizeof(__be32)) /* NDA_VNI */ + nla_total_size(sizeof(__u32)) /* NDA_IFINDEX */ @@ -255,14 +333,14 @@ errout: rtnl_set_sk_err(net, RTNLGRP_NEIGH, err); } -static void vxlan_ip_miss(struct net_device *dev, __be32 ipa) +static void vxlan_ip_miss(struct net_device *dev, union vxlan_addr *ipa) { struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_fdb f; memset(&f, 0, sizeof f); f.state = NUD_STALE; - f.remote.remote_ip = ipa; /* goes to NDA_DST */ + f.remote.remote_ip = *ipa; /* goes to NDA_DST */ f.remote.remote_vni = VXLAN_N_VID; vxlan_fdb_notify(vxlan, &f, RTM_GETNEIGH); @@ -317,14 +395,14 @@ static struct vxlan_fdb *vxlan_find_mac(struct vxlan_dev *vxlan, } /* Add/update destinations for multicast */ -static int vxlan_fdb_append(struct vxlan_fdb *f, - __be32 ip, __be16 port, __u32 vni, __u32 ifindex) +static int vxlan_fdb_append(struct vxlan_fdb *f, union vxlan_addr *ip, + __be16 port, __u32 vni, __u32 ifindex) { struct vxlan_rdst *rd_prev, *rd; rd_prev = NULL; for (rd = &f->remote; rd; rd = rd->remote_next) { - if (rd->remote_ip == ip && + if (vxlan_addr_equal(&rd->remote_ip, ip) && rd->remote_port == port && rd->remote_vni == vni && rd->remote_ifindex == ifindex) @@ -334,7 +412,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f, rd = kmalloc(sizeof(*rd), GFP_ATOMIC); if (rd == NULL) return -ENOBUFS; - rd->remote_ip = ip; + rd->remote_ip = *ip; rd->remote_port = port; rd->remote_vni = 
vni; rd->remote_ifindex = ifindex; @@ -345,7 +423,7 @@ static int vxlan_fdb_append(struct vxlan_fdb *f, /* Add new entry to forwarding table -- assumes lock held */ static int vxlan_fdb_create(struct vxlan_dev *vxlan, - const u8 *mac, __be32 ip, + const u8 *mac, union vxlan_addr *ip, __u16 state, __u16 flags, __be16 port, __u32 vni, __u32 ifindex, __u8 ndm_flags) @@ -385,13 +463,20 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan, if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax) return -ENOSPC; - netdev_dbg(vxlan->dev, "add %pM -> %pI4\n", mac, &ip); +#if IS_ENABLED(CONFIG_IPV6) + if (ip->sa.sa_family == AF_INET6) + netdev_dbg(vxlan->dev, "add %pM -> %pI6\n", mac, + &ip->sin6.sin6_addr); + else +#endif + netdev_dbg(vxlan->dev, "add %pM -> %pI4\n", mac, + &ip->sin.sin_addr.s_addr); f = kmalloc(sizeof(*f), GFP_ATOMIC); if (!f) return -ENOMEM; notify = 1; - f->remote.remote_ip = ip; + f->remote.remote_ip = *ip; f->remote.remote_port = port; f->remote.remote_vni = vni; f->remote.remote_ifindex = ifindex; @@ -444,7 +529,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], { struct vxlan_dev *vxlan = netdev_priv(dev); struct net *net = dev_net(vxlan->dev); - __be32 ip; + union vxlan_addr ip; __be16 port; u32 vni, ifindex; int err; @@ -458,10 +543,9 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], if (tb[NDA_DST] == NULL) return -EINVAL; - if (nla_len(tb[NDA_DST]) != sizeof(__be32)) - return -EAFNOSUPPORT; - - ip = nla_get_be32(tb[NDA_DST]); + err = vxlan_nla_get_addr(&ip, tb[NDA_DST]); + if (err) + return err; if (tb[NDA_PORT]) { if (nla_len(tb[NDA_PORT]) != sizeof(__be16)) @@ -491,7 +575,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], ifindex = 0; spin_lock_bh(&vxlan->hash_lock); - err = vxlan_fdb_create(vxlan, addr, ip, ndm->ndm_state, flags, + err = vxlan_fdb_create(vxlan, addr, &ip, ndm->ndm_state, flags, port, vni, ifindex, ndm->ndm_flags); spin_unlock_bh(&vxlan->hash_lock); @@ -555,7 +639,7 @@ 
skip: * and Tunnel endpoint. */ static void vxlan_snoop(struct net_device *dev, - __be32 src_ip, const u8 *src_mac) + union vxlan_addr *src_ip, const u8 *src_mac) { struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_fdb *f; @@ -564,15 +648,25 @@ static void vxlan_snoop(struct net_device *dev, f = vxlan_find_mac(vxlan, src_mac); if (likely(f)) { f->used = jiffies; - if (likely(f->remote.remote_ip == src_ip)) + if (likely(vxlan_addr_equal(&f->remote.remote_ip, src_ip))) return; - if (net_ratelimit()) - netdev_info(dev, - "%pM migrated from %pI4 to %pI4\n", - src_mac, &f->remote.remote_ip, &src_ip); + if (net_ratelimit()) { +#if IS_ENABLED(CONFIG_IPV6) + if (src_ip->sa.sa_family == AF_INET6) + netdev_info(dev, + "%pM migrated from %pI6 to %pI6\n", + src_mac, &f->remote.remote_ip.sin6.sin6_addr, + &src_ip->sin6.sin6_addr); + else +#endif + netdev_info(dev, + "%pM migrated from %pI4 to %pI4\n", + src_mac, &f->remote.remote_ip.sin.sin_addr.s_addr, + &src_ip->sin.sin_addr.s_addr); + } - f->remote.remote_ip = src_ip; + f->remote.remote_ip = *src_ip; f->updated = jiffies; } else { /* learned new entry */ @@ -603,7 +697,8 @@ static bool vxlan_group_used(struct vxlan_net *vn, if (!netif_running(vxlan->dev)) continue; - if (vxlan->default_dst.remote_ip == this->default_dst.remote_ip) + if (vxlan_addr_equal(&vxlan->default_dst.remote_ip, + &this->default_dst.remote_ip)) return true; } @@ -616,11 +711,12 @@ static int vxlan_join_group(struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); struct sock *sk = vn->sock->sk; + union vxlan_addr *ip = &vxlan->default_dst.remote_ip; struct ip_mreqn mreq = { - .imr_multiaddr.s_addr = vxlan->default_dst.remote_ip, + .imr_multiaddr.s_addr = ip->sin.sin_addr.s_addr, .imr_ifindex = vxlan->default_dst.remote_ifindex, }; - int err; + int err = 0; /* Already a member of group */ if (vxlan_group_used(vn, vxlan)) @@ -628,8 +724,17 @@ static int 
vxlan_join_group(struct net_device *dev) /* Need to drop RTNL to call multicast join */ rtnl_unlock(); - lock_sock(sk); - err = ip_mc_join_group(sk, &mreq); + if (ip->sa.sa_family == AF_INET) { + lock_sock(sk); + err = ip_mc_join_group(sk, &mreq); +#if IS_ENABLED(CONFIG_IPV6) + } else { + sk = vn->sock6->sk; + lock_sock(sk); + err = ipv6_stub->ipv6_sock_mc_join(sk, vxlan->default_dst.remote_ifindex, + &ip->sin6.sin6_addr); +#endif + } release_sock(sk); rtnl_lock(); @@ -644,8 +749,9 @@ static int vxlan_leave_group(struct net_device *dev) struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); int err = 0; struct sock *sk = vn->sock->sk; + union vxlan_addr *ip = &vxlan->default_dst.remote_ip; struct ip_mreqn mreq = { - .imr_multiaddr.s_addr = vxlan->default_dst.remote_ip, + .imr_multiaddr.s_addr = ip->sin.sin_addr.s_addr, .imr_ifindex = vxlan->default_dst.remote_ifindex, }; @@ -655,8 +761,17 @@ static int vxlan_leave_group(struct net_device *dev) /* Need to drop RTNL to call multicast leave */ rtnl_unlock(); - lock_sock(sk); - err = ip_mc_leave_group(sk, &mreq); + if (ip->sa.sa_family == AF_INET) { + lock_sock(sk); + err = ip_mc_leave_group(sk, &mreq); +#if IS_ENABLED(CONFIG_IPV6) + } else { + sk = vn->sock6->sk; + lock_sock(sk); + err = ipv6_stub->ipv6_sock_mc_drop(sk, vxlan->default_dst.remote_ifindex, + &ip->sin6.sin6_addr); +#endif + } release_sock(sk); rtnl_lock(); @@ -666,12 +781,16 @@ static int vxlan_leave_group(struct net_device *dev) /* Callback from net/ipv4/udp.c to receive packets */ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) { - struct iphdr *oip; + struct iphdr *oip = NULL; +#if IS_ENABLED(CONFIG_IPV6) + struct ipv6hdr *oip6 = NULL; +#endif struct vxlanhdr *vxh; struct vxlan_dev *vxlan; struct pcpu_tstats *stats; + union vxlan_addr src_ip; __u32 vni; - int err; + int err = 0; /* pop off outer UDP header */ __skb_pull(skb, sizeof(struct udphdr)); @@ -708,7 +827,13 @@ static int vxlan_udp_encap_recv(struct sock *sk, 
struct sk_buff *skb) skb_reset_mac_header(skb); /* Re-examine inner Ethernet packet */ - oip = ip_hdr(skb); + if (skb->protocol == htons(ETH_P_IP)) + oip = ip_hdr(skb); +#if IS_ENABLED(CONFIG_IPV6) + if (skb->protocol == htons(ETH_P_IPV6)) + oip6 = ipv6_hdr(skb); +#endif + skb->protocol = eth_type_trans(skb, vxlan->dev); /* Ignore packet loops (and multicast echo) */ @@ -716,8 +841,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) vxlan->dev->dev_addr) == 0) goto drop; - if (vxlan->flags & VXLAN_F_LEARN) - vxlan_snoop(skb->dev, oip->saddr, eth_hdr(skb)->h_source); + if (vxlan->flags & VXLAN_F_LEARN) { + if (oip) { + src_ip.sin.sin_addr.s_addr = oip->saddr; + src_ip.sa.sa_family = AF_INET; + } +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) { + src_ip.sin6.sin6_addr = oip6->saddr; + src_ip.sa.sa_family = AF_INET6; + } +#endif + vxlan_snoop(skb->dev, &src_ip, eth_hdr(skb)->h_source); + } __skb_tunnel_rx(skb, vxlan->dev); skb_reset_network_header(skb); @@ -733,11 +869,24 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb) skb->encapsulation = 0; - err = IP_ECN_decapsulate(oip, skb); +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) + err = IP6_ECN_decapsulate(oip6, skb); +#endif + if (oip) + err = IP_ECN_decapsulate(oip, skb); + if (unlikely(err)) { - if (log_ecn_error) - net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n", - &oip->saddr, oip->tos); + if (log_ecn_error) { +#if IS_ENABLED(CONFIG_IPV6) + if (oip6) + net_info_ratelimited("non-ECT from %pI6\n", + &oip6->saddr); +#endif + if (oip) + net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n", + &oip->saddr, oip->tos); + } if (err > 1) { ++vxlan->dev->stats.rx_frame_errors; ++vxlan->dev->stats.rx_errors; @@ -772,6 +921,7 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb) u8 *arpptr, *sha; __be32 sip, tip; struct neighbour *n; + union vxlan_addr ipa; if (dev->flags & IFF_NOARP) goto out; @@ -813,7 +963,7 @@ static int arp_reduce(struct net_device *dev, struct 
sk_buff *skb) } f = vxlan_find_mac(vxlan, n->ha); - if (f && f->remote.remote_ip == htonl(INADDR_ANY)) { + if (f && vxlan_addr_any(&f->remote.remote_ip)) { /* bridge-local neighbor */ neigh_release(n); goto out; @@ -831,8 +981,11 @@ static int arp_reduce(struct net_device *dev, struct sk_buff *skb) if (netif_rx_ni(reply) == NET_RX_DROP) dev->stats.rx_dropped++; - } else if (vxlan->flags & VXLAN_F_L3MISS) - vxlan_ip_miss(dev, tip); + } else if (vxlan->flags & VXLAN_F_L3MISS) { + ipa.sin.sin_addr.s_addr = tip; + ipa.sa.sa_family = AF_INET; + vxlan_ip_miss(dev, &ipa); + } out: consume_skb(skb); return NETDEV_TX_OK; @@ -854,6 +1007,14 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) return false; pip = ip_hdr(skb); n = neigh_lookup(&arp_tbl, &pip->daddr, dev); + if (!n && vxlan->flags & VXLAN_F_L3MISS) { + union vxlan_addr ipa; + ipa.sin.sin_addr.s_addr = pip->daddr; + ipa.sa.sa_family = AF_INET; + vxlan_ip_miss(dev, &ipa); + return false; + } + break; default: return false; @@ -870,8 +1031,8 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb) } neigh_release(n); return diff; - } else if (vxlan->flags & VXLAN_F_L3MISS) - vxlan_ip_miss(dev, pip->daddr); + } + return false; } @@ -881,10 +1042,11 @@ static void vxlan_sock_free(struct sk_buff *skb) } /* On transmit, associate with the tunnel socket */ -static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb) +static void vxlan_set_owner(struct net_device *dev, struct sk_buff *skb, + bool ipv6) { struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); - struct sock *sk = vn->sock->sk; + struct sock *sk = ipv6 ? 
vn->sock6->sk : vn->sock->sk; skb_orphan(skb); sock_hold(sk); @@ -930,15 +1092,26 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan, { struct pcpu_tstats *tx_stats = this_cpu_ptr(src_vxlan->dev->tstats); struct pcpu_tstats *rx_stats = this_cpu_ptr(dst_vxlan->dev->tstats); + union vxlan_addr loopback; skb->pkt_type = PACKET_HOST; skb->encapsulation = 0; skb->dev = dst_vxlan->dev; __skb_pull(skb, skb_network_offset(skb)); + if (dst_vxlan->default_dst.remote_ip.sa.sa_family == AF_INET) { + loopback.sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK); + loopback.sa.sa_family = AF_INET; + } +#if IS_ENABLED(CONFIG_IPV6) + else { + loopback.sin6.sin6_addr = in6addr_loopback; + loopback.sa.sa_family = AF_INET6; + } +#endif + if (dst_vxlan->flags & VXLAN_F_LEARN) - vxlan_snoop(skb->dev, htonl(INADDR_LOOPBACK), - eth_hdr(skb)->h_source); + vxlan_snoop(skb->dev, &loopback, eth_hdr(skb)->h_source); u64_stats_update_begin(&tx_stats->syncp); tx_stats->tx_packets++; @@ -960,22 +1133,29 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, { struct vxlan_dev *vxlan = netdev_priv(dev); struct rtable *rt; - const struct iphdr *old_iph; + const struct iphdr *old_iph = NULL; struct iphdr *iph; struct vxlanhdr *vxh; struct udphdr *uh; struct flowi4 fl4; - __be32 dst; - __be16 src_port, dst_port; +#if IS_ENABLED(CONFIG_IPV6) + struct flowi6 fl6; + struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id); + struct sock *sk = vn->sock6->sk; + struct ipv6hdr *ip6h; +#endif + const union vxlan_addr *dst; + struct dst_entry *ndst = NULL; + __be16 src_port = 0, dst_port; u32 vni; __be16 df = 0; __u8 tos, ttl; dst_port = rdst->remote_port ? 
rdst->remote_port : vxlan->dst_port; vni = rdst->remote_vni; - dst = rdst->remote_ip; + dst = &rdst->remote_ip; - if (!dst) { + if (vxlan_addr_any(dst)) { if (did_rsc) { /* short-circuited back to local bridge */ vxlan_encap_bypass(skb, vxlan, vxlan); @@ -989,60 +1169,119 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, skb->encapsulation = 1; } - /* Need space for new headers (invalidates iph ptr) */ - if (skb_cow_head(skb, VXLAN_HEADROOM)) - goto drop; + ttl = vxlan->ttl; + tos = vxlan->tos; + if (dst->sa.sa_family == AF_INET) { + /* Need space for new headers (invalidates iph ptr) */ + if (skb_cow_head(skb, VXLAN_HEADROOM)) + goto drop; - old_iph = ip_hdr(skb); + old_iph = ip_hdr(skb); + if (!ttl && IN_MULTICAST(ntohl(dst->sin.sin_addr.s_addr))) + ttl = 1; - ttl = vxlan->ttl; - if (!ttl && IN_MULTICAST(ntohl(dst))) - ttl = 1; + if (tos == 1) + tos = ip_tunnel_get_dsfield(old_iph, skb); - tos = vxlan->tos; - if (tos == 1) - tos = ip_tunnel_get_dsfield(old_iph, skb); - - src_port = vxlan_src_port(vxlan, skb); - - memset(&fl4, 0, sizeof(fl4)); - fl4.flowi4_oif = rdst->remote_ifindex; - fl4.flowi4_tos = RT_TOS(tos); - fl4.daddr = dst; - fl4.saddr = vxlan->saddr; - - rt = ip_route_output_key(dev_net(dev), &fl4); - if (IS_ERR(rt)) { - netdev_dbg(dev, "no route to %pI4\n", &dst); - dev->stats.tx_carrier_errors++; - goto tx_error; - } + src_port = vxlan_src_port(vxlan, skb); - if (rt->dst.dev == dev) { - netdev_dbg(dev, "circular route to %pI4\n", &dst); - ip_rt_put(rt); - dev->stats.collisions++; - goto tx_error; - } + memset(&fl4, 0, sizeof(fl4)); + fl4.flowi4_oif = rdst->remote_ifindex; + fl4.flowi4_tos = RT_TOS(tos); + fl4.daddr = dst->sin.sin_addr.s_addr; + fl4.saddr = vxlan->saddr.sin.sin_addr.s_addr; + + rt = ip_route_output_key(dev_net(dev), &fl4); + if (IS_ERR(rt)) { + netdev_dbg(dev, "no route to %pI4\n", + &dst->sin.sin_addr.s_addr); + dev->stats.tx_carrier_errors++; + goto tx_error; + } + + if (rt->dst.dev == dev) { + 
netdev_dbg(dev, "circular route to %pI4\n", + &dst->sin.sin_addr.s_addr); + ip_rt_put(rt); + dev->stats.collisions++; + goto tx_error; + } + + /* Bypass encapsulation if the destination is local */ + if (rt->rt_flags & RTCF_LOCAL && + !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { + struct vxlan_dev *dst_vxlan; + + ip_rt_put(rt); + dst_vxlan = vxlan_find_vni(dev_net(dev), vni); + if (!dst_vxlan) + goto tx_error; + vxlan_encap_bypass(skb, vxlan, dst_vxlan); + return NETDEV_TX_OK; + } + + ndst = &rt->dst; +#if IS_ENABLED(CONFIG_IPV6) + } else { + const struct ipv6hdr *old_iph6; + u32 flags; + + /* Need space for new headers (invalidates ipv6h ptr) */ + if (skb_cow_head(skb, VXLAN6_HEADROOM)) + goto drop; + + old_iph6 = ipv6_hdr(skb); + if (!ttl && ipv6_addr_is_multicast(&dst->sin6.sin6_addr)) + ttl = 1; + + if (tos == 1) + tos = ipv6_get_dsfield(old_iph6); + + src_port = vxlan_src_port(vxlan, skb); - /* Bypass encapsulation if the destination is local */ - if (rt->rt_flags & RTCF_LOCAL && - !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { - struct vxlan_dev *dst_vxlan; + memset(&fl6, 0, sizeof(fl6)); + fl6.flowi6_oif = rdst->remote_ifindex; + fl6.flowi6_tos = RT_TOS(tos); + fl6.daddr = dst->sin6.sin6_addr; + fl6.saddr = vxlan->saddr.sin6.sin6_addr; + fl6.flowi6_proto = skb->protocol; - ip_rt_put(rt); - dst_vxlan = vxlan_find_vni(dev_net(dev), vni); - if (!dst_vxlan) + if (ipv6_stub->ipv6_dst_lookup(sk, &ndst, &fl6)) { + netdev_dbg(dev, "no route to %pI6\n", + &dst->sin6.sin6_addr); + dev->stats.tx_carrier_errors++; goto tx_error; - vxlan_encap_bypass(skb, vxlan, dst_vxlan); - return NETDEV_TX_OK; + } + + if (ndst->dev == dev) { + netdev_dbg(dev, "circular route to %pI6\n", + &dst->sin6.sin6_addr); + dst_release(ndst); + dev->stats.collisions++; + goto tx_error; + } + + /* Bypass encapsulation if the destination is local */ + flags = ((struct rt6_info *)ndst)->rt6i_flags; + if (flags & RTF_LOCAL && + !(flags & (RTCF_BROADCAST | RTCF_MULTICAST))) { + 
struct vxlan_dev *dst_vxlan; + + dst_release(ndst); + dst_vxlan = vxlan_find_vni(dev_net(dev), vni); + if (!dst_vxlan) + goto tx_error; + vxlan_encap_bypass(skb, vxlan, dst_vxlan); + return NETDEV_TX_OK; + } +#endif } memset(&(IPCB(skb)->opt), 0, sizeof(IPCB(skb)->opt)); IPCB(skb)->flags &= ~(IPSKB_XFRM_TUNNEL_SIZE | IPSKB_XFRM_TRANSFORMED | IPSKB_REROUTED); skb_dst_drop(skb); - skb_dst_set(skb, &rt->dst); + skb_dst_set(skb, ndst); vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh)); vxh->vx_flags = htonl(VXLAN_FLAGS); @@ -1058,27 +1297,65 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev, uh->len = htons(skb->len); uh->check = 0; - __skb_push(skb, sizeof(*iph)); - skb_reset_network_header(skb); - iph = ip_hdr(skb); - iph->version = 4; - iph->ihl = sizeof(struct iphdr) >> 2; - iph->frag_off = df; - iph->protocol = IPPROTO_UDP; - iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb); - iph->daddr = dst; - iph->saddr = fl4.saddr; - iph->ttl = ttl ? : ip4_dst_hoplimit(&rt->dst); - tunnel_ip_select_ident(skb, old_iph, &rt->dst); - - nf_reset(skb); + if (dst->sa.sa_family == AF_INET) { + __skb_push(skb, sizeof(*iph)); + skb_reset_network_header(skb); + iph = ip_hdr(skb); + iph->version = 4; + iph->ihl = sizeof(struct iphdr) >> 2; + iph->frag_off = df; + iph->protocol = IPPROTO_UDP; + iph->tos = ip_tunnel_ecn_encap(tos, old_iph, skb); + iph->daddr = dst->sin.sin_addr.s_addr; + iph->saddr = fl4.saddr; + iph->ttl = ttl ? 
: ip4_dst_hoplimit(ndst); + tunnel_ip_select_ident(skb, old_iph, ndst); + + vxlan_set_owner(dev, skb, false); +#if IS_ENABLED(CONFIG_IPV6) + } else { + if (!skb_is_gso(skb) && !(ndst->dev->features & NETIF_F_IPV6_CSUM)) { + __wsum csum = skb_checksum(skb, 0, skb->len, 0); + skb->ip_summed = CHECKSUM_UNNECESSARY; + uh->check = csum_ipv6_magic(&fl6.saddr, &fl6.daddr, skb->len, + IPPROTO_UDP, csum); + if (uh->check == 0) + uh->check = CSUM_MANGLED_0; + } else { + skb->ip_summed = CHECKSUM_PARTIAL; + skb->csum_start = skb_transport_header(skb) - skb->head; + skb->csum_offset = offsetof(struct udphdr, check); + uh->check = ~csum_ipv6_magic(&fl6.saddr, &fl6.daddr, + skb->len, IPPROTO_UDP, 0); + } - vxlan_set_owner(dev, skb); + __skb_push(skb, sizeof(*ip6h)); + skb_reset_network_header(skb); + ip6h = ipv6_hdr(skb); + ip6h->version = 6; + ip6h->priority = 0; + ip6h->flow_lbl[0] = 0; + ip6h->flow_lbl[1] = 0; + ip6h->flow_lbl[2] = 0; + ip6h->payload_len = htons(skb->len); + ip6h->nexthdr = IPPROTO_UDP; + ip6h->hop_limit = ttl ? 
: ip6_dst_hoplimit(ndst); + ip6h->daddr = fl6.daddr; + ip6h->saddr = fl6.saddr; + + vxlan_set_owner(dev, skb, true); +#endif + } if (handle_offloads(skb)) goto drop; - iptunnel_xmit(skb, dev); +#if IS_ENABLED(CONFIG_IPV6) + if (dst->sa.sa_family == AF_INET6) + ip6tunnel_xmit(skb, dev); + else +#endif + iptunnel_xmit(skb, dev); return NETDEV_TX_OK; drop: @@ -1126,7 +1403,7 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev) if (f == NULL) { rdst0 = &vxlan->default_dst; - if (rdst0->remote_ip == htonl(INADDR_ANY) && + if (vxlan_addr_any(&rdst0->remote_ip) && (vxlan->flags & VXLAN_F_L2MISS) && !is_multicast_ether_addr(eth->h_dest)) vxlan_fdb_miss(vxlan, eth->h_dest); @@ -1204,7 +1481,7 @@ static int vxlan_open(struct net_device *dev) struct vxlan_dev *vxlan = netdev_priv(dev); int err; - if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) { + if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) { err = vxlan_join_group(dev); if (err) return err; @@ -1238,7 +1515,7 @@ static int vxlan_stop(struct net_device *dev) { struct vxlan_dev *vxlan = netdev_priv(dev); - if (IN_MULTICAST(ntohl(vxlan->default_dst.remote_ip))) + if (vxlan_addr_multicast(&vxlan->default_dst.remote_ip)) vxlan_leave_group(dev); del_timer_sync(&vxlan->age_timer); @@ -1288,7 +1565,12 @@ static void vxlan_setup(struct net_device *dev) eth_hw_addr_random(dev); ether_setup(dev); - dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM; +#if IS_ENABLED(CONFIG_IPV6) + if (vxlan->default_dst.remote_ip.sa.sa_family == AF_INET6) + dev->hard_header_len = ETH_HLEN + VXLAN6_HEADROOM; + else +#endif + dev->hard_header_len = ETH_HLEN + VXLAN_HEADROOM; dev->netdev_ops = &vxlan_netdev_ops; dev->destructor = vxlan_free; @@ -1326,8 +1608,10 @@ static void vxlan_setup(struct net_device *dev) static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = { [IFLA_VXLAN_ID] = { .type = NLA_U32 }, [IFLA_VXLAN_GROUP] = { .len = FIELD_SIZEOF(struct iphdr, daddr) }, + [IFLA_VXLAN_GROUP6] = { .len = 
sizeof(struct in6_addr) }, [IFLA_VXLAN_LINK] = { .type = NLA_U32 }, [IFLA_VXLAN_LOCAL] = { .len = FIELD_SIZEOF(struct iphdr, saddr) }, + [IFLA_VXLAN_LOCAL6] = { .len = sizeof(struct in6_addr) }, [IFLA_VXLAN_TOS] = { .type = NLA_U8 }, [IFLA_VXLAN_TTL] = { .type = NLA_U8 }, [IFLA_VXLAN_LEARNING] = { .type = NLA_U8 }, @@ -1408,11 +1692,37 @@ static int vxlan_newlink(struct net *net, struct net_device *dev, } dst->remote_vni = vni; - if (data[IFLA_VXLAN_GROUP]) - dst->remote_ip = nla_get_be32(data[IFLA_VXLAN_GROUP]); + if (data[IFLA_VXLAN_GROUP]) { + dst->remote_ip.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_GROUP]); + dst->remote_ip.sa.sa_family = AF_INET; + } else if (data[IFLA_VXLAN_GROUP6]) { +#if IS_ENABLED(CONFIG_IPV6) + if (ipv6_disabled) + return -EPFNOSUPPORT; + + nla_memcpy(&dst->remote_ip.sin6.sin6_addr, data[IFLA_VXLAN_GROUP6], + sizeof(struct in6_addr)); + dst->remote_ip.sa.sa_family = AF_INET6; +#else + return -EPFNOSUPPORT; +#endif + } - if (data[IFLA_VXLAN_LOCAL]) - vxlan->saddr = nla_get_be32(data[IFLA_VXLAN_LOCAL]); + if (data[IFLA_VXLAN_LOCAL]) { + vxlan->saddr.sin.sin_addr.s_addr = nla_get_be32(data[IFLA_VXLAN_LOCAL]); + vxlan->saddr.sa.sa_family = AF_INET; + } else if (data[IFLA_VXLAN_LOCAL6]) { +#if IS_ENABLED(CONFIG_IPV6) + if (ipv6_disabled) + return -EPFNOSUPPORT; + + nla_memcpy(&vxlan->saddr.sin6.sin6_addr, data[IFLA_VXLAN_LOCAL6], + sizeof(struct in6_addr)); + vxlan->saddr.sa.sa_family = AF_INET6; +#else + return -EPFNOSUPPORT; +#endif + } if (data[IFLA_VXLAN_LINK] && (dst->remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]))) { @@ -1493,9 +1803,9 @@ static size_t vxlan_get_size(const struct net_device *dev) { return nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_ID */ - nla_total_size(sizeof(__be32)) +/* IFLA_VXLAN_GROUP */ + nla_total_size(sizeof(struct in6_addr)) + /* IFLA_VXLAN_GROUP{6} */ nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_LINK */ - nla_total_size(sizeof(__be32))+ /* IFLA_VXLAN_LOCAL */ + nla_total_size(sizeof(struct 
in6_addr)) + /* IFLA_VXLAN_LOCAL{6} */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TTL */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_TOS */ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_LEARNING */ @@ -1522,14 +1832,36 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev) if (nla_put_u32(skb, IFLA_VXLAN_ID, dst->remote_vni)) goto nla_put_failure; - if (dst->remote_ip && nla_put_be32(skb, IFLA_VXLAN_GROUP, dst->remote_ip)) - goto nla_put_failure; + if (!vxlan_addr_any(&dst->remote_ip)) { + if (dst->remote_ip.sa.sa_family == AF_INET) { + if (nla_put_be32(skb, IFLA_VXLAN_GROUP, + dst->remote_ip.sin.sin_addr.s_addr)) + goto nla_put_failure; + } else { +#if IS_ENABLED(CONFIG_IPV6) + if (nla_put(skb, IFLA_VXLAN_GROUP6, sizeof(struct in6_addr), + &dst->remote_ip.sin6.sin6_addr)) + goto nla_put_failure; +#endif + } + } if (dst->remote_ifindex && nla_put_u32(skb, IFLA_VXLAN_LINK, dst->remote_ifindex)) goto nla_put_failure; - if (vxlan->saddr && nla_put_be32(skb, IFLA_VXLAN_LOCAL, vxlan->saddr)) - goto nla_put_failure; + if (!vxlan_addr_any(&vxlan->saddr)) { + if (vxlan->saddr.sa.sa_family == AF_INET) { + if (nla_put_be32(skb, IFLA_VXLAN_LOCAL, + vxlan->saddr.sin.sin_addr.s_addr)) + goto nla_put_failure; + } else { +#if IS_ENABLED(CONFIG_IPV6) + if (nla_put(skb, IFLA_VXLAN_LOCAL6, sizeof(struct in6_addr), + &vxlan->saddr.sin6.sin6_addr)) + goto nla_put_failure; +#endif + } + } if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) || nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) || @@ -1569,18 +1901,17 @@ static struct rtnl_link_ops vxlan_link_ops __read_mostly = { .fill_info = vxlan_fill_info, }; -static __net_init int vxlan_init_net(struct net *net) +static __net_init int create_v4_sock(struct net *net) { - struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct sock *sk; + struct vxlan_net *vn = net_generic(net, vxlan_net_id); struct sockaddr_in vxlan_addr = { .sin_family = AF_INET, + .sin_port = htons(vxlan_port), .sin_addr.s_addr = 
htonl(INADDR_ANY), }; int rc; - unsigned h; - /* Create UDP socket for encapsulation receive. */ rc = sock_create_kern(AF_INET, SOCK_DGRAM, IPPROTO_UDP, &vn->sock); if (rc < 0) { pr_debug("UDP socket create failed\n"); @@ -1590,10 +1921,8 @@ static __net_init int vxlan_init_net(struct net *net) sk = vn->sock->sk; sk_change_net(sk, net); - vxlan_addr.sin_port = htons(vxlan_port); - - rc = kernel_bind(vn->sock, (struct sockaddr *) &vxlan_addr, - sizeof(vxlan_addr)); + rc = kernel_bind(vn->sock, (struct sockaddr *)&vxlan_addr, + sizeof(struct sockaddr_in)); if (rc < 0) { pr_debug("bind for UDP socket %pI4:%u (%d)\n", &vxlan_addr.sin_addr, ntohs(vxlan_addr.sin_port), rc); @@ -1604,11 +1933,94 @@ static __net_init int vxlan_init_net(struct net *net) /* Disable multicast loopback */ inet_sk(sk)->mc_loop = 0; + /* Mark socket as an encapsulation socket. */ + udp_sk(sk)->encap_type = 1; + udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv; + return 0; +} + +/* Create UDP socket for encapsulation receive. AF_INET6 socket + * could be used for both IPv4 and IPv6 communications, but + * users may set bindv6only=1. 
+ */ +#if IS_ENABLED(CONFIG_IPV6) +static __net_init int create_v6_sock(struct net *net) +{ + struct sock *sk; + struct vxlan_net *vn = net_generic(net, vxlan_net_id); + struct sockaddr_in6 vxlan_addr = { + .sin6_family = AF_INET6, + .sin6_port = htons(vxlan_port), + }; + int rc, val = 1; + + rc = sock_create_kern(AF_INET6, SOCK_DGRAM, IPPROTO_UDP, &vn->sock6); + if (rc < 0) + return rc; + + /* Put in proper namespace */ + sk = vn->sock6->sk; + sk_change_net(sk, net); + + kernel_setsockopt(vn->sock6, SOL_IPV6, IPV6_V6ONLY, + (char *)&val, sizeof(val)); + rc = kernel_bind(vn->sock6, (struct sockaddr *)&vxlan_addr, + sizeof(struct sockaddr_in6)); + if (rc < 0) { + pr_debug("bind for UDP socket %pI6:%u (%d)\n", + &vxlan_addr.sin6_addr, ntohs(vxlan_addr.sin6_port), rc); + sk_release_kernel(sk); + vn->sock6 = NULL; + return rc; + } + + /* At this point, IPv6 module should have been loaded in + * sock_create_kern(). + */ + BUG_ON(!ipv6_stub); + + /* Disable multicast loopback */ + inet_sk(sk)->mc_loop = 0; /* Mark socket as an encapsulation socket. 
*/ udp_sk(sk)->encap_type = 1; udp_sk(sk)->encap_rcv = vxlan_udp_encap_recv; + + return 0; +} + +static __net_init int create_sock(struct net *net) +{ + int rc; + rc = create_v6_sock(net); + if (rc < 0) { + pr_info("UDP IPv6 socket create failed, disable IPv6\n"); + ipv6_disabled = true; + } + + return create_v4_sock(net); +} +#else +static __net_init int create_sock(struct net *net) +{ + return create_v4_sock(net); +} +#endif + +static __net_init int vxlan_init_net(struct net *net) +{ + struct vxlan_net *vn = net_generic(net, vxlan_net_id); + int rc; + unsigned h; + + rc = create_sock(net); + if (rc < 0) + return rc; + udp_encap_enable(); +#if IS_ENABLED(CONFIG_IPV6) + ipv6_stub->udpv6_encap_enable(); +#endif for (h = 0; h < VNI_HASH_SIZE; ++h) INIT_HLIST_HEAD(&vn->vni_list[h]); @@ -1632,6 +2044,12 @@ static __net_exit void vxlan_exit_net(struct net *net) sk_release_kernel(vn->sock->sk); vn->sock = NULL; } +#if IS_ENABLED(CONFIG_IPV6) + if (vn->sock6) { + sk_release_kernel(vn->sock6->sk); + vn->sock6 = NULL; + } +#endif } static struct pernet_operations vxlan_net_ops = { diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index b05823c..f7bed18 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -311,6 +311,8 @@ enum { IFLA_VXLAN_L2MISS, IFLA_VXLAN_L3MISS, IFLA_VXLAN_PORT, /* destination port */ + IFLA_VXLAN_GROUP6, + IFLA_VXLAN_LOCAL6, __IFLA_VXLAN_MAX }; #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1) -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 23+ messages in thread
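The address handling in this patch comes down to one pattern: a union of sockaddr types whose family field selects the branch in every helper. What follows is a minimal userspace sketch of that pattern, not the kernel code. The names `tunnel_addr`, `tunnel_addr_equal`, `tunnel_addr_any`, `tunnel_addr_multicast` and `tunnel_addr_parse` are hypothetical, and the libc macros stand in for the kernel's `ipv6_addr_equal()`, `ipv6_addr_any()` and `ipv6_addr_is_multicast()`.

```c
/* Userspace sketch of the tagged-union address pattern; not kernel code. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Same member layout as the patch's union vxlan_addr (hypothetical name). */
union tunnel_addr {
	struct sockaddr_in sin;
	struct sockaddr_in6 sin6;
	struct sockaddr sa;
};

/* Addresses of different families never compare equal. */
static bool tunnel_addr_equal(const union tunnel_addr *a,
			      const union tunnel_addr *b)
{
	if (a->sa.sa_family != b->sa.sa_family)
		return false;
	if (a->sa.sa_family == AF_INET6)
		return memcmp(&a->sin6.sin6_addr, &b->sin6.sin6_addr,
			      sizeof(struct in6_addr)) == 0;
	return a->sin.sin_addr.s_addr == b->sin.sin_addr.s_addr;
}

/* A zeroed union (family 0) falls through to the IPv4 branch and reads as "any". */
static bool tunnel_addr_any(const union tunnel_addr *a)
{
	if (a->sa.sa_family == AF_INET6)
		return IN6_IS_ADDR_UNSPECIFIED(&a->sin6.sin6_addr);
	return a->sin.sin_addr.s_addr == htonl(INADDR_ANY);
}

static bool tunnel_addr_multicast(const union tunnel_addr *a)
{
	if (a->sa.sa_family == AF_INET6)
		return IN6_IS_ADDR_MULTICAST(&a->sin6.sin6_addr);
	return IN_MULTICAST(ntohl(a->sin.sin_addr.s_addr));
}

/* Fill the union from text; the family is whichever form parses. */
static int tunnel_addr_parse(union tunnel_addr *out, const char *text)
{
	struct in_addr v4;
	struct in6_addr v6;

	memset(out, 0, sizeof(*out));
	if (inet_pton(AF_INET, text, &v4) == 1) {
		out->sa.sa_family = AF_INET;
		out->sin.sin_addr = v4;
		return 0;
	}
	if (inet_pton(AF_INET6, text, &v6) == 1) {
		out->sa.sa_family = AF_INET6;
		out->sin6.sin6_addr = v6;
		return 0;
	}
	return -1;
}
```

Callers pass the union by pointer, as the converted `vxlan_snoop()` and `vxlan_fdb_create()` do, so adding a family only means adding a branch to the helpers.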
* [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl
  2013-05-17 0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
                  ` (5 preceding siblings ...)
  2013-05-17 0:21 ` [Patch net-next v8 06/11] vxlan: add ipv6 support Cong Wang
@ 2013-05-17 0:21 ` Cong Wang
  2013-05-17 13:07   ` Sergei Shtylyov
  2013-05-17 0:21 ` [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support Cong Wang
                  ` (3 subsequent siblings)
  10 siblings, 1 reply; 23+ messages in thread
From: Cong Wang @ 2013-05-17 0:21 UTC (permalink / raw)
  To: netdev; +Cc: David Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

When disable_ipv6 is set, we should not allow IPv6 vxlan
device created on top of it.

Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 46c59a6..1ee79e0 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1681,6 +1681,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	struct vxlan_rdst *dst = &vxlan->default_dst;
 	__u32 vni;
 	int err;
+	bool use_ipv6 = false;
 
 	if (!data[IFLA_VXLAN_ID])
 		return -EINVAL;
@@ -1703,6 +1704,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		nla_memcpy(&dst->remote_ip.sin6.sin6_addr, data[IFLA_VXLAN_GROUP6],
 			   sizeof(struct in6_addr));
 		dst->remote_ip.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
 #else
 		return -EPFNOSUPPORT;
 #endif
@@ -1719,6 +1721,7 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 		nla_memcpy(&vxlan->saddr.sin6.sin6_addr, data[IFLA_VXLAN_LOCAL6],
 			   sizeof(struct in6_addr));
 		vxlan->saddr.sa.sa_family = AF_INET6;
+		use_ipv6 = true;
 #else
 		return -EPFNOSUPPORT;
 #endif
@@ -1734,6 +1737,17 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 			return -ENODEV;
 		}
 
+#if IS_ENABLED(CONFIG_IPV6)
+		if (use_ipv6) {
+			struct inet6_dev *idev = in6_dev_get(lowerdev);
+			if (idev && idev->cnf.disable_ipv6) {
+				pr_info("IPv6 is disabled via sysctl\n");
+				return -EPERM;
+			}
+		}
+#else
+		BUG_ON(use_ipv6);
+#endif
 		if (!tb[IFLA_MTU])
 			dev->mtu = lowerdev->mtu - VXLAN_HEADROOM;
 
-- 
1.7.7.6

^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl
  2013-05-17  0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
@ 2013-05-17 13:07   ` Sergei Shtylyov
  0 siblings, 0 replies; 23+ messages in thread
From: Sergei Shtylyov @ 2013-05-17 13:07 UTC
  To: Cong Wang; +Cc: netdev, David Miller

On 17-05-2013 4:21, Cong Wang wrote:

> From: Cong Wang <amwang@redhat.com>

> When disable_ipv6 is set, we should not allow IPv6 vxlan
> device created on top of it.

> Cc: David Miller <davem@davemloft.net>
> Signed-off-by: Cong Wang <amwang@redhat.com>
> ---
>   drivers/net/vxlan.c |   14 ++++++++++++++
>   1 files changed, 14 insertions(+), 0 deletions(-)

> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 46c59a6..1ee79e0 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
[...]
> @@ -1734,6 +1737,17 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>   			return -ENODEV;
>   		}
>
> +#if IS_ENABLED(CONFIG_IPV6)

    Why not:

	if (IS_ENABLED(CONFIG_IPV6))

#if's in the function body are frowned upon.

> +		if (use_ipv6) {
> +			struct inet6_dev *idev = in6_dev_get(lowerdev);

    Empty line wouldn't hurt here, after declaration...

> +			if (idev && idev->cnf.disable_ipv6) {
> +				pr_info("IPv6 is disabled via sysctl\n");
> +				return -EPERM;
> +			}
> +		}
> +#else
> +		BUG_ON(use_ipv6);
> +#endif

WBR, Sergei

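The review comment above contrasts a preprocessor `#if` inside a function body with a plain C `if (IS_ENABLED(...))`. The following is a minimal user-space sketch of that second style, not kernel code: `MY_IS_ENABLED`, `struct fake_idev` and `check_ipv6_allowed` are hypothetical names invented for illustration, `CONFIG_IPV6` is defined by hand, and the `-EAFNOSUPPORT` fallback is this sketch's own choice (the posted patch uses `BUG_ON` there). The macro imitates the placeholder trick the kernel's `IS_ENABLED()` is built on, reduced to the built-in (`=y`) case.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's IS_ENABLED(): expands to 1 when the
 * option macro is defined as 1, and to 0 when it is not defined at all.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define MY_IS_ENABLED(option) IS_DEFINED_2(option)

#define CONFIG_IPV6 1	/* assumption for this sketch: IPv6 is built in */

/* Hypothetical stand-in for the disable_ipv6 flag kept in struct inet6_dev. */
struct fake_idev {
	bool disable_ipv6;
};

/* Returns 0 if the device may be created, or a negative errno otherwise. */
int check_ipv6_allowed(bool use_ipv6, const struct fake_idev *idev)
{
	if (MY_IS_ENABLED(CONFIG_IPV6)) {
		/* Both branches are always parsed and type-checked; the
		 * compiler discards the dead one as a constant condition.
		 */
		if (use_ipv6 && idev && idev->disable_ipv6)
			return -EPERM;
		return 0;
	}

	/* IPv6 compiled out: an IPv6 request cannot be honoured. */
	return use_ipv6 ? -EAFNOSUPPORT : 0;
}
```

One consequence of this style is that every symbol used inside the `if` body must still be declared when the option is off, since the code is compiled either way; that is the usual reason a literal `#if` is sometimes kept despite the style preference.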
* [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support
  2013-05-17  0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
                   ` (6 preceding siblings ...)
  2013-05-17  0:21 ` [Patch net-next v8 07/11] vxlan: respect disable_ipv6 sysctl Cong Wang
@ 2013-05-17  0:21 ` Cong Wang
  2013-05-17  0:21 ` [Patch net-next v8 09/11] vxlan: add ipv6 proxy support Cong Wang
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17  0:21 UTC
  To: netdev; +Cc: David S. Miller, David Stevens, Cong Wang

From: Cong Wang <amwang@redhat.com>

route short circuit only has IPv4 part, this patch adds
the IPv6 part.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 1ee79e0..04fd499 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -995,7 +995,6 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
 	struct neighbour *n;
-	struct iphdr *pip;
 
 	if (is_multicast_ether_addr(eth_hdr(skb)->h_dest))
 		return false;
@@ -1003,6 +1002,9 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	n = NULL;
 	switch (ntohs(eth_hdr(skb)->h_proto)) {
 	case ETH_P_IP:
+	{
+		struct iphdr *pip;
+
 		if (!pskb_may_pull(skb, sizeof(struct iphdr)))
 			return false;
 		pip = ip_hdr(skb);
@@ -1016,6 +1018,27 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 		}
 
 		break;
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	case ETH_P_IPV6:
+	{
+		struct ipv6hdr *pip6;
+
+		if (!pskb_may_pull(skb, sizeof(struct ipv6hdr)))
+			return false;
+		pip6 = ipv6_hdr(skb);
+		n = neigh_lookup(&nd_tbl, &pip6->daddr, dev);
+		if (!n && vxlan->flags & VXLAN_F_L3MISS) {
+			union vxlan_addr ipa;
+			ipa.sin6.sin6_addr = pip6->daddr;
+			ipa.sa.sa_family = AF_INET6;
+			vxlan_ip_miss(dev, &ipa);
+			return false;
+		}
+
+		break;
+	}
+#endif
 	default:
 		return false;
 	}
@@ -1394,7 +1417,8 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	did_rsc = false;
 
 	if (f && (f->flags & NTF_ROUTER) && (vxlan->flags & VXLAN_F_RSC) &&
-	    ntohs(eth->h_proto) == ETH_P_IP) {
+	    (ntohs(eth->h_proto) == ETH_P_IP ||
+	     ntohs(eth->h_proto) == ETH_P_IPV6)) {
 		did_rsc = route_shortcircuit(dev, skb);
 		if (did_rsc)
 			f = vxlan_find_mac(vxlan, eth->h_dest);
-- 
1.7.7.6

* [Patch net-next v8 09/11] vxlan: add ipv6 proxy support
  2013-05-17  0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
                   ` (7 preceding siblings ...)
  2013-05-17  0:21 ` [Patch net-next v8 08/11] vxlan: add ipv6 route short circuit support Cong Wang
@ 2013-05-17  0:21 ` Cong Wang
  2013-05-17  0:21 ` [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr Cong Wang
  2013-05-17  0:21 ` [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17  0:21 UTC
  To: netdev; +Cc: David S. Miller, David Stevens, Cong Wang

From: Cong Wang <amwang@redhat.com>

This patch adds the IPv6 version of "arp_reduce", ndisc_send_na()
will be needed.

Cc: David S. Miller <davem@davemloft.net>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c    |   81 ++++++++++++++++++++++++++++++++++++++++++++++-
 include/net/addrconf.h |    4 ++
 include/net/ndisc.h    |    5 +++
 net/ipv6/af_inet6.c    |    1 +
 net/ipv6/ndisc.c       |    8 ++--
 5 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 04fd499..f4d46bf 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -991,6 +991,76 @@ out:
 	return NETDEV_TX_OK;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static int neigh_reduce(struct net_device *dev, struct sk_buff *skb)
+{
+	struct vxlan_dev *vxlan = netdev_priv(dev);
+	struct neighbour *n;
+	union vxlan_addr ipa;
+	const struct ipv6hdr *iphdr;
+	const struct in6_addr *saddr, *daddr;
+	struct nd_msg *msg;
+	struct inet6_dev *in6_dev = NULL;
+
+	in6_dev = in6_dev_get(dev);
+	if (!in6_dev)
+		goto consume;
+
+	if (skb->len < sizeof(struct ipv6hdr) + sizeof(struct nd_msg) ||
+	    !pskb_may_pull(skb, skb->len))
+		goto out;
+
+	iphdr = ipv6_hdr(skb);
+	saddr = &iphdr->saddr;
+	daddr = &iphdr->daddr;
+
+	if (iphdr->nexthdr != IPPROTO_ICMPV6)
+		goto out;
+
+	if (ipv6_addr_loopback(daddr) ||
+	    ipv6_addr_is_multicast(daddr))
+		goto out;
+
+	msg = (struct nd_msg *)skb_transport_header(skb);
+	if (msg->icmph.icmp6_code != 0 ||
+	    msg->icmph.icmp6_type != NDISC_NEIGHBOUR_SOLICITATION)
+		goto out;
+
+	n = neigh_lookup(&nd_tbl, daddr, dev);
+
+	if (n) {
+		struct vxlan_fdb *f;
+
+		if (!(n->nud_state & NUD_CONNECTED)) {
+			neigh_release(n);
+			goto out;
+		}
+
+		f = vxlan_find_mac(vxlan, n->ha);
+		if (f && vxlan_addr_any(&f->remote.remote_ip)) {
+			/* bridge-local neighbor */
+			neigh_release(n);
+			goto out;
+		}
+
+		ipv6_stub->ndisc_send_na(dev, n, saddr, &msg->target,
+					 !!in6_dev->cnf.forwarding,
+					 true, false, false);
+		neigh_release(n);
+	} else if (vxlan->flags & VXLAN_F_L3MISS) {
+		ipa.sin6.sin6_addr = *daddr;
+		ipa.sa.sa_family = AF_INET6;
+		vxlan_ip_miss(dev, &ipa);
+	}
+
+out:
+	in6_dev_put(in6_dev);
+consume:
+	consume_skb(skb);
+	return NETDEV_TX_OK;
+}
+#endif
+
 static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 {
 	struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -1410,8 +1480,15 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
 	skb_reset_mac_header(skb);
 	eth = eth_hdr(skb);
 
-	if ((vxlan->flags & VXLAN_F_PROXY) && ntohs(eth->h_proto) == ETH_P_ARP)
-		return arp_reduce(dev, skb);
+	if ((vxlan->flags & VXLAN_F_PROXY)) {
+		if (ntohs(eth->h_proto) == ETH_P_ARP)
+			return arp_reduce(dev, skb);
+#if IS_ENABLED(CONFIG_IPV6)
+		else if (ntohs(eth->h_proto) == ETH_P_IPV6 &&
+			 ipv6_hdr(skb)->nexthdr == IPPROTO_ICMPV6)
+			return neigh_reduce(dev, skb);
+#endif
+	}
 
 	f = vxlan_find_mac(vxlan, eth->h_dest);
 	did_rsc = false;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index d09d42c..34bccff 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -151,6 +151,10 @@ struct ipv6_stub {
 	int (*ipv6_dst_lookup)(struct sock *sk, struct dst_entry **dst,
 				struct flowi6 *fl6);
 	void (*udpv6_encap_enable)(void);
+	void (*ndisc_send_na)(struct net_device *dev, struct neighbour *neigh,
+			      const struct in6_addr *daddr,
+			      const struct in6_addr *solicited_addr,
+			      bool router, bool solicited, bool override, bool inc_opt);
 };
 extern const struct ipv6_stub *ipv6_stub __read_mostly;
 
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 745bf74..ec2da56 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -204,6 +204,11 @@ extern void ndisc_send_ns(struct net_device *dev,
 extern void			ndisc_send_rs(struct net_device *dev,
 					      const struct in6_addr *saddr,
 					      const struct in6_addr *daddr);
+extern void			ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+					      const struct in6_addr *daddr,
+					      const struct in6_addr *solicited_addr,
+					      bool router, bool solicited, bool override,
+					      bool inc_opt);
 
 extern void			ndisc_send_redirect(struct sk_buff *skb,
 						    const struct in6_addr *target);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 58de055..d80fe10 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -813,6 +813,7 @@ static const struct ipv6_stub ipv6_stub_impl = {
 	.ipv6_sock_mc_drop = ipv6_sock_mc_drop,
 	.ipv6_dst_lookup = ip6_dst_lookup,
 	.udpv6_encap_enable = udpv6_encap_enable,
+	.ndisc_send_na = ndisc_send_na,
 };
 
 static int __init inet6_init(void)
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2712ab2..a17e4d0 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -462,10 +462,10 @@ static void ndisc_send_skb(struct sk_buff *skb,
 	rcu_read_unlock();
 }
 
-static void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
-			  const struct in6_addr *daddr,
-			  const struct in6_addr *solicited_addr,
-			  bool router, bool solicited, bool override, bool inc_opt)
+void ndisc_send_na(struct net_device *dev, struct neighbour *neigh,
+		   const struct in6_addr *daddr,
+		   const struct in6_addr *solicited_addr,
+		   bool router, bool solicited, bool override, bool inc_opt)
 {
 	struct sk_buff *skb;
 	struct in6_addr tmpaddr;
-- 
1.7.7.6

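The filtering that neigh_reduce() performs before answering on behalf of a neighbour can be illustrated outside the kernel. The sketch below is a user-space approximation only: it works on a raw byte buffer instead of an skb, assumes the ICMPv6 header directly follows the fixed IPv6 header (no extension headers), and mirrors the checks as posted in this patch version. `is_proxyable_ns` and the constant names are hypothetical and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV6_HDR_LEN		40	/* fixed IPv6 header */
#define ND_MSG_MIN_LEN		24	/* 8-byte ICMPv6 header + 16-byte target */
#define NEXTHDR_ICMPV6_VAL	58	/* IPv6 next-header value for ICMPv6 */
#define ND_NS_TYPE		135	/* Neighbor Solicitation */

/* Returns true if pkt (starting at the IPv6 header) passes the same kind of
 * checks as the posted neigh_reduce(): long enough, ICMPv6, destination
 * neither loopback nor multicast, type NS, code 0.
 */
bool is_proxyable_ns(const uint8_t *pkt, size_t len)
{
	static const uint8_t loopback[16] = { [15] = 1 };	/* ::1 */
	const uint8_t *daddr, *icmp;

	if (len < IPV6_HDR_LEN + ND_MSG_MIN_LEN)
		return false;

	if (pkt[6] != NEXTHDR_ICMPV6_VAL)	/* byte 6: next header */
		return false;

	daddr = pkt + 24;			/* bytes 24..39: destination */
	if (daddr[0] == 0xff || memcmp(daddr, loopback, 16) == 0)
		return false;

	icmp = pkt + IPV6_HDR_LEN;
	return icmp[0] == ND_NS_TYPE && icmp[1] == 0;	/* type, code */
}
```

Only after these checks does the posted code consult the neighbour table and, for a connected entry that is not bridge-local, send the advertisement through the `ipv6_stub->ndisc_send_na` hook that this patch exports.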
* [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr
  2013-05-17  0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
                   ` (8 preceding siblings ...)
  2013-05-17  0:21 ` [Patch net-next v8 09/11] vxlan: add ipv6 proxy support Cong Wang
@ 2013-05-17  0:21 ` Cong Wang
  2013-05-17  0:21 ` [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation Cong Wang
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17  0:21 UTC
  To: netdev; +Cc: David Stevens, David S. Miller, Cong Wang

From: Cong Wang <amwang@redhat.com>

As pointed out by David, we should take care of scope id
for ll addr, and use it for route lookup.

Cc: David Stevens <dlstevens@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 drivers/net/vxlan.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index f4d46bf..68ebfa4 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1237,7 +1237,7 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	struct sock *sk = vn->sock6->sk;
 	struct ipv6hdr *ip6h;
 #endif
-	const union vxlan_addr *dst;
+	union vxlan_addr *dst;
 	struct dst_entry *ndst = NULL;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
@@ -1332,6 +1332,12 @@ static netdev_tx_t vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		src_port = vxlan_src_port(vxlan, skb);
 
+		if (ipv6_addr_type(&dst->sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL) {
+			dst->sin6.sin6_scope_id = ipv6_iface_scope_id(&dst->sin6.sin6_addr,
+								      rdst->remote_ifindex);
+			rdst->remote_ifindex = dst->sin6.sin6_scope_id;
+		}
+
 		memset(&fl6, 0, sizeof(fl6));
 		fl6.flowi6_oif = rdst->remote_ifindex;
 		fl6.flowi6_tos = RT_TOS(tos);
-- 
1.7.7.6

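Why a link-local destination needs a scope id can be shown with the ordinary sockets API. This is a user-space sketch under stated assumptions, not the driver code: `scope_id_for` and `make_dst` are hypothetical helpers, and the interface index is simply passed in by the caller. A link-local address (fe80::/10) is only unique per link, so it is paired with an interface index; a global address gets scope id 0.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: link-local addresses take the interface index as
 * their scope id, everything else gets 0.
 */
uint32_t scope_id_for(const struct in6_addr *addr, uint32_t ifindex)
{
	return IN6_IS_ADDR_LINKLOCAL(addr) ? ifindex : 0;
}

/* Hypothetical helper: build a sockaddr_in6 from a textual address.
 * Returns 0 on success, -1 if the text is not a valid IPv6 address.
 */
int make_dst(const char *text, uint32_t ifindex, struct sockaddr_in6 *out)
{
	memset(out, 0, sizeof(*out));
	out->sin6_family = AF_INET6;
	if (inet_pton(AF_INET6, text, &out->sin6_addr) != 1)
		return -1;
	out->sin6_scope_id = scope_id_for(&out->sin6_addr, ifindex);
	return 0;
}
```

For example, `make_dst("fe80::1", 3, &dst)` leaves `dst.sin6_scope_id` at 3, while `make_dst("2001:db8::1", 3, &dst)` leaves it at 0. The hunk above applies the same idea to the tunnel destination and then uses the resulting scope id as the output interface for the route lookup.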
* [Patch net-next v8 11/11] ipv6: Add generic UDP Tunnel segmentation
  2013-05-17  0:21 [Patch net-next v8 00/11] vxlan: add ipv6 support Cong Wang
                   ` (9 preceding siblings ...)
  2013-05-17  0:21 ` [Patch net-next v8 10/11] vxlan: respect scope_id for ll addr Cong Wang
@ 2013-05-17  0:21 ` Cong Wang
  10 siblings, 0 replies; 23+ messages in thread
From: Cong Wang @ 2013-05-17  0:21 UTC
  To: netdev
  Cc: Jesse Gross, Pravin B Shelar, Stephen Hemminger, David S. Miller,
	Cong Wang

From: Cong Wang <amwang@redhat.com>

Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a
(tunneling: Add generic Tunnel segmentation)

This patch adds generic tunneling offloading support for
IPv6-UDP based tunnels.

This can be used by tunneling protocols like VXLAN.

Cc: Jesse Gross <jesse@nicira.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
---
 net/ipv6/ip6_offload.c |    4 +-
 net/ipv6/udp_offload.c |  153 +++++++++++++++++++++++++++++++++---------------
 2 files changed, 108 insertions(+), 49 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 71b766e..87fbf2e 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -91,6 +91,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	unsigned int unfrag_ip6hlen;
 	u8 *prevhdr;
 	int offset = 0;
+	bool tunnel;
 
 	if (unlikely(skb_shinfo(skb)->gso_type &
 		     ~(SKB_GSO_UDP |
@@ -105,6 +106,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
 
+	tunnel = skb->encapsulation;
 	ipv6h = ipv6_hdr(skb);
 	__skb_pull(skb, sizeof(*ipv6h));
 	segs = ERR_PTR(-EPROTONOSUPPORT);
@@ -125,7 +127,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		ipv6h = ipv6_hdr(skb);
 		ipv6h->payload_len = htons(skb->len - skb->mac_len -
 					   sizeof(*ipv6h));
-		if (proto == IPPROTO_UDP) {
+		if (!tunnel && proto == IPPROTO_UDP) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
 			fptr = (struct frag_hdr *)(skb_network_header(skb) +
 				unfrag_ip6hlen);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 3bb3a89..2c3fa3b 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -21,26 +21,79 @@ static int udp6_ufo_send_check(struct sk_buff *skb)
 	const struct ipv6hdr *ipv6h;
 	struct udphdr *uh;
 
-	/* UDP Tunnel offload on ipv6 is not yet supported. */
-	if (skb->encapsulation)
-		return -EINVAL;
-
 	if (!pskb_may_pull(skb, sizeof(*uh)))
 		return -EINVAL;
 
-	ipv6h = ipv6_hdr(skb);
-	uh = udp_hdr(skb);
+	if (likely(!skb->encapsulation)) {
+		ipv6h = ipv6_hdr(skb);
+		uh = udp_hdr(skb);
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
+					     IPPROTO_UDP, 0);
+		skb->csum_start = skb_transport_header(skb) - skb->head;
+		skb->csum_offset = offsetof(struct udphdr, check);
+		skb->ip_summed = CHECKSUM_PARTIAL;
+	}
 
-	uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, skb->len,
-				     IPPROTO_UDP, 0);
-	skb->csum_start = skb_transport_header(skb) - skb->head;
-	skb->csum_offset = offsetof(struct udphdr, check);
-	skb->ip_summed = CHECKSUM_PARTIAL;
 	return 0;
 }
 
+static struct sk_buff *skb_udp6_tunnel_segment(struct sk_buff *skb,
+					       netdev_features_t features)
+{
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	int mac_len = skb->mac_len;
+	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
+	int outer_hlen;
+	netdev_features_t enc_features;
+
+	if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+		goto out;
+
+	skb->encapsulation = 0;
+	__skb_pull(skb, tnl_hlen);
+	skb_reset_mac_header(skb);
+	skb_set_network_header(skb, skb_inner_network_offset(skb));
+	skb->mac_len = skb_inner_network_offset(skb);
+
+	/* segment inner packet. */
+	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+	segs = skb_mac_gso_segment(skb, enc_features);
+	if (!segs || IS_ERR(segs))
+		goto out;
+
+	outer_hlen = skb_tnl_header_len(skb);
+	skb = segs;
+	do {
+		struct udphdr *uh;
+		struct ipv6hdr *ipv6h;
+		int udp_offset = outer_hlen - tnl_hlen;
+		u32 len;
+
+		skb->mac_len = mac_len;
+
+		skb_push(skb, outer_hlen);
+		skb_reset_mac_header(skb);
+		skb_set_network_header(skb, mac_len);
+		skb_set_transport_header(skb, udp_offset);
+		uh = udp_hdr(skb);
+		uh->len = htons(skb->len - udp_offset);
+		ipv6h = ipv6_hdr(skb);
+		len = skb->len - udp_offset;
+
+		uh->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
+					     len, IPPROTO_UDP, 0);
+		uh->check = csum_fold(skb_checksum(skb, udp_offset, len, 0));
+		if (uh->check == 0)
+			uh->check = CSUM_MANGLED_0;
+		skb->ip_summed = CHECKSUM_NONE;
+	} while ((skb = skb->next));
+out:
+	return segs;
+}
+
 static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
-	netdev_features_t features)
+					 netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
@@ -73,43 +126,47 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		goto out;
 	}
 
-	/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
-	 * do checksum of UDP packets sent as multiple IP fragments.
-	 */
-	offset = skb_checksum_start_offset(skb);
-	csum = skb_checksum(skb, offset, skb->len - offset, 0);
-	offset += skb->csum_offset;
-	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
-	skb->ip_summed = CHECKSUM_NONE;
-
-	/* Check if there is enough headroom to insert fragment header. */
-	if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
-	    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
-		goto out;
+	if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
+		segs = skb_udp6_tunnel_segment(skb, features);
+	else {
+		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
+		 * do checksum of UDP packets sent as multiple IP fragments.
+		 */
+		offset = skb_checksum_start_offset(skb);
+		csum = skb_checksum(skb, offset, skb->len - offset, 0);
+		offset += skb->csum_offset;
+		*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+		skb->ip_summed = CHECKSUM_NONE;
+
+		/* Check if there is enough headroom to insert fragment header. */
+		if ((skb_mac_header(skb) < skb->head + frag_hdr_sz) &&
+		    pskb_expand_head(skb, frag_hdr_sz, 0, GFP_ATOMIC))
+			goto out;
 
-	/* Find the unfragmentable header and shift it left by frag_hdr_sz
-	 * bytes to insert fragment header.
-	 */
-	unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
-	nexthdr = *prevhdr;
-	*prevhdr = NEXTHDR_FRAGMENT;
-	unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
-		     unfrag_ip6hlen;
-	mac_start = skb_mac_header(skb);
-	memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
-
-	skb->mac_header -= frag_hdr_sz;
-	skb->network_header -= frag_hdr_sz;
-
-	fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
-	fptr->nexthdr = nexthdr;
-	fptr->reserved = 0;
-	ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
-
-	/* Fragment the skb. ipv6 header and the remaining fields of the
-	 * fragment header are updated in ipv6_gso_segment()
-	 */
-	segs = skb_segment(skb, features);
+		/* Find the unfragmentable header and shift it left by frag_hdr_sz
+		 * bytes to insert fragment header.
+		 */
+		unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
+		nexthdr = *prevhdr;
+		*prevhdr = NEXTHDR_FRAGMENT;
+		unfrag_len = skb_network_header(skb) - skb_mac_header(skb) +
+			     unfrag_ip6hlen;
+		mac_start = skb_mac_header(skb);
+		memmove(mac_start-frag_hdr_sz, mac_start, unfrag_len);
+
+		skb->mac_header -= frag_hdr_sz;
+		skb->network_header -= frag_hdr_sz;
+
+		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
+		fptr->nexthdr = nexthdr;
+		fptr->reserved = 0;
+		ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+
+		/* Fragment the skb. ipv6 header and the remaining fields of the
+		 * fragment header are updated in ipv6_gso_segment()
+		 */
+		segs = skb_segment(skb, features);
+	}
 
 out:
 	return segs;
-- 
1.7.7.6

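The per-segment loop in skb_udp6_tunnel_segment() recomputes the outer UDP checksum and replaces a result of 0 with CSUM_MANGLED_0. The user-space sketch below shows the arithmetic behind that step under simplifying assumptions: plain byte buffers instead of skbs, and hypothetical helper names (`csum_add_bytes`, `fold16`, `udp6_checksum`, `udp6_verify`) that are not kernel functions. The checksum covers the IPv6 pseudo-header (source, destination, upper-layer length, next header 17) followed by the UDP header and payload. A computed value of zero is sent as 0xFFFF, because over IPv6 a zero UDP checksum is not a valid "no checksum" marker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running one's-complement sum as big-endian 16-bit
 * words; an odd trailing byte is padded with a zero byte.
 */
uint32_t csum_add_bytes(uint32_t sum, const uint8_t *data, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;
	return sum;
}

/* Fold the carries of a 32-bit sum back into 16 bits. */
uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Sum of pseudo-header plus UDP header and payload, folded to 16 bits. */
static uint16_t udp6_sum(const uint8_t saddr[16], const uint8_t daddr[16],
			 const uint8_t *udp, size_t udp_len)
{
	uint32_t sum = 0;

	sum = csum_add_bytes(sum, saddr, 16);
	sum = csum_add_bytes(sum, daddr, 16);
	sum += (uint32_t)(udp_len >> 16) + (uint32_t)(udp_len & 0xffff);
	sum += 17;				/* next header: UDP */
	sum = csum_add_bytes(sum, udp, udp_len);
	return fold16(sum);
}

/* Checksum to place in the UDP header (host order). The checksum field in
 * udp must be zero on entry. A computed 0 is returned as 0xffff.
 */
uint16_t udp6_checksum(const uint8_t saddr[16], const uint8_t daddr[16],
		       const uint8_t *udp, size_t udp_len)
{
	uint16_t check = (uint16_t)~udp6_sum(saddr, daddr, udp, udp_len);

	return check == 0 ? 0xffff : check;
}

/* A datagram with a correct checksum in place sums to 0xffff. */
bool udp6_verify(const uint8_t saddr[16], const uint8_t daddr[16],
		 const uint8_t *udp, size_t udp_len)
{
	return udp6_sum(saddr, daddr, udp, udp_len) == 0xffff;
}
```

To use it, zero bytes 6 and 7 of the UDP header, call `udp6_checksum()`, then store the high byte of the result at offset 6 and the low byte at offset 7; `udp6_verify()` should then accept the datagram, including in the mangled 0xFFFF case.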