* [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices
@ 2015-06-01 14:27 Thomas Graf
2015-06-01 14:27 ` [net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic Thomas Graf
` (14 more replies)
0 siblings, 15 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This is the first series in a greater effort to bring the scalability
and programmability advantages of OVS to the rest of the network
stack and to get rid of as much OVS-specific code as possible.

This first series focuses on getting rid of OVS tunnel vports and
using regular tunnel net_devices instead. As part of this effort, the
routing subsystem is extended with support for flow based tunneling.
In this new tunneling mode, a route can match on tunnel information
as well as set tunnel encapsulation parameters on a per route basis.
This allows L3 forwarding to be performed for a large number of
tunnel endpoints and virtual networks through a single tunnel
net_device.
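As a rough sketch of the resulting data path (illustrative pseudo-C
only, using helpers introduced later in this series; names and values
are examples, not a definitive implementation):

  /* 1. A flow-based tunnel device attaches receive metadata instead
   *    of demultiplexing into per-VNI vports:
   */
  info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
  info->mode = IP_TUNNEL_INFO_RX;
  info->key.tun_id = cpu_to_be64(vni);    /* e.g. the VXLAN VNI */
  skb_attach_tunnel_info(skb, info);

  /* 2. The input route lookup can then match on the tunnel id: */
  ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
  err = fib_lookup(net, &fl4, &res);

  /* 3. A matching route may carry RTA_TUNNEL transmit metadata which
   *    steers the egress tunnel device, so a single net_device can
   *    serve many tunnel endpoints and virtual networks.
   */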
TODO:
- Geneve support
- IPv6 support
- Benchmarks
Pravin Shelar (1):
openvswitch: Use regular GRE net_device instead of vport
Thomas Graf (13):
ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
ip_tunnel: support per packet tunnel metadata
vxlan: Flow based tunneling
route: Extend flow representation with tunnel key
route: Per route tunnel metadata with RTA_TUNNEL
fib: Add fib rule match on tunnel id
vxlan: Factor out device configuration
openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
openvswitch: Move dev pointer into vport itself
openvswitch: Abstract vport name through ovs_vport_name()
openvswitch: Use regular VXLAN net_device device
vxlan: remove indirect call to vxlan_rcv() and vni member
arp: Associate ARP requests with tunnel info
drivers/net/vxlan.c | 663 ++++++++++++++++++++---------------
include/linux/skbuff.h | 2 +
include/net/fib_rules.h | 1 +
include/net/flow.h | 7 +
include/net/ip_fib.h | 3 +
include/net/ip_tunnels.h | 127 ++++++-
include/net/route.h | 18 +
include/net/vxlan.h | 82 ++++-
include/uapi/linux/fib_rules.h | 2 +-
include/uapi/linux/if_link.h | 1 +
include/uapi/linux/openvswitch.h | 2 +-
include/uapi/linux/rtnetlink.h | 16 +
net/core/dev.c | 5 +-
net/core/fib_rules.c | 17 +-
net/core/skbuff.c | 8 +
net/ipv4/arp.c | 8 +
net/ipv4/fib_frontend.c | 57 +++
net/ipv4/fib_semantics.c | 45 +++
net/ipv4/ip_gre.c | 161 ++++++++-
net/ipv4/ip_tunnel_core.c | 15 +
net/ipv4/route.c | 32 +-
net/openvswitch/Kconfig | 12 -
net/openvswitch/Makefile | 2 -
net/openvswitch/actions.c | 10 +-
net/openvswitch/datapath.c | 19 +-
net/openvswitch/datapath.h | 5 +-
net/openvswitch/dp_notify.c | 5 +-
net/openvswitch/flow.c | 4 +-
net/openvswitch/flow.h | 77 +---
net/openvswitch/flow_netlink.c | 78 ++++-
net/openvswitch/flow_netlink.h | 3 +-
net/openvswitch/vport-geneve.c | 17 +-
net/openvswitch/vport-gre.c | 313 -----------------
net/openvswitch/vport-internal_dev.c | 38 +-
net/openvswitch/vport-netdev.c | 271 +++++++++++---
net/openvswitch/vport-netdev.h | 13 -
net/openvswitch/vport-vxlan.c | 322 -----------------
net/openvswitch/vport.c | 34 +-
net/openvswitch/vport.h | 21 +-
39 files changed, 1334 insertions(+), 1182 deletions(-)
delete mode 100644 net/openvswitch/vport-gre.c
delete mode 100644 net/openvswitch/vport-vxlan.c
--
2.3.5
* [net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata Thomas Graf
` (13 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Rename the tunnel metadata data structures currently internal to
OVS and make them generic for use by all IP tunnels.
Both structures are kernel-internal and will stay that way. Their
members are exposed to user space by OVS through individual Netlink
attributes. It will therefore remain possible to extend or modify
these structures without affecting the user ABI.
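For illustration, a tunnel receive path now fills the generic
structure through the renamed helper (mirroring the converted
gre_rcv() below; key and tpi are whatever the caller extracted from
the tunnel header):

  struct ip_tunnel_info tun_info;

  /* Outer IPv4 header plus tunnel id and flags; GRE has no
   * transport ports and no options.
   */
  ip_tunnel_info_init(&tun_info, ip_hdr(skb), 0, 0, key,
                      filter_tnl_flags(tpi->flags), NULL, 0);
  ovs_vport_receive(vport, skb, &tun_info);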
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/net/ip_tunnels.h | 63 +++++++++++++++++++++++++++++++++
include/uapi/linux/openvswitch.h | 2 +-
net/openvswitch/actions.c | 2 +-
net/openvswitch/datapath.h | 5 +--
net/openvswitch/flow.c | 4 +--
net/openvswitch/flow.h | 76 ++--------------------------------------
net/openvswitch/flow_netlink.c | 16 ++++-----
net/openvswitch/flow_netlink.h | 2 +-
net/openvswitch/vport-geneve.c | 17 +++++----
net/openvswitch/vport-gre.c | 16 ++++-----
net/openvswitch/vport-vxlan.c | 18 +++++-----
net/openvswitch/vport.c | 30 ++++++++--------
net/openvswitch/vport.h | 12 +++----
13 files changed, 128 insertions(+), 135 deletions(-)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d8214cb..6b9d559 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -22,6 +22,28 @@
/* Keep error state on tunnel for 30 sec */
#define IPTUNNEL_ERR_TIMEO (30*HZ)
+/* Used to memset ip_tunnel padding. */
+#define IP_TUNNEL_KEY_SIZE \
+ (offsetof(struct ip_tunnel_key, tp_dst) + \
+ FIELD_SIZEOF(struct ip_tunnel_key, tp_dst))
+
+struct ip_tunnel_key {
+ __be64 tun_id;
+ __be32 ipv4_src;
+ __be32 ipv4_dst;
+ __be16 tun_flags;
+ __u8 ipv4_tos;
+ __u8 ipv4_ttl;
+ __be16 tp_src;
+ __be16 tp_dst;
+} __packed __aligned(4); /* Minimize padding. */
+
+struct ip_tunnel_info {
+ struct ip_tunnel_key key;
+ const void *options;
+ u8 options_len;
+};
+
/* 6rd prefix/relay information */
#ifdef CONFIG_IPV6_SIT_6RD
struct ip_tunnel_6rd_parm {
@@ -136,6 +158,47 @@ int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
unsigned int num);
+static inline void __ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+ __be32 saddr, __be32 daddr,
+ u8 tos, u8 ttl,
+ __be16 tp_src, __be16 tp_dst,
+ __be64 tun_id, __be16 tun_flags,
+ const void *opts, u8 opts_len)
+{
+ tun_info->key.tun_id = tun_id;
+ tun_info->key.ipv4_src = saddr;
+ tun_info->key.ipv4_dst = daddr;
+ tun_info->key.ipv4_tos = tos;
+ tun_info->key.ipv4_ttl = ttl;
+ tun_info->key.tun_flags = tun_flags;
+
+ /* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
+ * the upper tunnel are used.
+ * E.g.: GRE over IPsec, the tp_src and tp_dst are zero.
+ */
+ tun_info->key.tp_src = tp_src;
+ tun_info->key.tp_dst = tp_dst;
+
+ /* Clear struct padding. */
+ if (sizeof(tun_info->key) != IP_TUNNEL_KEY_SIZE)
+ memset((unsigned char *)&tun_info->key + IP_TUNNEL_KEY_SIZE,
+ 0, sizeof(tun_info->key) - IP_TUNNEL_KEY_SIZE);
+
+ tun_info->options = opts;
+ tun_info->options_len = opts_len;
+}
+
+static inline void ip_tunnel_info_init(struct ip_tunnel_info *tun_info,
+ const struct iphdr *iph,
+ __be16 tp_src, __be16 tp_dst,
+ __be64 tun_id, __be16 tun_flags,
+ const void *opts, u8 opts_len)
+{
+ __ip_tunnel_info_init(tun_info, iph->saddr, iph->daddr,
+ iph->tos, iph->ttl, tp_src, tp_dst,
+ tun_id, tun_flags, opts, opts_len);
+}
+
#ifdef CONFIG_INET
int ip_tunnel_init(struct net_device *dev);
diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index bbd49a0..fffe317 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -319,7 +319,7 @@ enum ovs_key_attr {
* the accepted length of the array. */
#ifdef __KERNEL__
- OVS_KEY_ATTR_TUNNEL_INFO, /* struct ovs_tunnel_info */
+ OVS_KEY_ATTR_TUNNEL_INFO, /* struct ip_tunnel_info */
#endif
__OVS_KEY_ATTR_MAX
};
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index b491c1c..34cad57 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -610,7 +610,7 @@ static void do_output(struct datapath *dp, struct sk_buff *skb, int out_port)
static int output_userspace(struct datapath *dp, struct sk_buff *skb,
struct sw_flow_key *key, const struct nlattr *attr)
{
- struct ovs_tunnel_info info;
+ struct ip_tunnel_info info;
struct dp_upcall_info upcall;
const struct nlattr *a;
int rem;
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 4ec4a48..b93fdc8 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -25,6 +25,7 @@
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/u64_stats_sync.h>
+#include <net/ip_tunnels.h>
#include "flow.h"
#include "flow_table.h"
@@ -98,7 +99,7 @@ struct datapath {
* when a packet is received by OVS.
*/
struct ovs_skb_cb {
- struct ovs_tunnel_info *egress_tun_info;
+ struct ip_tunnel_info *egress_tun_info;
struct vport *input_vport;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
@@ -114,7 +115,7 @@ struct ovs_skb_cb {
* @egress_tun_info: If nonnull, becomes %OVS_PACKET_ATTR_EGRESS_TUN_KEY.
*/
struct dp_upcall_info {
- const struct ovs_tunnel_info *egress_tun_info;
+ const struct ip_tunnel_info *egress_tun_info;
const struct nlattr *userdata;
u32 portid;
u8 cmd;
diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index bc7b0ab..8db22ef 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -682,12 +682,12 @@ int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key)
return key_extract(skb, key);
}
-int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
+int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
struct sk_buff *skb, struct sw_flow_key *key)
{
/* Extract metadata from packet. */
if (tun_info) {
- memcpy(&key->tun_key, &tun_info->tunnel, sizeof(key->tun_key));
+ memcpy(&key->tun_key, &tun_info->key, sizeof(key->tun_key));
if (tun_info->options) {
BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) *
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index a076e44..cadc6c5 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -32,31 +32,10 @@
#include <linux/time.h>
#include <linux/flex_array.h>
#include <net/inet_ecn.h>
+#include <net/ip_tunnels.h>
struct sk_buff;
-/* Used to memset ovs_key_ipv4_tunnel padding. */
-#define OVS_TUNNEL_KEY_SIZE \
- (offsetof(struct ovs_key_ipv4_tunnel, tp_dst) + \
- FIELD_SIZEOF(struct ovs_key_ipv4_tunnel, tp_dst))
-
-struct ovs_key_ipv4_tunnel {
- __be64 tun_id;
- __be32 ipv4_src;
- __be32 ipv4_dst;
- __be16 tun_flags;
- u8 ipv4_tos;
- u8 ipv4_ttl;
- __be16 tp_src;
- __be16 tp_dst;
-} __packed __aligned(4); /* Minimize padding. */
-
-struct ovs_tunnel_info {
- struct ovs_key_ipv4_tunnel tunnel;
- const void *options;
- u8 options_len;
-};
-
/* Store options at the end of the array if they are less than the
* maximum size. This allows us to get the benefits of variable length
* matching for small options.
@@ -66,55 +45,6 @@ struct ovs_tunnel_info {
#define TUN_METADATA_OPTS(flow_key, opt_len) \
((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
-static inline void __ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
- __be32 saddr, __be32 daddr,
- u8 tos, u8 ttl,
- __be16 tp_src,
- __be16 tp_dst,
- __be64 tun_id,
- __be16 tun_flags,
- const void *opts,
- u8 opts_len)
-{
- tun_info->tunnel.tun_id = tun_id;
- tun_info->tunnel.ipv4_src = saddr;
- tun_info->tunnel.ipv4_dst = daddr;
- tun_info->tunnel.ipv4_tos = tos;
- tun_info->tunnel.ipv4_ttl = ttl;
- tun_info->tunnel.tun_flags = tun_flags;
-
- /* For the tunnel types on the top of IPsec, the tp_src and tp_dst of
- * the upper tunnel are used.
- * E.g: GRE over IPSEC, the tp_src and tp_port are zero.
- */
- tun_info->tunnel.tp_src = tp_src;
- tun_info->tunnel.tp_dst = tp_dst;
-
- /* Clear struct padding. */
- if (sizeof(tun_info->tunnel) != OVS_TUNNEL_KEY_SIZE)
- memset((unsigned char *)&tun_info->tunnel + OVS_TUNNEL_KEY_SIZE,
- 0, sizeof(tun_info->tunnel) - OVS_TUNNEL_KEY_SIZE);
-
- tun_info->options = opts;
- tun_info->options_len = opts_len;
-}
-
-static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
- const struct iphdr *iph,
- __be16 tp_src,
- __be16 tp_dst,
- __be64 tun_id,
- __be16 tun_flags,
- const void *opts,
- u8 opts_len)
-{
- __ovs_flow_tun_info_init(tun_info, iph->saddr, iph->daddr,
- iph->tos, iph->ttl,
- tp_src, tp_dst,
- tun_id, tun_flags,
- opts, opts_len);
-}
-
#define OVS_SW_FLOW_KEY_METADATA_SIZE \
(offsetof(struct sw_flow_key, recirc_id) + \
FIELD_SIZEOF(struct sw_flow_key, recirc_id))
@@ -122,7 +52,7 @@ static inline void ovs_flow_tun_info_init(struct ovs_tunnel_info *tun_info,
struct sw_flow_key {
u8 tun_opts[255];
u8 tun_opts_len;
- struct ovs_key_ipv4_tunnel tun_key; /* Encapsulating tunnel key. */
+ struct ip_tunnel_key tun_key; /* Encapsulating tunnel key. */
struct {
u32 priority; /* Packet QoS priority. */
u32 skb_mark; /* SKB mark. */
@@ -273,7 +203,7 @@ void ovs_flow_stats_clear(struct sw_flow *);
u64 ovs_flow_used_time(unsigned long flow_jiffies);
int ovs_flow_key_update(struct sk_buff *skb, struct sw_flow_key *key);
-int ovs_flow_key_extract(const struct ovs_tunnel_info *tun_info,
+int ovs_flow_key_extract(const struct ip_tunnel_info *tun_info,
struct sk_buff *skb,
struct sw_flow_key *key);
/* Extract key from packet coming from userspace. */
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 624e41c..ecfa530 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -641,7 +641,7 @@ static int vxlan_opt_to_nlattr(struct sk_buff *skb,
}
static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
- const struct ovs_key_ipv4_tunnel *output,
+ const struct ip_tunnel_key *output,
const void *tun_opts, int swkey_tun_opts_len)
{
if (output->tun_flags & TUNNEL_KEY &&
@@ -689,7 +689,7 @@ static int __ipv4_tun_to_nlattr(struct sk_buff *skb,
}
static int ipv4_tun_to_nlattr(struct sk_buff *skb,
- const struct ovs_key_ipv4_tunnel *output,
+ const struct ip_tunnel_key *output,
const void *tun_opts, int swkey_tun_opts_len)
{
struct nlattr *nla;
@@ -708,9 +708,9 @@ static int ipv4_tun_to_nlattr(struct sk_buff *skb,
}
int ovs_nla_put_egress_tunnel_key(struct sk_buff *skb,
- const struct ovs_tunnel_info *egress_tun_info)
+ const struct ip_tunnel_info *egress_tun_info)
{
- return __ipv4_tun_to_nlattr(skb, &egress_tun_info->tunnel,
+ return __ipv4_tun_to_nlattr(skb, &egress_tun_info->key,
egress_tun_info->options,
egress_tun_info->options_len);
}
@@ -1746,7 +1746,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
{
struct sw_flow_match match;
struct sw_flow_key key;
- struct ovs_tunnel_info *tun_info;
+ struct ip_tunnel_info *tun_info;
struct nlattr *a;
int err = 0, start, opts_type;
@@ -1777,7 +1777,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
return PTR_ERR(a);
tun_info = nla_data(a);
- tun_info->tunnel = key.tun_key;
+ tun_info->key = key.tun_key;
tun_info->options_len = key.tun_opts_len;
if (tun_info->options_len) {
@@ -2227,13 +2227,13 @@ static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
switch (key_type) {
case OVS_KEY_ATTR_TUNNEL_INFO: {
- struct ovs_tunnel_info *tun_info = nla_data(ovs_key);
+ struct ip_tunnel_info *tun_info = nla_data(ovs_key);
start = nla_nest_start(skb, OVS_ACTION_ATTR_SET);
if (!start)
return -EMSGSIZE;
- err = ipv4_tun_to_nlattr(skb, &tun_info->tunnel,
+ err = ipv4_tun_to_nlattr(skb, &tun_info->key,
tun_info->options_len ?
tun_info->options : NULL,
tun_info->options_len);
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index 5c3d75b..ec53eb6 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -55,7 +55,7 @@ int ovs_nla_put_mask(const struct sw_flow *flow, struct sk_buff *skb);
int ovs_nla_get_match(struct sw_flow_match *, const struct nlattr *key,
const struct nlattr *mask, bool log);
int ovs_nla_put_egress_tunnel_key(struct sk_buff *,
- const struct ovs_tunnel_info *);
+ const struct ip_tunnel_info *);
bool ovs_nla_get_ufid(struct sw_flow_id *, const struct nlattr *, bool log);
int ovs_nla_get_identifier(struct sw_flow_id *sfid, const struct nlattr *ufid,
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 208c576..1da3a14 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -77,7 +77,7 @@ static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
struct vport *vport = gs->rcv_data;
struct genevehdr *geneveh = geneve_hdr(skb);
int opts_len;
- struct ovs_tunnel_info tun_info;
+ struct ip_tunnel_info tun_info;
__be64 key;
__be16 flags;
@@ -90,10 +90,9 @@ static void geneve_rcv(struct geneve_sock *gs, struct sk_buff *skb)
key = vni_to_tunnel_id(geneveh->vni);
- ovs_flow_tun_info_init(&tun_info, ip_hdr(skb),
- udp_hdr(skb)->source, udp_hdr(skb)->dest,
- key, flags,
- geneveh->options, opts_len);
+ ip_tunnel_info_init(&tun_info, ip_hdr(skb),
+ udp_hdr(skb)->source, udp_hdr(skb)->dest,
+ key, flags, geneveh->options, opts_len);
ovs_vport_receive(vport, skb, &tun_info);
}
@@ -165,8 +164,8 @@ error:
static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
{
- const struct ovs_key_ipv4_tunnel *tun_key;
- struct ovs_tunnel_info *tun_info;
+ const struct ip_tunnel_key *tun_key;
+ struct ip_tunnel_info *tun_info;
struct net *net = ovs_dp_get_net(vport->dp);
struct geneve_port *geneve_port = geneve_vport(vport);
__be16 dport = inet_sk(geneve_port->gs->sock->sk)->inet_sport;
@@ -183,7 +182,7 @@ static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
goto error;
}
- tun_key = &tun_info->tunnel;
+ tun_key = &tun_info->key;
rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_UDP);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
@@ -225,7 +224,7 @@ static const char *geneve_get_name(const struct vport *vport)
}
static int geneve_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ovs_tunnel_info *egress_tun_info)
+ struct ip_tunnel_info *egress_tun_info)
{
struct geneve_port *geneve_port = geneve_vport(vport);
struct net *net = ovs_dp_get_net(vport->dp);
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index f17ac96..b87656c 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -67,9 +67,9 @@ static struct sk_buff *__build_header(struct sk_buff *skb,
int tunnel_hlen)
{
struct tnl_ptk_info tpi;
- const struct ovs_key_ipv4_tunnel *tun_key;
+ const struct ip_tunnel_key *tun_key;
- tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
+ tun_key = &OVS_CB(skb)->egress_tun_info->key;
skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
if (IS_ERR(skb))
@@ -97,7 +97,7 @@ static __be64 key_to_tunnel_id(__be32 key, __be32 seq)
static int gre_rcv(struct sk_buff *skb,
const struct tnl_ptk_info *tpi)
{
- struct ovs_tunnel_info tun_info;
+ struct ip_tunnel_info tun_info;
struct ovs_net *ovs_net;
struct vport *vport;
__be64 key;
@@ -108,8 +108,8 @@ static int gre_rcv(struct sk_buff *skb,
return PACKET_REJECT;
key = key_to_tunnel_id(tpi->key, tpi->seq);
- ovs_flow_tun_info_init(&tun_info, ip_hdr(skb), 0, 0, key,
- filter_tnl_flags(tpi->flags), NULL, 0);
+ ip_tunnel_info_init(&tun_info, ip_hdr(skb), 0, 0, key,
+ filter_tnl_flags(tpi->flags), NULL, 0);
ovs_vport_receive(vport, skb, &tun_info);
return PACKET_RCVD;
@@ -134,7 +134,7 @@ static int gre_err(struct sk_buff *skb, u32 info,
static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
{
struct net *net = ovs_dp_get_net(vport->dp);
- const struct ovs_key_ipv4_tunnel *tun_key;
+ const struct ip_tunnel_key *tun_key;
struct flowi4 fl;
struct rtable *rt;
int min_headroom;
@@ -147,7 +147,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
goto err_free_skb;
}
- tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
+ tun_key = &OVS_CB(skb)->egress_tun_info->key;
rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_GRE);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
@@ -277,7 +277,7 @@ static void gre_tnl_destroy(struct vport *vport)
}
static int gre_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ovs_tunnel_info *egress_tun_info)
+ struct ip_tunnel_info *egress_tun_info)
{
return ovs_tunnel_get_egress_info(egress_tun_info,
ovs_dp_get_net(vport->dp),
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 6d39766..6f7986f 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -64,7 +64,7 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
struct vxlan_metadata *md)
{
- struct ovs_tunnel_info tun_info;
+ struct ip_tunnel_info tun_info;
struct vxlan_port *vxlan_port;
struct vport *vport = vs->data;
struct iphdr *iph;
@@ -82,9 +82,9 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
/* Save outer tunnel values */
iph = ip_hdr(skb);
key = cpu_to_be64(ntohl(md->vni) >> 8);
- ovs_flow_tun_info_init(&tun_info, iph,
- udp_hdr(skb)->source, udp_hdr(skb)->dest,
- key, flags, &opts, sizeof(opts));
+ ip_tunnel_info_init(&tun_info, iph,
+ udp_hdr(skb)->source, udp_hdr(skb)->dest,
+ key, flags, &opts, sizeof(opts));
ovs_vport_receive(vport, skb, &tun_info);
}
@@ -205,13 +205,13 @@ error:
static int vxlan_ext_gbp(struct sk_buff *skb)
{
- const struct ovs_tunnel_info *tun_info;
+ const struct ip_tunnel_info *tun_info;
const struct ovs_vxlan_opts *opts;
tun_info = OVS_CB(skb)->egress_tun_info;
opts = tun_info->options;
- if (tun_info->tunnel.tun_flags & TUNNEL_VXLAN_OPT &&
+ if (tun_info->key.tun_flags & TUNNEL_VXLAN_OPT &&
tun_info->options_len >= sizeof(*opts))
return opts->gbp;
else
@@ -224,7 +224,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
struct vxlan_port *vxlan_port = vxlan_vport(vport);
struct sock *sk = vxlan_port->vs->sock->sk;
__be16 dst_port = inet_sk(sk)->inet_sport;
- const struct ovs_key_ipv4_tunnel *tun_key;
+ const struct ip_tunnel_key *tun_key;
struct vxlan_metadata md = {0};
struct rtable *rt;
struct flowi4 fl;
@@ -238,7 +238,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
goto error;
}
- tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
+ tun_key = &OVS_CB(skb)->egress_tun_info->key;
rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_UDP);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
@@ -269,7 +269,7 @@ error:
}
static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ovs_tunnel_info *egress_tun_info)
+ struct ip_tunnel_info *egress_tun_info)
{
struct net *net = ovs_dp_get_net(vport->dp);
struct vxlan_port *vxlan_port = vxlan_vport(vport);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 067a3ff..af23ba0 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -469,7 +469,7 @@ u32 ovs_vport_find_upcall_portid(const struct vport *vport, struct sk_buff *skb)
* skb->data should point to the Ethernet header.
*/
void ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
- const struct ovs_tunnel_info *tun_info)
+ const struct ip_tunnel_info *tun_info)
{
struct pcpu_sw_netstats *stats;
struct sw_flow_key key;
@@ -572,22 +572,22 @@ void ovs_vport_deferred_free(struct vport *vport)
}
EXPORT_SYMBOL_GPL(ovs_vport_deferred_free);
-int ovs_tunnel_get_egress_info(struct ovs_tunnel_info *egress_tun_info,
+int ovs_tunnel_get_egress_info(struct ip_tunnel_info *egress_tun_info,
struct net *net,
- const struct ovs_tunnel_info *tun_info,
+ const struct ip_tunnel_info *tun_info,
u8 ipproto,
u32 skb_mark,
__be16 tp_src,
__be16 tp_dst)
{
- const struct ovs_key_ipv4_tunnel *tun_key;
+ const struct ip_tunnel_key *tun_key;
struct rtable *rt;
struct flowi4 fl;
if (unlikely(!tun_info))
return -EINVAL;
- tun_key = &tun_info->tunnel;
+ tun_key = &tun_info->key;
/* Route lookup to get source IP address.
* The process may need to be changed if the corresponding process
@@ -602,22 +602,22 @@ int ovs_tunnel_get_egress_info(struct ovs_tunnel_info *egress_tun_info,
/* Generate egress_tun_info based on tun_info,
* saddr, tp_src and tp_dst
*/
- __ovs_flow_tun_info_init(egress_tun_info,
- fl.saddr, tun_key->ipv4_dst,
- tun_key->ipv4_tos,
- tun_key->ipv4_ttl,
- tp_src, tp_dst,
- tun_key->tun_id,
- tun_key->tun_flags,
- tun_info->options,
- tun_info->options_len);
+ __ip_tunnel_info_init(egress_tun_info,
+ fl.saddr, tun_key->ipv4_dst,
+ tun_key->ipv4_tos,
+ tun_key->ipv4_ttl,
+ tp_src, tp_dst,
+ tun_key->tun_id,
+ tun_key->tun_flags,
+ tun_info->options,
+ tun_info->options_len);
return 0;
}
EXPORT_SYMBOL_GPL(ovs_tunnel_get_egress_info);
int ovs_vport_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ovs_tunnel_info *info)
+ struct ip_tunnel_info *info)
{
/* get_egress_tun_info() is only implemented on tunnel ports. */
if (unlikely(!vport->ops->get_egress_tun_info))
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index bc85331..4750fb6 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -58,15 +58,15 @@ u32 ovs_vport_find_upcall_portid(const struct vport *, struct sk_buff *);
int ovs_vport_send(struct vport *, struct sk_buff *);
-int ovs_tunnel_get_egress_info(struct ovs_tunnel_info *egress_tun_info,
+int ovs_tunnel_get_egress_info(struct ip_tunnel_info *egress_tun_info,
struct net *net,
- const struct ovs_tunnel_info *tun_info,
+ const struct ip_tunnel_info *tun_info,
u8 ipproto,
u32 skb_mark,
__be16 tp_src,
__be16 tp_dst);
int ovs_vport_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ovs_tunnel_info *info);
+ struct ip_tunnel_info *info);
/* The following definitions are for implementers of vport devices: */
@@ -176,7 +176,7 @@ struct vport_ops {
int (*send)(struct vport *, struct sk_buff *);
int (*get_egress_tun_info)(struct vport *, struct sk_buff *,
- struct ovs_tunnel_info *);
+ struct ip_tunnel_info *);
struct module *owner;
struct list_head list;
@@ -226,7 +226,7 @@ static inline struct vport *vport_from_priv(void *priv)
}
void ovs_vport_receive(struct vport *, struct sk_buff *,
- const struct ovs_tunnel_info *);
+ const struct ip_tunnel_info *);
static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
const void *start, unsigned int len)
@@ -239,7 +239,7 @@ int ovs_vport_ops_register(struct vport_ops *ops);
void ovs_vport_ops_unregister(struct vport_ops *ops);
static inline struct rtable *ovs_tunnel_route_lookup(struct net *net,
- const struct ovs_key_ipv4_tunnel *key,
+ const struct ip_tunnel_key *key,
u32 mark,
struct flowi4 *fl,
u8 protocol)
--
2.3.5
* [net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
2015-06-01 14:27 ` [net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 03/14] vxlan: Flow based tunneling Thomas Graf
` (12 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This allows an ip_tunnel_info metadata structure to be attached to
skbs via skb_shared_info, representing receive side tunnel
information as well as transmit side encapsulation instructions.

The new field is added to skb_shared_info because it is typically
immutable once it has been attached. A new mode field indicates
whether the metadata describes the receive or the transmit side. This
makes it possible to keep receive metadata attached to the skb all
the way through the forwarding path without mistaking it for transmit
instructions. The tun_info pointer is thus only released when a
packet received on a tunnel is forwarded to a tunnel device again.

Since transmit instructions are immutable for the flow which attaches
them to the skb, a reference count is introduced which allows the
metadata to be reused for many packets. When routes later gain the
capability to attach tunnel metadata, they will only have to allocate
the metadata once and can simply take an additional reference for
each packet that uses the instruction set.
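The intended lifecycle looks roughly like this (sketch based on the
helpers added below, error handling trimmed):

  /* Receive: one refcounted object per packet, owned by the skb. */
  struct ip_tunnel_info *rx = ip_tunnel_info_alloc(0, GFP_ATOMIC);

  rx->mode = IP_TUNNEL_INFO_RX;
  skb_attach_tunnel_info(skb, rx);  /* takes the only reference,
                                     * skb_release_data() drops it */

  /* Transmit: instructions are allocated once per flow/route; each
   * long-lived holder takes its own reference and drops it when
   * done, so per-packet reuse only bumps the counter.
   */
  struct ip_tunnel_info *tx = ip_tunnel_info_alloc(0, GFP_KERNEL);

  tx->mode = IP_TUNNEL_INFO_TX;
  ip_tunnel_info_get(tx);           /* reference held by the route   */
  ip_tunnel_info_put(tx);           /* route removed, last put frees */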
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/linux/skbuff.h | 1 +
include/net/ip_tunnels.h | 45 +++++++++++++++++++++++++++++++++++++++++++++
net/core/skbuff.c | 8 ++++++++
net/ipv4/ip_tunnel_core.c | 15 +++++++++++++++
4 files changed, 69 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6b41c15..83f9a59 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -323,6 +323,7 @@ struct skb_shared_info {
unsigned short gso_segs;
unsigned short gso_type;
struct sk_buff *frag_list;
+ struct ip_tunnel_info *tun_info;
struct skb_shared_hwtstamps hwtstamps;
u32 tskey;
__be32 ip6_frag_id;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 6b9d559..3968705 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -38,10 +38,20 @@ struct ip_tunnel_key {
__be16 tp_dst;
} __packed __aligned(4); /* Minimize padding. */
+/* Indicates whether the tunnel info structure represents receive
+ * or transmit tunnel parameters.
+ */
+enum {
+ IP_TUNNEL_INFO_RX,
+ IP_TUNNEL_INFO_TX,
+};
+
struct ip_tunnel_info {
struct ip_tunnel_key key;
const void *options;
+ atomic_t refcnt;
u8 options_len;
+ u8 mode;
};
/* 6rd prefix/relay information */
@@ -284,6 +294,41 @@ static inline void iptunnel_xmit_stats(int err,
}
}
+struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags);
+
+static inline void ip_tunnel_info_get(struct ip_tunnel_info *info)
+{
+ atomic_inc(&info->refcnt);
+}
+
+static inline void ip_tunnel_info_put(struct ip_tunnel_info *info)
+{
+ if (!info)
+ return;
+
+ if (atomic_dec_and_test(&info->refcnt))
+ kfree(info);
+}
+
+static inline int skb_attach_tunnel_info(struct sk_buff *skb,
+ struct ip_tunnel_info *info)
+{
+ if (skb_unclone(skb, GFP_ATOMIC))
+ return -ENOMEM;
+
+ ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+ ip_tunnel_info_get(info);
+ skb_shinfo(skb)->tun_info = info;
+
+ return 0;
+}
+
+static inline void skb_release_tunnel_info(struct sk_buff *skb)
+{
+ ip_tunnel_info_put(skb_shinfo(skb)->tun_info);
+ skb_shinfo(skb)->tun_info = NULL;
+}
+
#endif /* CONFIG_INET */
#endif /* __NET_IP_TUNNELS_H */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9bac0e6..dbbace2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -69,6 +69,7 @@
#include <net/sock.h>
#include <net/checksum.h>
#include <net/ip6_checksum.h>
+#include <net/ip_tunnels.h>
#include <net/xfrm.h>
#include <asm/uaccess.h>
@@ -594,6 +595,8 @@ static void skb_release_data(struct sk_buff *skb)
uarg->callback(uarg, true);
}
+ ip_tunnel_info_put(shinfo->tun_info);
+
if (shinfo->frag_list)
kfree_skb_list(shinfo->frag_list);
@@ -985,6 +988,11 @@ static void copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
skb_shinfo(new)->gso_size = skb_shinfo(old)->gso_size;
skb_shinfo(new)->gso_segs = skb_shinfo(old)->gso_segs;
skb_shinfo(new)->gso_type = skb_shinfo(old)->gso_type;
+
+ if (skb_shinfo(old)->tun_info) {
+ ip_tunnel_info_get(skb_shinfo(old)->tun_info);
+ skb_shinfo(new)->tun_info = skb_shinfo(old)->tun_info;
+ }
}
static inline int skb_alloc_rx_flag(const struct sk_buff *skb)
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6a51a71..bbd4f91 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -190,3 +190,18 @@ struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
return tot;
}
EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64);
+
+struct ip_tunnel_info *ip_tunnel_info_alloc(size_t optslen, gfp_t flags)
+{
+ struct ip_tunnel_info *info;
+
+ info = kzalloc(sizeof(*info) + optslen, flags);
+ if (!info)
+ return NULL;
+
+ info->options_len = optslen;
+
+ return info;
+
+}
+EXPORT_SYMBOL_GPL(ip_tunnel_info_alloc);
--
2.3.5
* [net-next RFC 03/14] vxlan: Flow based tunneling
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
2015-06-01 14:27 ` [net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic Thomas Graf
2015-06-01 14:27 ` [net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 04/14] route: Extend flow representation with tunnel key Thomas Graf
` (11 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Allows putting a VXLAN device into a new flow-based mode in which it
populates a tunnel info structure for each packet received. The
metadata structure contains the outer header and tunnel header fields
which have been stripped off. Layers further up in the stack such as
routing, tc or netfilter can later match on these fields.

On the transmit side, it allows skbs to carry their own encapsulation
instructions, so encapsulation parameters can be set per flow/route.

This prepares the VXLAN device to be steered by the routing
subsystem, which will make it possible to support encapsulation for a
large number of tunnel endpoints and tunnel ids through a single
net_device, improving the scalability of current VXLAN tunnels.
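As an example, a hypothetical caller (OVS in a later patch of this
series, or eventually a route carrying transmit metadata) would steer
a packet through a single flow-based VXLAN device roughly like this
(sketch only; example_set_vxlan_encap() is not part of the patch):

  static int example_set_vxlan_encap(struct sk_buff *skb)
  {
          struct ip_tunnel_info *info;
          int err;

          info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
          if (!info)
                  return -ENOMEM;

          info->mode = IP_TUNNEL_INFO_TX;
          info->key.tun_flags = TUNNEL_KEY;
          info->key.tun_id = cpu_to_be64(100);     /* VNI 100       */
          info->key.ipv4_dst = htonl(0x0a000001);  /* VTEP 10.0.0.1 */
          info->key.ipv4_ttl = 64;

          /* The skb takes the reference; vxlan_xmit() finds the
           * metadata via skb_tunnel_info() and uses it instead of
           * the fdb/rdst destination.
           */
          err = skb_attach_tunnel_info(skb, info);
          if (err)
                  kfree(info);    /* never attached, refcount is 0 */
          return err;
  }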
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
drivers/net/vxlan.c | 147 ++++++++++++++++++++++++++++++++++++-------
include/linux/skbuff.h | 1 +
include/net/ip_tunnels.h | 8 +++
include/net/route.h | 8 +++
include/net/vxlan.h | 4 +-
include/uapi/linux/if_link.h | 1 +
6 files changed, 146 insertions(+), 23 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34c519e..d5edba5 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1164,10 +1164,12 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
/* Callback from net/ipv4/udp.c to receive packets */
static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
{
+ struct ip_tunnel_info *tun_info = NULL;
struct vxlan_sock *vs;
struct vxlanhdr *vxh;
u32 flags, vni;
- struct vxlan_metadata md = {0};
+ struct vxlan_metadata _md;
+ struct vxlan_metadata *md = &_md;
/* Need Vxlan and inner Ethernet header to be present */
if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1202,6 +1204,33 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
vni &= VXLAN_VNI_MASK;
}
+ if (vs->flags & VXLAN_F_FLOW_BASED) {
+ const struct iphdr *iph = ip_hdr(skb);
+
+ /* TODO: Consider optimizing by looking up in flow cache */
+ tun_info = ip_tunnel_info_alloc(sizeof(*md), GFP_ATOMIC);
+ if (!tun_info)
+ goto drop;
+
+ tun_info->key.ipv4_src = iph->saddr;
+ tun_info->key.ipv4_dst = iph->daddr;
+ tun_info->key.ipv4_tos = iph->tos;
+ tun_info->key.ipv4_ttl = iph->ttl;
+ tun_info->key.tp_src = udp_hdr(skb)->source;
+ tun_info->key.tp_dst = udp_hdr(skb)->dest;
+
+ tun_info->mode = IP_TUNNEL_INFO_RX;
+ tun_info->key.tun_flags = TUNNEL_KEY;
+ tun_info->key.tun_id = cpu_to_be64(vni >> 8);
+ if (udp_hdr(skb)->check != 0)
+ tun_info->key.tun_flags |= TUNNEL_CSUM;
+
+ md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+ skb_attach_tunnel_info(skb, tun_info);
+ } else {
+ memset(md, 0, sizeof(*md));
+ }
+
/* For backwards compatibility, only allow reserved fields to be
* used by VXLAN extensions if explicitly requested.
*/
@@ -1209,13 +1238,16 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
struct vxlanhdr_gbp *gbp;
gbp = (struct vxlanhdr_gbp *)vxh;
- md.gbp = ntohs(gbp->policy_id);
+ md->gbp = ntohs(gbp->policy_id);
+
+ if (tun_info)
+ tun_info->key.tun_flags |= TUNNEL_VXLAN_OPT;
if (gbp->dont_learn)
- md.gbp |= VXLAN_GBP_DONT_LEARN;
+ md->gbp |= VXLAN_GBP_DONT_LEARN;
if (gbp->policy_applied)
- md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+ md->gbp |= VXLAN_GBP_POLICY_APPLIED;
flags &= ~VXLAN_GBP_USED_BITS;
}
@@ -1233,8 +1265,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
goto bad_flags;
}
- md.vni = vxh->vx_vni;
- vs->rcv(vs, skb, &md);
+ md->vni = vxh->vx_vni;
+ vs->rcv(vs, skb, md);
return 0;
drop:
@@ -1254,6 +1286,7 @@ error:
static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
struct vxlan_metadata *md)
{
+ struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
struct iphdr *oip = NULL;
struct ipv6hdr *oip6 = NULL;
struct vxlan_dev *vxlan;
@@ -1263,7 +1296,12 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
int err = 0;
union vxlan_addr *remote_ip;
- vni = ntohl(md->vni) >> 8;
+ /* For flow based devices, map all packets to VNI 0 */
+ if (vs->flags & VXLAN_F_FLOW_BASED)
+ vni = 0;
+ else
+ vni = ntohl(md->vni) >> 8;
+
/* Is this VNI defined? */
vxlan = vxlan_vs_find_vni(vs, vni);
if (!vxlan)
@@ -1284,11 +1322,20 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
oip = ip_hdr(skb);
saddr.sin.sin_addr.s_addr = oip->saddr;
saddr.sa.sa_family = AF_INET;
+
+ if (tun_info) {
+ tun_info->key.ipv4_src = oip->saddr;
+ tun_info->key.ipv4_dst = oip->daddr;
+ tun_info->key.ipv4_tos = oip->tos;
+ tun_info->key.ipv4_ttl = oip->ttl;
+ }
#if IS_ENABLED(CONFIG_IPV6)
} else {
oip6 = ipv6_hdr(skb);
saddr.sin6.sin6_addr = oip6->saddr;
saddr.sa.sa_family = AF_INET6;
+
+ /* TODO : Fill IPv6 tunnel info */
#endif
}
@@ -1297,7 +1344,8 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
goto drop;
skb_reset_network_header(skb);
- skb->mark = md->gbp;
+ if (!(vs->flags & VXLAN_F_FLOW_BASED))
+ skb->mark = md->gbp;
if (oip6)
err = IP6_ECN_decapsulate(oip6, skb);
@@ -1875,25 +1923,44 @@ static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
}
}
+/* If called with rdst=NULL, use tun_info instructions */
static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
struct vxlan_rdst *rdst, bool did_rsc)
{
+ struct ip_tunnel_info *tun_info = skb_tunnel_info(skb);
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk = vxlan->vn_sock->sock->sk;
struct rtable *rt = NULL;
const struct iphdr *old_iph;
struct flowi4 fl4;
union vxlan_addr *dst;
- struct vxlan_metadata md;
+ union vxlan_addr remote_ip;
+ struct vxlan_metadata _md;
+ struct vxlan_metadata *md = &_md;
__be16 src_port = 0, dst_port;
u32 vni;
__be16 df = 0;
__u8 tos, ttl;
int err;
+ u32 flags = vxlan->flags;
- dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
- vni = rdst->remote_vni;
- dst = &rdst->remote_ip;
+ if (rdst) {
+ dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+ vni = rdst->remote_vni;
+ dst = &rdst->remote_ip;
+ } else {
+ if (!tun_info) {
+ WARN_ONCE(1, "%s: Packet transmission with tunnel information\n",
+ dev->name);
+ goto drop;
+ }
+
+ dst_port = tun_info->key.tp_dst ? : vxlan->dst_port;
+ vni = be64_to_cpu(tun_info->key.tun_id);
+ remote_ip.sin.sin_family = AF_INET;
+ remote_ip.sin.sin_addr.s_addr = tun_info->key.ipv4_dst;
+ dst = &remote_ip;
+ }
if (vxlan_addr_any(dst)) {
if (did_rsc) {
@@ -1918,8 +1985,25 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
vxlan->port_max, true);
if (dst->sa.sa_family == AF_INET) {
+ if (tun_info) {
+ if (tun_info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
+ df = htons(IP_DF);
+ if (tun_info->key.tun_flags & TUNNEL_CSUM)
+ flags |= VXLAN_F_UDP_CSUM;
+ else
+ flags &= ~VXLAN_F_UDP_CSUM;
+
+ ttl = tun_info->key.ipv4_ttl;
+ tos = tun_info->key.ipv4_tos;
+
+ if (tun_info->options_len)
+ md = ip_tunnel_info_opts(tun_info, sizeof(*md));
+ } else {
+ md->gbp = skb->mark;
+ }
+
memset(&fl4, 0, sizeof(fl4));
- fl4.flowi4_oif = rdst->remote_ifindex;
+ fl4.flowi4_oif = rdst ? rdst->remote_ifindex : 0;
fl4.flowi4_tos = RT_TOS(tos);
fl4.flowi4_mark = skb->mark;
fl4.flowi4_proto = IPPROTO_UDP;
@@ -1958,14 +2042,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
- md.vni = htonl(vni << 8);
- md.gbp = skb->mark;
-
+ md->vni = htonl(vni << 8);
err = vxlan_xmit_skb(rt, sk, skb, fl4.saddr,
dst->sin.sin_addr.s_addr, tos, ttl, df,
- src_port, dst_port, &md,
+ src_port, dst_port, md,
!net_eq(vxlan->net, dev_net(vxlan->dev)),
- vxlan->flags);
+ flags);
if (err < 0) {
/* skb is already freed. */
skb = NULL;
@@ -1980,7 +2062,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
u32 flags;
memset(&fl6, 0, sizeof(fl6));
- fl6.flowi6_oif = rdst->remote_ifindex;
+ fl6.flowi6_oif = rdst ? rdst->remote_ifindex : 0;
fl6.daddr = dst->sin6.sin6_addr;
fl6.saddr = vxlan->saddr.sin6.sin6_addr;
fl6.flowi6_mark = skb->mark;
@@ -2018,11 +2100,11 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
}
ttl = ttl ? : ip6_dst_hoplimit(ndst);
- md.vni = htonl(vni << 8);
- md.gbp = skb->mark;
+ md->vni = htonl(vni << 8);
+ md->gbp = skb->mark;
err = vxlan6_xmit_skb(ndst, sk, skb, dev, &fl6.saddr, &fl6.daddr,
- 0, ttl, src_port, dst_port, &md,
+ 0, ttl, src_port, dst_port, md,
!net_eq(vxlan->net, dev_net(vxlan->dev)),
vxlan->flags);
#endif
@@ -2051,6 +2133,7 @@ tx_free:
static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
+ const struct ip_tunnel_info *tun_info = skb_tunnel_info(skb);
struct ethhdr *eth;
bool did_rsc = false;
struct vxlan_rdst *rdst, *fdst = NULL;
@@ -2078,6 +2161,13 @@ static netdev_tx_t vxlan_xmit(struct sk_buff *skb, struct net_device *dev)
#endif
}
+ if (vxlan->flags & VXLAN_F_FLOW_BASED &&
+ tun_info && tun_info->key.ipv4_dst &&
+ tun_info->mode == IP_TUNNEL_INFO_TX) {
+ vxlan_xmit_one(skb, dev, NULL, false);
+ return NETDEV_TX_OK;
+ }
+
f = vxlan_find_mac(vxlan, eth->h_dest);
did_rsc = false;
@@ -2373,6 +2463,12 @@ static void vxlan_setup(struct net_device *dev)
netif_keep_dst(dev);
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+ /* If in flow based mode, keep the dst including encapsulation
+ * instructions for vxlan_xmit().
+ */
+ if (vxlan->flags & VXLAN_F_FLOW_BASED)
+ netif_keep_dst(dev);
+
INIT_LIST_HEAD(&vxlan->next);
spin_lock_init(&vxlan->hash_lock);
@@ -2405,6 +2501,7 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
[IFLA_VXLAN_RSC] = { .type = NLA_U8 },
[IFLA_VXLAN_L2MISS] = { .type = NLA_U8 },
[IFLA_VXLAN_L3MISS] = { .type = NLA_U8 },
+ [IFLA_VXLAN_FLOWBASED] = { .type = NLA_U8 },
[IFLA_VXLAN_PORT] = { .type = NLA_U16 },
[IFLA_VXLAN_UDP_CSUM] = { .type = NLA_U8 },
[IFLA_VXLAN_UDP_ZERO_CSUM6_TX] = { .type = NLA_U8 },
@@ -2681,6 +2778,9 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
if (data[IFLA_VXLAN_LIMIT])
vxlan->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
+ if (data[IFLA_VXLAN_FLOWBASED] && nla_get_u8(data[IFLA_VXLAN_FLOWBASED]))
+ vxlan->flags |= VXLAN_F_FLOW_BASED;
+
if (data[IFLA_VXLAN_PORT_RANGE]) {
const struct ifla_vxlan_port_range *p
= nla_data(data[IFLA_VXLAN_PORT_RANGE]);
@@ -2777,6 +2877,7 @@ static size_t vxlan_get_size(const struct net_device *dev)
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_RSC */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_L2MISS */
nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_L3MISS */
+ nla_total_size(sizeof(__u8)) + /* IFLA_VXLAN_FLOWBASED */
nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_AGEING */
nla_total_size(sizeof(__u32)) + /* IFLA_VXLAN_LIMIT */
nla_total_size(sizeof(struct ifla_vxlan_port_range)) +
@@ -2843,6 +2944,8 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
!!(vxlan->flags & VXLAN_F_L2MISS)) ||
nla_put_u8(skb, IFLA_VXLAN_L3MISS,
!!(vxlan->flags & VXLAN_F_L3MISS)) ||
+ nla_put_u8(skb, IFLA_VXLAN_FLOWBASED,
+ !!(vxlan->flags & VXLAN_F_FLOW_BASED)) ||
nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->age_interval) ||
nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax) ||
nla_put_be16(skb, IFLA_VXLAN_PORT, vxlan->dst_port) ||
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 83f9a59..6286b05 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3463,5 +3463,6 @@ static inline unsigned int skb_gso_network_seglen(const struct sk_buff *skb)
skb_network_header(skb);
return hdr_len + skb_gso_transport_seglen(skb);
}
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SKBUFF_H */
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 3968705..8b76ba1 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -329,6 +329,14 @@ static inline void skb_release_tunnel_info(struct sk_buff *skb)
skb_shinfo(skb)->tun_info = NULL;
}
+static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info,
+ size_t expect_len)
+{
+ WARN_ON(info->options_len != expect_len);
+
+ return info + 1;
+}
+
#endif /* CONFIG_INET */
#endif /* __NET_IP_TUNNELS_H */
diff --git a/include/net/route.h b/include/net/route.h
index fe22d03..6ede321 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -315,4 +315,12 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
return hoplimit;
}
+static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
+{
+ if (skb_shinfo(skb)->tun_info)
+ return skb_shinfo(skb)->tun_info;
+
+ return NULL;
+}
+
#endif /* _ROUTE_H */
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 0082b5d..4e73df5 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -130,6 +130,7 @@ struct vxlan_sock {
#define VXLAN_F_REMCSUM_RX 0x400
#define VXLAN_F_GBP 0x800
#define VXLAN_F_REMCSUM_NOPARTIAL 0x1000
+#define VXLAN_F_FLOW_BASED 0x2000
/* Flags that are used in the receive path. These flags must match in
* order for a socket to be shareable
@@ -137,7 +138,8 @@ struct vxlan_sock {
#define VXLAN_F_RCV_FLAGS (VXLAN_F_GBP | \
VXLAN_F_UDP_ZERO_CSUM6_RX | \
VXLAN_F_REMCSUM_RX | \
- VXLAN_F_REMCSUM_NOPARTIAL)
+ VXLAN_F_REMCSUM_NOPARTIAL | \
+ VXLAN_F_FLOW_BASED)
struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
vxlan_rcv_t *rcv, void *data,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index afccc93..374df97 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -381,6 +381,7 @@ enum {
IFLA_VXLAN_REMCSUM_RX,
IFLA_VXLAN_GBP,
IFLA_VXLAN_REMCSUM_NOPARTIAL,
+ IFLA_VXLAN_FLOWBASED,
__IFLA_VXLAN_MAX
};
#define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
--
2.3.5
* [net-next RFC 04/14] route: Extend flow representation with tunnel key
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (2 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 03/14] vxlan: Flow based tunneling Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL Thomas Graf
` (10 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Add a new flowi_tunnel structure which is a subset of ip_tunnel_key
to allow routes to match on tunnel metadata. For now, only the tunnel
id is added to flowi_tunnel, allowing routes to be bound to specific
virtual tunnels.
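For example, the input path fills the new member right before the
FIB lookup, and a selector such as the FIB rule match added later in
this series can compare against it (sketch only; rule->tun_id is a
placeholder name, not an existing field):

  /* ip_route_input_slow(): carry the tunnel id of received packets
   * into the flow key used for the FIB lookup.
   */
  ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
  err = fib_lookup(net, &fl4, &res);

  /* Hypothetical selector on top of it: */
  if (rule->tun_id && rule->tun_id != fl->flowi_tun_key.tun_id)
          return 0;       /* rule does not apply to this tunnel */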
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/net/flow.h | 7 +++++++
include/net/ip_tunnels.h | 10 ++++++++++
net/ipv4/route.c | 2 ++
3 files changed, 19 insertions(+)
diff --git a/include/net/flow.h b/include/net/flow.h
index 8109a15..c15fb5e 100644
--- a/include/net/flow.h
+++ b/include/net/flow.h
@@ -19,6 +19,10 @@
#define LOOPBACK_IFINDEX 1
+struct flowi_tunnel {
+ __be64 tun_id;
+};
+
struct flowi_common {
int flowic_oif;
int flowic_iif;
@@ -30,6 +34,7 @@ struct flowi_common {
#define FLOWI_FLAG_ANYSRC 0x01
#define FLOWI_FLAG_KNOWN_NH 0x02
__u32 flowic_secid;
+ struct flowi_tunnel flowic_tun_key;
};
union flowi_uli {
@@ -66,6 +71,7 @@ struct flowi4 {
#define flowi4_proto __fl_common.flowic_proto
#define flowi4_flags __fl_common.flowic_flags
#define flowi4_secid __fl_common.flowic_secid
+#define flowi4_tun_key __fl_common.flowic_tun_key
/* (saddr,daddr) must be grouped, same order as in IP header */
__be32 saddr;
@@ -165,6 +171,7 @@ struct flowi {
#define flowi_proto u.__fl_common.flowic_proto
#define flowi_flags u.__fl_common.flowic_flags
#define flowi_secid u.__fl_common.flowic_secid
+#define flowi_tun_key u.__fl_common.flowic_tun_key
} __attribute__((__aligned__(BITS_PER_LONG/8)));
static inline struct flowi *flowi4_to_flowi(struct flowi4 *fl4)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 8b76ba1..df8cfd3 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -12,6 +12,7 @@
#include <net/ip.h>
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
+#include <net/flow.h>
#if IS_ENABLED(CONFIG_IPV6)
#include <net/ipv6.h>
@@ -337,6 +338,15 @@ static inline void *ip_tunnel_info_opts(struct ip_tunnel_info *info,
return info + 1;
}
+static inline void ip_tunnel_derive_key(struct sk_buff *skb,
+ struct flowi_tunnel *key)
+{
+ struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+
+ if (tun_info && tun_info->mode == IP_TUNNEL_INFO_RX)
+ key->tun_id = tun_info->key.tun_id;
+}
+
#endif /* CONFIG_INET */
#endif /* __NET_IP_TUNNELS_H */
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f605598..6e8e1be 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -109,6 +109,7 @@
#include <linux/kmemleak.h>
#endif
#include <net/secure_seq.h>
+#include <net/ip_tunnels.h>
#define RT_FL_TOS(oldflp4) \
((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
@@ -1716,6 +1717,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
fl4.flowi4_scope = RT_SCOPE_UNIVERSE;
fl4.daddr = daddr;
fl4.saddr = saddr;
+ ip_tunnel_derive_key(skb, &fl4.flowi4_tun_key);
err = fib_lookup(net, &fl4, &res);
if (err != 0) {
if (!IN_DEV_FORWARD(in_dev))
--
2.3.5
* [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (3 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 04/14] route: Extend flow representation with tunnel key Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 16:51 ` Robert Shearman
2015-06-01 14:27 ` [net-next RFC 06/14] fib: Add fib rule match on tunnel id Thomas Graf
` (9 subsequent siblings)
14 siblings, 1 reply; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Introduces a new Netlink attribute RTA_TUNNEL which allows routes to
set tunnel transmit metadata and specify the tunnel endpoint or
tunnel id on a per-route basis. The route must point to a tunnel
device which understands per-skb tunnel metadata and has been put
into the respective mode.
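From user space, the nested attribute could be built roughly as
follows (hypothetical libmnl-based sketch relying on the uapi
additions from this patch; the matching iproute2 support is not part
of this series):

  #include <stdint.h>
  #include <arpa/inet.h>
  #include <libmnl/libmnl.h>
  #include <linux/rtnetlink.h>

  /* Append RTA_TUNNEL to an RTM_NEWROUTE request that already
   * carries struct rtmsg, RTA_DST, RTA_OIF, etc.
   */
  static void put_rta_tunnel(struct nlmsghdr *nlh, uint64_t tun_id,
                             const char *vtep)
  {
          struct nlattr *nest = mnl_attr_nest_start(nlh, RTA_TUNNEL);

          mnl_attr_put_u64(nlh, RTA_TUN_ID, tun_id);
          mnl_attr_put_u32(nlh, RTA_TUN_DST, inet_addr(vtep));
          mnl_attr_put_u8(nlh, RTA_TUN_TTL, 64);
          mnl_attr_nest_end(nlh, nest);
  }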
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/net/ip_fib.h | 3 +++
include/net/ip_tunnels.h | 1 -
include/net/route.h | 10 ++++++++
include/uapi/linux/rtnetlink.h | 16 ++++++++++++
net/ipv4/fib_frontend.c | 57 ++++++++++++++++++++++++++++++++++++++++++
net/ipv4/fib_semantics.c | 45 +++++++++++++++++++++++++++++++++
net/ipv4/route.c | 30 +++++++++++++++++++++-
net/openvswitch/vport.h | 1 +
8 files changed, 161 insertions(+), 2 deletions(-)
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 54271ed..1cd7cf8 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -22,6 +22,7 @@
#include <net/fib_rules.h>
#include <net/inetpeer.h>
#include <linux/percpu.h>
+#include <net/ip_tunnels.h>
struct fib_config {
u8 fc_dst_len;
@@ -44,6 +45,7 @@ struct fib_config {
u32 fc_flow;
u32 fc_nlflags;
struct nl_info fc_nlinfo;
+ struct ip_tunnel_info fc_tunnel;
};
struct fib_info;
@@ -117,6 +119,7 @@ struct fib_info {
#ifdef CONFIG_IP_ROUTE_MULTIPATH
int fib_power;
#endif
+ struct ip_tunnel_info *fib_tunnel;
struct rcu_head rcu;
struct fib_nh fib_nh[0];
#define fib_dev fib_nh[0].nh_dev
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index df8cfd3..b4ab930 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -9,7 +9,6 @@
#include <net/dsfield.h>
#include <net/gro_cells.h>
#include <net/inet_ecn.h>
-#include <net/ip.h>
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
#include <net/flow.h>
diff --git a/include/net/route.h b/include/net/route.h
index 6ede321..dbda603 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -28,6 +28,7 @@
#include <net/inetpeer.h>
#include <net/flow.h>
#include <net/inet_sock.h>
+#include <net/ip_tunnels.h>
#include <linux/in_route.h>
#include <linux/rtnetlink.h>
#include <linux/rcupdate.h>
@@ -66,6 +67,7 @@ struct rtable {
struct list_head rt_uncached;
struct uncached_list *rt_uncached_list;
+ struct ip_tunnel_info *rt_tun_info;
};
static inline bool rt_is_input_route(const struct rtable *rt)
@@ -198,6 +200,8 @@ struct in_ifaddr;
void fib_add_ifaddr(struct in_ifaddr *);
void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
+
static inline void ip_rt_put(struct rtable *rt)
{
/* dst_release() accepts a NULL parameter.
@@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
{
+ struct rtable *rt;
+
if (skb_shinfo(skb)->tun_info)
return skb_shinfo(skb)->tun_info;
+ rt = skb_rtable(skb);
+ if (rt)
+ return rt->rt_tun_info;
+
return NULL;
}
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 17fb02f..1f7aa68 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -286,6 +286,21 @@ enum rt_class_t {
/* Routing message attributes */
+enum rta_tunnel_t {
+ RTA_TUN_UNSPEC,
+ RTA_TUN_ID,
+ RTA_TUN_DST,
+ RTA_TUN_SRC,
+ RTA_TUN_TTL,
+ RTA_TUN_TOS,
+ RTA_TUN_SPORT,
+ RTA_TUN_DPORT,
+ RTA_TUN_FLAGS,
+ __RTA_TUN_MAX,
+};
+
+#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
+
enum rtattr_type_t {
RTA_UNSPEC,
RTA_DST,
@@ -308,6 +323,7 @@ enum rtattr_type_t {
RTA_VIA,
RTA_NEWDST,
RTA_PREF,
+ RTA_TUNNEL, /* destination VTEP */
__RTA_MAX
};
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 872494e..bfa77a6 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
return -EINVAL;
}
+static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
+ [RTA_TUN_ID] = { .type = NLA_U64 },
+ [RTA_TUN_DST] = { .type = NLA_U32 },
+ [RTA_TUN_SRC] = { .type = NLA_U32 },
+ [RTA_TUN_TTL] = { .type = NLA_U8 },
+ [RTA_TUN_TOS] = { .type = NLA_U8 },
+ [RTA_TUN_SPORT] = { .type = NLA_U16 },
+ [RTA_TUN_DPORT] = { .type = NLA_U16 },
+ [RTA_TUN_FLAGS] = { .type = NLA_U16 },
+};
+
+static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
+{
+ struct nlattr *tb[RTA_TUN_MAX+1];
+ int err;
+
+ err = nla_parse_nested(tb, RTA_TUN_MAX, attr, tunnel_policy);
+ if (err < 0)
+ return err;
+
+ if (tb[RTA_TUN_ID])
+ cfg->fc_tunnel.key.tun_id = nla_get_u64(tb[RTA_TUN_ID]);
+
+ if (tb[RTA_TUN_DST])
+ cfg->fc_tunnel.key.ipv4_dst = nla_get_be32(tb[RTA_TUN_DST]);
+
+ if (tb[RTA_TUN_SRC])
+ cfg->fc_tunnel.key.ipv4_src = nla_get_be32(tb[RTA_TUN_SRC]);
+
+ if (tb[RTA_TUN_TTL])
+ cfg->fc_tunnel.key.ipv4_ttl = nla_get_u8(tb[RTA_TUN_TTL]);
+
+ if (tb[RTA_TUN_TOS])
+ cfg->fc_tunnel.key.ipv4_tos = nla_get_u8(tb[RTA_TUN_TOS]);
+
+ if (tb[RTA_TUN_SPORT])
+ cfg->fc_tunnel.key.tp_src = nla_get_be16(tb[RTA_TUN_SPORT]);
+
+ if (tb[RTA_TUN_DPORT])
+ cfg->fc_tunnel.key.tp_dst = nla_get_be16(tb[RTA_TUN_DPORT]);
+
+ if (tb[RTA_TUN_FLAGS])
+ cfg->fc_tunnel.key.tun_flags = nla_get_u16(tb[RTA_TUN_FLAGS]);
+
+ cfg->fc_tunnel.mode = IP_TUNNEL_INFO_TX;
+ cfg->fc_tunnel.options = NULL;
+ cfg->fc_tunnel.options_len = 0;
+
+ return 0;
+}
+
const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_DST] = { .type = NLA_U32 },
[RTA_SRC] = { .type = NLA_U32 },
@@ -591,6 +642,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
[RTA_METRICS] = { .type = NLA_NESTED },
[RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
[RTA_FLOW] = { .type = NLA_U32 },
+ [RTA_TUNNEL] = { .type = NLA_NESTED },
};
static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
@@ -656,6 +708,11 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
case RTA_TABLE:
cfg->fc_table = nla_get_u32(attr);
break;
+ case RTA_TUNNEL:
+ err = parse_rta_tunnel(cfg, attr);
+ if (err < 0)
+ goto errout;
+ break;
}
}
diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 28ec3c1..1e94c81 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -215,6 +215,9 @@ static void free_fib_info_rcu(struct rcu_head *head)
if (fi->fib_metrics != (u32 *) dst_default_metrics)
kfree(fi->fib_metrics);
+
+ ip_tunnel_info_put(fi->fib_tunnel);
+
kfree(fi);
}
@@ -760,6 +763,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
struct fib_info *ofi;
int nhs = 1;
struct net *net = cfg->fc_nlinfo.nl_net;
+ struct ip_tunnel_info *tun_info = NULL;
if (cfg->fc_type > RTN_MAX)
goto err_inval;
@@ -856,6 +860,19 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
}
}
+ if (cfg->fc_tunnel.mode) {
+ /* TODO: Allow specification of options */
+ tun_info = ip_tunnel_info_alloc(0, GFP_KERNEL);
+ if (!tun_info) {
+ err = -ENOMEM;
+ goto failure;
+ }
+
+ memcpy(tun_info, &cfg->fc_tunnel, sizeof(*tun_info));
+ ip_tunnel_info_get(tun_info);
+ fi->fib_tunnel = tun_info;
+ }
+
if (cfg->fc_mp) {
#ifdef CONFIG_IP_ROUTE_MULTIPATH
err = fib_get_nhs(fi, cfg->fc_mp, cfg->fc_mp_len, cfg);
@@ -975,6 +992,8 @@ err_inval:
err = -EINVAL;
failure:
+ kfree(tun_info);
+
if (fi) {
fi->fib_dead = 1;
free_fib_info(fi);
@@ -983,6 +1002,29 @@ failure:
return ERR_PTR(err);
}
+int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info)
+{
+ struct nlattr *tun_attr;
+
+ tun_attr = nla_nest_start(skb, RTA_TUNNEL);
+ if (!tun_attr)
+ return -ENOMEM;
+
+ if (nla_put_u64(skb, RTA_TUN_ID, tun_info->key.tun_id) ||
+ nla_put_be32(skb, RTA_TUN_DST, tun_info->key.ipv4_dst) ||
+ nla_put_be32(skb, RTA_TUN_SRC, tun_info->key.ipv4_src) ||
+ nla_put_u8(skb, RTA_TUN_TOS, tun_info->key.ipv4_tos) ||
+ nla_put_u8(skb, RTA_TUN_TTL, tun_info->key.ipv4_ttl) ||
+ nla_put_u16(skb, RTA_TUN_SPORT, tun_info->key.tp_src) ||
+ nla_put_u16(skb, RTA_TUN_DPORT, tun_info->key.tp_dst) ||
+ nla_put_u16(skb, RTA_TUN_FLAGS, tun_info->key.tun_flags))
+ return -ENOMEM;
+
+ nla_nest_end(skb, tun_attr);
+
+ return 0;
+}
+
int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
u32 tb_id, u8 type, __be32 dst, int dst_len, u8 tos,
struct fib_info *fi, unsigned int flags)
@@ -1068,6 +1110,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
nla_nest_end(skb, mp);
}
#endif
+ if (fi->fib_tunnel && fib_dump_tun_info(skb, fi->fib_tunnel))
+ goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6e8e1be..f53c62f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1356,6 +1356,8 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
list_del(&rt->rt_uncached);
spin_unlock_bh(&ul->lock);
}
+
+ ip_tunnel_info_put(rt->rt_tun_info);
}
void rt_flush_dev(struct net_device *dev)
@@ -1489,6 +1491,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
rth->rt_gateway = 0;
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
+ rth->rt_tun_info = NULL;
if (our) {
rth->dst.input= ip_local_deliver;
rth->rt_flags |= RTCF_LOCAL;
@@ -1543,6 +1546,7 @@ static int __mkroute_input(struct sk_buff *skb,
struct in_device *in_dev,
__be32 daddr, __be32 saddr, u32 tos)
{
+ struct fib_info *fi = res->fi;
struct fib_nh_exception *fnhe;
struct rtable *rth;
int err;
@@ -1590,7 +1594,7 @@ static int __mkroute_input(struct sk_buff *skb,
}
fnhe = find_exception(&FIB_RES_NH(*res), daddr);
- if (do_cache) {
+ if (do_cache && !(fi && fi->fib_tunnel)) {
if (fnhe)
rth = rcu_dereference(fnhe->fnhe_rth_input);
else
@@ -1621,6 +1625,13 @@ static int __mkroute_input(struct sk_buff *skb,
INIT_LIST_HEAD(&rth->rt_uncached);
RT_CACHE_STAT_INC(in_slow_tot);
+ if (fi && fi->fib_tunnel) {
+ ip_tunnel_info_get(fi->fib_tunnel);
+ rth->rt_tun_info = fi->fib_tunnel;
+ } else {
+ rth->rt_tun_info = NULL;
+ }
+
rth->dst.input = ip_forward;
rth->dst.output = ip_output;
@@ -1794,6 +1805,7 @@ local_input:
rth->rt_gateway = 0;
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
+ rth->rt_tun_info = NULL;
RT_CACHE_STAT_INC(in_slow_tot);
if (res.type == RTN_UNREACHABLE) {
rth->dst.input= ip_error;
@@ -1940,6 +1952,11 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
fnhe = NULL;
do_cache &= fi != NULL;
+
+ /* Force dst for flows with tunnel encapsulation */
+ if (fi && fi->fib_tunnel)
+ goto add;
+
if (do_cache) {
struct rtable __rcu **prth;
struct fib_nh *nh = &FIB_RES_NH(*res);
@@ -1984,6 +2001,13 @@ add:
rth->rt_uses_gateway = 0;
INIT_LIST_HEAD(&rth->rt_uncached);
+ if (fi && fi->fib_tunnel) {
+ ip_tunnel_info_get(fi->fib_tunnel);
+ rth->rt_tun_info = fi->fib_tunnel;
+ } else {
+ rth->rt_tun_info = NULL;
+ }
+
RT_CACHE_STAT_INC(out_slow_tot);
if (flags & RTCF_LOCAL)
@@ -2263,6 +2287,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
rt->rt_uses_gateway = ort->rt_uses_gateway;
INIT_LIST_HEAD(&rt->rt_uncached);
+ rt->rt_tun_info = NULL;
dst_free(new);
}
@@ -2394,6 +2419,9 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, error) < 0)
goto nla_put_failure;
+ if (rt->rt_tun_info && fib_dump_tun_info(skb, rt->rt_tun_info))
+ goto nla_put_failure;
+
nlmsg_end(skb, nlh);
return 0;
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 4750fb6..75d6824 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -27,6 +27,7 @@
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/u64_stats_sync.h>
+#include <net/route.h>
#include "datapath.h"
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 06/14] fib: Add fib rule match on tunnel id
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (4 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 07/14] vxlan: Factor out device configuration Thomas Graf
` (8 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This adds the ability to select a routing table based on the tunnel
id, which makes it possible to maintain separate routing tables for
each virtual tunnel network.
ip rule add from all tunnel-id 100 lookup 100
ip rule add from all tunnel-id 200 lookup 200
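The same rule can also be installed programmatically over rtnetlink
using the new FRA_TUN_ID attribute. The libmnl based sketch below is
purely illustrative (not part of this patch) and omits all error
handling:

  #include <endian.h>
  #include <libmnl/libmnl.h>
  #include <linux/fib_rules.h>
  #include <linux/rtnetlink.h>

  int main(void)
  {
          char buf[MNL_SOCKET_BUFFER_SIZE];
          struct mnl_socket *nl;
          struct nlmsghdr *nlh;
          struct fib_rule_hdr *frh;

          /* RTM_NEWRULE: from all tunnel-id 100 lookup 100 */
          nlh = mnl_nlmsg_put_header(buf);
          nlh->nlmsg_type = RTM_NEWRULE;
          nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL | NLM_F_ACK;

          frh = mnl_nlmsg_put_extra_header(nlh, sizeof(*frh));
          frh->family = AF_INET;
          frh->action = FR_ACT_TO_TBL;
          frh->table = 100;

          /* the tunnel id is carried in network byte order */
          mnl_attr_put_u64(nlh, FRA_TUN_ID, htobe64(100));

          nl = mnl_socket_open(NETLINK_ROUTE);
          mnl_socket_bind(nl, 0, MNL_SOCKET_AUTOPID);
          mnl_socket_sendto(nl, nlh, nlh->nlmsg_len);
          mnl_socket_close(nl);

          return 0;
  }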
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
include/net/fib_rules.h | 1 +
include/uapi/linux/fib_rules.h | 2 +-
net/core/fib_rules.c | 17 +++++++++++++++--
3 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/net/fib_rules.h b/include/net/fib_rules.h
index 6d67383..822ed1e 100644
--- a/include/net/fib_rules.h
+++ b/include/net/fib_rules.h
@@ -19,6 +19,7 @@ struct fib_rule {
u8 action;
/* 3 bytes hole, try to use */
u32 target;
+ __be64 tun_id;
struct fib_rule __rcu *ctarget;
struct net *fr_net;
diff --git a/include/uapi/linux/fib_rules.h b/include/uapi/linux/fib_rules.h
index 2b82d7e..96161b8 100644
--- a/include/uapi/linux/fib_rules.h
+++ b/include/uapi/linux/fib_rules.h
@@ -43,7 +43,7 @@ enum {
FRA_UNUSED5,
FRA_FWMARK, /* mark */
FRA_FLOW, /* flow/class id */
- FRA_UNUSED6,
+ FRA_TUN_ID,
FRA_SUPPRESS_IFGROUP,
FRA_SUPPRESS_PREFIXLEN,
FRA_TABLE, /* Extended table id */
diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 9a12668..6da78c9 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -186,6 +186,9 @@ static int fib_rule_match(struct fib_rule *rule, struct fib_rules_ops *ops,
if ((rule->mark ^ fl->flowi_mark) & rule->mark_mask)
goto out;
+ if (rule->tun_id && (rule->tun_id != fl->flowi_tun_key.tun_id))
+ goto out;
+
ret = ops->match(rule, fl, flags);
out:
return (rule->flags & FIB_RULE_INVERT) ? !ret : ret;
@@ -330,6 +333,9 @@ static int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr* nlh)
if (tb[FRA_FWMASK])
rule->mark_mask = nla_get_u32(tb[FRA_FWMASK]);
+ if (tb[FRA_TUN_ID])
+ rule->tun_id = nla_get_be64(tb[FRA_TUN_ID]);
+
rule->action = frh->action;
rule->flags = frh->flags;
rule->table = frh_get_table(frh, tb);
@@ -473,6 +479,10 @@ static int fib_nl_delrule(struct sk_buff *skb, struct nlmsghdr* nlh)
(rule->mark_mask != nla_get_u32(tb[FRA_FWMASK])))
continue;
+ if (tb[FRA_TUN_ID] &&
+ (rule->tun_id != nla_get_be64(tb[FRA_TUN_ID])))
+ continue;
+
if (!ops->compare(rule, frh, tb))
continue;
@@ -535,7 +545,8 @@ static inline size_t fib_rule_nlmsg_size(struct fib_rules_ops *ops,
+ nla_total_size(4) /* FRA_SUPPRESS_PREFIXLEN */
+ nla_total_size(4) /* FRA_SUPPRESS_IFGROUP */
+ nla_total_size(4) /* FRA_FWMARK */
- + nla_total_size(4); /* FRA_FWMASK */
+ + nla_total_size(4) /* FRA_FWMASK */
+ + nla_total_size(8); /* FRA_TUN_ID */
if (ops->nlmsg_payload)
payload += ops->nlmsg_payload(rule);
@@ -591,7 +602,9 @@ static int fib_nl_fill_rule(struct sk_buff *skb, struct fib_rule *rule,
((rule->mark_mask || rule->mark) &&
nla_put_u32(skb, FRA_FWMASK, rule->mark_mask)) ||
(rule->target &&
- nla_put_u32(skb, FRA_GOTO, rule->target)))
+ nla_put_u32(skb, FRA_GOTO, rule->target)) ||
+ (rule->tun_id &&
+ nla_put_be64(skb, FRA_TUN_ID, rule->tun_id)))
goto nla_put_failure;
if (rule->suppress_ifgroup != -1) {
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 07/14] vxlan: Factor out device configuration
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (5 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 06/14] fib: Add fib rule match on tunnel id Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action Thomas Graf
` (7 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This factors the device configuration out of the RTNL newlink
API, which allows for in-kernel creation of VXLAN net_devices.
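Purely as an illustration of the new entry point (not part of this
patch), an in-kernel user could create a flow based VXLAN device
roughly as sketched below. The device name, VNI and flag values are
made up for the example and error handling is omitted:

  #include <linux/rtnetlink.h>
  #include <net/vxlan.h>

  static struct net_device *example_vxlan_create(struct net *net)
  {
          struct vxlan_config conf;
          struct net_device *dev;

          memset(&conf, 0, sizeof(conf));
          conf.vni = 0;                   /* VNI taken from per packet metadata */
          conf.dst_port = htons(4789);    /* IANA assigned VXLAN port */
          conf.flags = VXLAN_F_FLOW_BASED;
          conf.no_share = true;

          /* register_netdevice() runs inside, so RTNL must be held */
          rtnl_lock();
          dev = vxlan_dev_create(net, "vxlan_example0", NET_NAME_USER, &conf);
          rtnl_unlock();

          return dev;
  }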
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
drivers/net/vxlan.c | 332 ++++++++++++++++++++++++++++------------------------
include/net/vxlan.h | 59 ++++++++++
2 files changed, 236 insertions(+), 155 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d5edba5..3acab95 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -54,10 +54,6 @@
#define PORT_HASH_BITS 8
#define PORT_HASH_SIZE (1<<PORT_HASH_BITS)
-#define VNI_HASH_BITS 10
-#define VNI_HASH_SIZE (1<<VNI_HASH_BITS)
-#define FDB_HASH_BITS 8
-#define FDB_HASH_SIZE (1<<FDB_HASH_BITS)
#define FDB_AGE_DEFAULT 300 /* 5 min */
#define FDB_AGE_INTERVAL (10 * HZ) /* rescan interval */
@@ -74,6 +70,7 @@ module_param(log_ecn_error, bool, 0644);
MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
static int vxlan_net_id;
+static struct rtnl_link_ops vxlan_link_ops;
static const u8 all_zeros_mac[ETH_ALEN];
@@ -84,21 +81,6 @@ struct vxlan_net {
spinlock_t sock_lock;
};
-union vxlan_addr {
- struct sockaddr_in sin;
- struct sockaddr_in6 sin6;
- struct sockaddr sa;
-};
-
-struct vxlan_rdst {
- union vxlan_addr remote_ip;
- __be16 remote_port;
- u32 remote_vni;
- u32 remote_ifindex;
- struct list_head list;
- struct rcu_head rcu;
-};
-
/* Forwarding table entry */
struct vxlan_fdb {
struct hlist_node hlist; /* linked list of entries */
@@ -111,31 +93,6 @@ struct vxlan_fdb {
u8 eth_addr[ETH_ALEN];
};
-/* Pseudo network device */
-struct vxlan_dev {
- struct hlist_node hlist; /* vni hash table */
- struct list_head next; /* vxlan's per namespace list */
- struct vxlan_sock *vn_sock; /* listening socket */
- struct net_device *dev;
- struct net *net; /* netns for packet i/o */
- struct vxlan_rdst default_dst; /* default destination */
- union vxlan_addr saddr; /* source address */
- __be16 dst_port;
- __u16 port_min; /* source port range */
- __u16 port_max;
- __u8 tos; /* TOS override */
- __u8 ttl;
- u32 flags; /* VXLAN_F_* in vxlan.h */
-
- unsigned long age_interval;
- struct timer_list age_timer;
- spinlock_t hash_lock;
- unsigned int addrcnt;
- unsigned int addrmax;
-
- struct hlist_head fdb_head[FDB_HASH_SIZE];
-};
-
/* salt for hash table */
static u32 vxlan_salt __read_mostly;
static struct workqueue_struct *vxlan_wq;
@@ -345,7 +302,7 @@ static int vxlan_fdb_info(struct sk_buff *skb, struct vxlan_dev *vxlan,
if (send_ip && vxlan_nla_put_addr(skb, NDA_DST, &rdst->remote_ip))
goto nla_put_failure;
- if (rdst->remote_port && rdst->remote_port != vxlan->dst_port &&
+ if (rdst->remote_port && rdst->remote_port != vxlan->cfg.dst_port &&
nla_put_be16(skb, NDA_PORT, rdst->remote_port))
goto nla_put_failure;
if (rdst->remote_vni != vxlan->default_dst.remote_vni &&
@@ -749,7 +706,8 @@ static int vxlan_fdb_create(struct vxlan_dev *vxlan,
if (!(flags & NLM_F_CREATE))
return -ENOENT;
- if (vxlan->addrmax && vxlan->addrcnt >= vxlan->addrmax)
+ if (vxlan->cfg.addrmax &&
+ vxlan->addrcnt >= vxlan->cfg.addrmax)
return -ENOSPC;
/* Disallow replace to add a multicast entry */
@@ -835,7 +793,7 @@ static int vxlan_fdb_parse(struct nlattr *tb[], struct vxlan_dev *vxlan,
return -EINVAL;
*port = nla_get_be16(tb[NDA_PORT]);
} else {
- *port = vxlan->dst_port;
+ *port = vxlan->cfg.dst_port;
}
if (tb[NDA_VNI]) {
@@ -1021,7 +979,7 @@ static bool vxlan_snoop(struct net_device *dev,
vxlan_fdb_create(vxlan, src_mac, src_ip,
NUD_REACHABLE,
NLM_F_EXCL|NLM_F_CREATE,
- vxlan->dst_port,
+ vxlan->cfg.dst_port,
vxlan->default_dst.remote_vni,
0, NTF_SELF);
spin_unlock(&vxlan->hash_lock);
@@ -1945,7 +1903,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
u32 flags = vxlan->flags;
if (rdst) {
- dst_port = rdst->remote_port ? rdst->remote_port : vxlan->dst_port;
+ dst_port = rdst->remote_port ? rdst->remote_port : vxlan->cfg.dst_port;
vni = rdst->remote_vni;
dst = &rdst->remote_ip;
} else {
@@ -1955,7 +1913,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
goto drop;
}
- dst_port = tun_info->key.tp_dst ? : vxlan->dst_port;
+ dst_port = tun_info->key.tp_dst ? : vxlan->cfg.dst_port;
vni = be64_to_cpu(tun_info->key.tun_id);
remote_ip.sin.sin_family = AF_INET;
remote_ip.sin.sin_addr.s_addr = tun_info->key.ipv4_dst;
@@ -1973,16 +1931,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
old_iph = ip_hdr(skb);
- ttl = vxlan->ttl;
+ ttl = vxlan->cfg.ttl;
if (!ttl && vxlan_addr_multicast(dst))
ttl = 1;
- tos = vxlan->tos;
+ tos = vxlan->cfg.tos;
if (tos == 1)
tos = ip_tunnel_get_dsfield(old_iph, skb);
- src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->port_min,
- vxlan->port_max, true);
+ src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
+ vxlan->cfg.port_max, true);
if (dst->sa.sa_family == AF_INET) {
if (tun_info) {
@@ -2008,7 +1966,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
fl4.flowi4_mark = skb->mark;
fl4.flowi4_proto = IPPROTO_UDP;
fl4.daddr = dst->sin.sin_addr.s_addr;
- fl4.saddr = vxlan->saddr.sin.sin_addr.s_addr;
+ fl4.saddr = vxlan->cfg.saddr.sin.sin_addr.s_addr;
rt = ip_route_output_key(vxlan->net, &fl4);
if (IS_ERR(rt)) {
@@ -2064,7 +2022,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
memset(&fl6, 0, sizeof(fl6));
fl6.flowi6_oif = rdst ? rdst->remote_ifindex : 0;
fl6.daddr = dst->sin6.sin6_addr;
- fl6.saddr = vxlan->saddr.sin6.sin6_addr;
+ fl6.saddr = vxlan->cfg.saddr.sin6.sin6_addr;
fl6.flowi6_mark = skb->mark;
fl6.flowi6_proto = IPPROTO_UDP;
@@ -2233,7 +2191,7 @@ static void vxlan_cleanup(unsigned long arg)
if (f->state & NUD_PERMANENT)
continue;
- timeout = f->used + vxlan->age_interval * HZ;
+ timeout = f->used + vxlan->cfg.age_interval * HZ;
if (time_before_eq(timeout, jiffies)) {
netdev_dbg(vxlan->dev,
"garbage collect %pM\n",
@@ -2297,8 +2255,8 @@ static int vxlan_open(struct net_device *dev)
struct vxlan_sock *vs;
int ret = 0;
- vs = vxlan_sock_add(vxlan->net, vxlan->dst_port, vxlan_rcv, NULL,
- false, vxlan->flags);
+ vs = vxlan_sock_add(vxlan->net, vxlan->cfg.dst_port, vxlan_rcv,
+ NULL, vxlan->cfg.no_share, vxlan->flags);
if (IS_ERR(vs))
return PTR_ERR(vs);
@@ -2312,7 +2270,7 @@ static int vxlan_open(struct net_device *dev)
}
}
- if (vxlan->age_interval)
+ if (vxlan->cfg.age_interval)
mod_timer(&vxlan->age_timer, jiffies + FDB_AGE_INTERVAL);
return ret;
@@ -2476,7 +2434,7 @@ static void vxlan_setup(struct net_device *dev)
vxlan->age_timer.function = vxlan_cleanup;
vxlan->age_timer.data = (unsigned long) vxlan;
- vxlan->dst_port = htons(vxlan_port);
+ vxlan->cfg.dst_port = htons(vxlan_port);
vxlan->dev = dev;
@@ -2676,54 +2634,35 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
}
EXPORT_SYMBOL_GPL(vxlan_sock_add);
-static int vxlan_newlink(struct net *src_net, struct net_device *dev,
- struct nlattr *tb[], struct nlattr *data[])
+static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
+ struct vxlan_config *conf)
{
struct vxlan_net *vn = net_generic(src_net, vxlan_net_id);
struct vxlan_dev *vxlan = netdev_priv(dev);
struct vxlan_rdst *dst = &vxlan->default_dst;
- __u32 vni;
int err;
bool use_ipv6 = false;
-
- if (!data[IFLA_VXLAN_ID])
- return -EINVAL;
+ __be16 default_port = vxlan->cfg.dst_port;
vxlan->net = src_net;
- vni = nla_get_u32(data[IFLA_VXLAN_ID]);
- dst->remote_vni = vni;
+ dst->remote_vni = conf->vni;
- /* Unless IPv6 is explicitly requested, assume IPv4 */
- dst->remote_ip.sa.sa_family = AF_INET;
- if (data[IFLA_VXLAN_GROUP]) {
- dst->remote_ip.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
- } else if (data[IFLA_VXLAN_GROUP6]) {
- if (!IS_ENABLED(CONFIG_IPV6))
- return -EPFNOSUPPORT;
-
- dst->remote_ip.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
- dst->remote_ip.sa.sa_family = AF_INET6;
- use_ipv6 = true;
- }
+ memcpy(&dst->remote_ip, &conf->remote_ip, sizeof(conf->remote_ip));
- if (data[IFLA_VXLAN_LOCAL]) {
- vxlan->saddr.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
- vxlan->saddr.sa.sa_family = AF_INET;
- } else if (data[IFLA_VXLAN_LOCAL6]) {
- if (!IS_ENABLED(CONFIG_IPV6))
- return -EPFNOSUPPORT;
+ /* Unless IPv6 is explicitly requested, assume IPv4 */
+ if (!dst->remote_ip.sa.sa_family)
+ dst->remote_ip.sa.sa_family = AF_INET;
- /* TODO: respect scope id */
- vxlan->saddr.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
- vxlan->saddr.sa.sa_family = AF_INET6;
+ if (dst->remote_ip.sa.sa_family == AF_INET6 ||
+ vxlan->cfg.saddr.sa.sa_family == AF_INET6)
use_ipv6 = true;
- }
- if (data[IFLA_VXLAN_LINK] &&
- (dst->remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]))) {
+ if (conf->remote_ifindex) {
struct net_device *lowerdev
- = __dev_get_by_index(src_net, dst->remote_ifindex);
+ = __dev_get_by_index(src_net, conf->remote_ifindex);
+
+ dst->remote_ifindex = conf->remote_ifindex;
if (!lowerdev) {
pr_info("ifindex %d does not exist\n", dst->remote_ifindex);
@@ -2741,7 +2680,7 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
}
#endif
- if (!tb[IFLA_MTU])
+ if (!conf->mtu)
dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
dev->needed_headroom = lowerdev->hard_header_len +
@@ -2749,104 +2688,187 @@ static int vxlan_newlink(struct net *src_net, struct net_device *dev,
} else if (use_ipv6)
vxlan->flags |= VXLAN_F_IPV6;
+ memcpy(&vxlan->cfg, conf, sizeof(*conf));
+ if (!vxlan->cfg.dst_port)
+ vxlan->cfg.dst_port = default_port;
+ vxlan->flags = conf->flags;
+
+ if (!vxlan->cfg.age_interval)
+ vxlan->cfg.age_interval = FDB_AGE_DEFAULT;
+
+ if (vxlan_find_vni(src_net, conf->vni, use_ipv6 ? AF_INET6 : AF_INET,
+ vxlan->cfg.dst_port, vxlan->flags))
+ return -EEXIST;
+
+ dev->ethtool_ops = &vxlan_ethtool_ops;
+
+ /* create an fdb entry for a valid default destination */
+ if (!vxlan_addr_any(&vxlan->default_dst.remote_ip)) {
+ err = vxlan_fdb_create(vxlan, all_zeros_mac,
+ &vxlan->default_dst.remote_ip,
+ NUD_REACHABLE|NUD_PERMANENT,
+ NLM_F_EXCL|NLM_F_CREATE,
+ vxlan->cfg.dst_port,
+ vxlan->default_dst.remote_vni,
+ vxlan->default_dst.remote_ifindex,
+ NTF_SELF);
+ if (err)
+ return err;
+ }
+
+ err = register_netdevice(dev);
+ if (err) {
+ vxlan_fdb_delete_default(vxlan);
+ return err;
+ }
+
+ list_add(&vxlan->next, &vn->vxlan_list);
+
+ return 0;
+}
+
+struct net_device *vxlan_dev_create(struct net *net, const char *name,
+ u8 name_assign_type, struct vxlan_config *conf)
+{
+ struct nlattr *tb[IFLA_MAX+1];
+ struct net_device *dev;
+ int err;
+
+ memset(&tb, 0, sizeof(tb));
+
+ dev = rtnl_create_link(net, name, name_assign_type,
+ &vxlan_link_ops, tb);
+ if (IS_ERR(dev))
+ return dev;
+
+ err = vxlan_dev_configure(net, dev, conf);
+ if (err < 0) {
+ free_netdev(dev);
+ return ERR_PTR(err);
+ }
+
+ return dev;
+}
+EXPORT_SYMBOL_GPL(vxlan_dev_create);
+
+static int vxlan_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[])
+{
+ struct vxlan_config conf;
+ int err;
+
+ if (!data[IFLA_VXLAN_ID])
+ return -EINVAL;
+
+ memset(&conf, 0, sizeof(conf));
+ conf.vni = nla_get_u32(data[IFLA_VXLAN_ID]);
+
+ if (data[IFLA_VXLAN_GROUP]) {
+ conf.remote_ip.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_GROUP]);
+ } else if (data[IFLA_VXLAN_GROUP6]) {
+ if (!IS_ENABLED(CONFIG_IPV6))
+ return -EPFNOSUPPORT;
+
+ conf.remote_ip.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_GROUP6]);
+ conf.remote_ip.sa.sa_family = AF_INET6;
+ }
+
+ if (data[IFLA_VXLAN_LOCAL]) {
+ conf.saddr.sin.sin_addr.s_addr = nla_get_in_addr(data[IFLA_VXLAN_LOCAL]);
+ conf.saddr.sa.sa_family = AF_INET;
+ } else if (data[IFLA_VXLAN_LOCAL6]) {
+ if (!IS_ENABLED(CONFIG_IPV6))
+ return -EPFNOSUPPORT;
+
+ /* TODO: respect scope id */
+ conf.saddr.sin6.sin6_addr = nla_get_in6_addr(data[IFLA_VXLAN_LOCAL6]);
+ conf.saddr.sa.sa_family = AF_INET6;
+ }
+
+ if (data[IFLA_VXLAN_LINK])
+ conf.remote_ifindex = nla_get_u32(data[IFLA_VXLAN_LINK]);
+
if (data[IFLA_VXLAN_TOS])
- vxlan->tos = nla_get_u8(data[IFLA_VXLAN_TOS]);
+ conf.tos = nla_get_u8(data[IFLA_VXLAN_TOS]);
if (data[IFLA_VXLAN_TTL])
- vxlan->ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
+ conf.ttl = nla_get_u8(data[IFLA_VXLAN_TTL]);
if (!data[IFLA_VXLAN_LEARNING] || nla_get_u8(data[IFLA_VXLAN_LEARNING]))
- vxlan->flags |= VXLAN_F_LEARN;
+ conf.flags |= VXLAN_F_LEARN;
if (data[IFLA_VXLAN_AGEING])
- vxlan->age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
- else
- vxlan->age_interval = FDB_AGE_DEFAULT;
+ conf.age_interval = nla_get_u32(data[IFLA_VXLAN_AGEING]);
if (data[IFLA_VXLAN_PROXY] && nla_get_u8(data[IFLA_VXLAN_PROXY]))
- vxlan->flags |= VXLAN_F_PROXY;
+ conf.flags |= VXLAN_F_PROXY;
if (data[IFLA_VXLAN_RSC] && nla_get_u8(data[IFLA_VXLAN_RSC]))
- vxlan->flags |= VXLAN_F_RSC;
+ conf.flags |= VXLAN_F_RSC;
if (data[IFLA_VXLAN_L2MISS] && nla_get_u8(data[IFLA_VXLAN_L2MISS]))
- vxlan->flags |= VXLAN_F_L2MISS;
+ conf.flags |= VXLAN_F_L2MISS;
if (data[IFLA_VXLAN_L3MISS] && nla_get_u8(data[IFLA_VXLAN_L3MISS]))
- vxlan->flags |= VXLAN_F_L3MISS;
+ conf.flags |= VXLAN_F_L3MISS;
if (data[IFLA_VXLAN_LIMIT])
- vxlan->addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
+ conf.addrmax = nla_get_u32(data[IFLA_VXLAN_LIMIT]);
if (data[IFLA_VXLAN_FLOWBASED] && nla_get_u8(data[IFLA_VXLAN_FLOWBASED]))
- vxlan->flags |= VXLAN_F_FLOW_BASED;
+ conf.flags |= VXLAN_F_FLOW_BASED;
if (data[IFLA_VXLAN_PORT_RANGE]) {
const struct ifla_vxlan_port_range *p
= nla_data(data[IFLA_VXLAN_PORT_RANGE]);
- vxlan->port_min = ntohs(p->low);
- vxlan->port_max = ntohs(p->high);
+ conf.port_min = ntohs(p->low);
+ conf.port_max = ntohs(p->high);
}
if (data[IFLA_VXLAN_PORT])
- vxlan->dst_port = nla_get_be16(data[IFLA_VXLAN_PORT]);
+ conf.dst_port = nla_get_be16(data[IFLA_VXLAN_PORT]);
if (data[IFLA_VXLAN_UDP_CSUM] && nla_get_u8(data[IFLA_VXLAN_UDP_CSUM]))
- vxlan->flags |= VXLAN_F_UDP_CSUM;
+ conf.flags |= VXLAN_F_UDP_CSUM;
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX] &&
nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]))
- vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
+ conf.flags |= VXLAN_F_UDP_ZERO_CSUM6_TX;
if (data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX] &&
nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
- vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
+ conf.flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
if (data[IFLA_VXLAN_REMCSUM_TX] &&
nla_get_u8(data[IFLA_VXLAN_REMCSUM_TX]))
- vxlan->flags |= VXLAN_F_REMCSUM_TX;
+ conf.flags |= VXLAN_F_REMCSUM_TX;
if (data[IFLA_VXLAN_REMCSUM_RX] &&
nla_get_u8(data[IFLA_VXLAN_REMCSUM_RX]))
- vxlan->flags |= VXLAN_F_REMCSUM_RX;
+ conf.flags |= VXLAN_F_REMCSUM_RX;
if (data[IFLA_VXLAN_GBP])
- vxlan->flags |= VXLAN_F_GBP;
+ conf.flags |= VXLAN_F_GBP;
if (data[IFLA_VXLAN_REMCSUM_NOPARTIAL])
- vxlan->flags |= VXLAN_F_REMCSUM_NOPARTIAL;
+ conf.flags |= VXLAN_F_REMCSUM_NOPARTIAL;
- if (vxlan_find_vni(src_net, vni, use_ipv6 ? AF_INET6 : AF_INET,
- vxlan->dst_port, vxlan->flags)) {
- pr_info("duplicate VNI %u\n", vni);
- return -EEXIST;
- }
-
- dev->ethtool_ops = &vxlan_ethtool_ops;
+ err = vxlan_dev_configure(src_net, dev, &conf);
+ switch (err) {
+ case -ENODEV:
+ pr_info("ifindex %d does not exist\n", conf.remote_ifindex);
+ break;
- /* create an fdb entry for a valid default destination */
- if (!vxlan_addr_any(&vxlan->default_dst.remote_ip)) {
- err = vxlan_fdb_create(vxlan, all_zeros_mac,
- &vxlan->default_dst.remote_ip,
- NUD_REACHABLE|NUD_PERMANENT,
- NLM_F_EXCL|NLM_F_CREATE,
- vxlan->dst_port,
- vxlan->default_dst.remote_vni,
- vxlan->default_dst.remote_ifindex,
- NTF_SELF);
- if (err)
- return err;
- }
+ case -EPERM:
+ pr_info("IPv6 is disabled via sysctl\n");
+ break;
- err = register_netdevice(dev);
- if (err) {
- vxlan_fdb_delete_default(vxlan);
- return err;
+ case -EEXIST:
+ pr_info("duplicate VNI %u\n", conf.vni);
+ break;
}
- list_add(&vxlan->next, &vn->vxlan_list);
-
- return 0;
+ return err;
}
static void vxlan_dellink(struct net_device *dev, struct list_head *head)
@@ -2895,8 +2917,8 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
const struct vxlan_dev *vxlan = netdev_priv(dev);
const struct vxlan_rdst *dst = &vxlan->default_dst;
struct ifla_vxlan_port_range ports = {
- .low = htons(vxlan->port_min),
- .high = htons(vxlan->port_max),
+ .low = htons(vxlan->cfg.port_min),
+ .high = htons(vxlan->cfg.port_max),
};
if (nla_put_u32(skb, IFLA_VXLAN_ID, dst->remote_vni))
@@ -2919,22 +2941,22 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (dst->remote_ifindex && nla_put_u32(skb, IFLA_VXLAN_LINK, dst->remote_ifindex))
goto nla_put_failure;
- if (!vxlan_addr_any(&vxlan->saddr)) {
- if (vxlan->saddr.sa.sa_family == AF_INET) {
+ if (!vxlan_addr_any(&vxlan->cfg.saddr)) {
+ if (vxlan->cfg.saddr.sa.sa_family == AF_INET) {
if (nla_put_in_addr(skb, IFLA_VXLAN_LOCAL,
- vxlan->saddr.sin.sin_addr.s_addr))
+ vxlan->cfg.saddr.sin.sin_addr.s_addr))
goto nla_put_failure;
#if IS_ENABLED(CONFIG_IPV6)
} else {
if (nla_put_in6_addr(skb, IFLA_VXLAN_LOCAL6,
- &vxlan->saddr.sin6.sin6_addr))
+ &vxlan->cfg.saddr.sin6.sin6_addr))
goto nla_put_failure;
#endif
}
}
- if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->ttl) ||
- nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->tos) ||
+ if (nla_put_u8(skb, IFLA_VXLAN_TTL, vxlan->cfg.ttl) ||
+ nla_put_u8(skb, IFLA_VXLAN_TOS, vxlan->cfg.tos) ||
nla_put_u8(skb, IFLA_VXLAN_LEARNING,
!!(vxlan->flags & VXLAN_F_LEARN)) ||
nla_put_u8(skb, IFLA_VXLAN_PROXY,
@@ -2946,9 +2968,9 @@ static int vxlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
!!(vxlan->flags & VXLAN_F_L3MISS)) ||
nla_put_u8(skb, IFLA_VXLAN_FLOWBASED,
!!(vxlan->flags & VXLAN_F_FLOW_BASED)) ||
- nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->age_interval) ||
- nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->addrmax) ||
- nla_put_be16(skb, IFLA_VXLAN_PORT, vxlan->dst_port) ||
+ nla_put_u32(skb, IFLA_VXLAN_AGEING, vxlan->cfg.age_interval) ||
+ nla_put_u32(skb, IFLA_VXLAN_LIMIT, vxlan->cfg.addrmax) ||
+ nla_put_be16(skb, IFLA_VXLAN_PORT, vxlan->cfg.dst_port) ||
nla_put_u8(skb, IFLA_VXLAN_UDP_CSUM,
!!(vxlan->flags & VXLAN_F_UDP_CSUM)) ||
nla_put_u8(skb, IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 4e73df5..c037b27 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -94,6 +94,11 @@ struct vxlanhdr {
#define VXLAN_VNI_MASK (VXLAN_VID_MASK << 8)
#define VXLAN_HLEN (sizeof(struct udphdr) + sizeof(struct vxlanhdr))
+#define VNI_HASH_BITS 10
+#define VNI_HASH_SIZE (1<<VNI_HASH_BITS)
+#define FDB_HASH_BITS 8
+#define FDB_HASH_SIZE (1<<FDB_HASH_BITS)
+
struct vxlan_metadata {
__be32 vni;
u32 gbp;
@@ -117,6 +122,57 @@ struct vxlan_sock {
u32 flags;
};
+union vxlan_addr {
+ struct sockaddr_in sin;
+ struct sockaddr_in6 sin6;
+ struct sockaddr sa;
+};
+
+struct vxlan_rdst {
+ union vxlan_addr remote_ip;
+ __be16 remote_port;
+ u32 remote_vni;
+ u32 remote_ifindex;
+ struct list_head list;
+ struct rcu_head rcu;
+};
+
+struct vxlan_config {
+ union vxlan_addr remote_ip;
+ union vxlan_addr saddr;
+ u32 vni;
+ int remote_ifindex;
+ int mtu;
+ __be16 dst_port;
+ __u16 port_min;
+ __u16 port_max;
+ __u8 tos;
+ __u8 ttl;
+ u32 flags;
+ unsigned long age_interval;
+ unsigned int addrmax;
+ bool no_share;
+};
+
+/* Pseudo network device */
+struct vxlan_dev {
+ struct hlist_node hlist; /* vni hash table */
+ struct list_head next; /* vxlan's per namespace list */
+ struct vxlan_sock *vn_sock; /* listening socket */
+ struct net_device *dev;
+ struct net *net; /* netns for packet i/o */
+ struct vxlan_rdst default_dst; /* default destination */
+ u32 flags; /* VXLAN_F_* in vxlan.h */
+
+ struct timer_list age_timer;
+ spinlock_t hash_lock;
+ unsigned int addrcnt;
+
+ struct vxlan_config cfg;
+
+ struct hlist_head fdb_head[FDB_HASH_SIZE];
+};
+
#define VXLAN_F_LEARN 0x01
#define VXLAN_F_PROXY 0x02
#define VXLAN_F_RSC 0x04
@@ -145,6 +201,9 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
vxlan_rcv_t *rcv, void *data,
bool no_share, u32 flags);
+struct net_device *vxlan_dev_create(struct net *net, const char *name,
+ u8 name_assign_type, struct vxlan_config *conf);
+
void vxlan_sock_release(struct vxlan_sock *vs);
int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (6 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 07/14] vxlan: Factor out device configuration Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-03 15:29 ` Jiri Benc
2015-06-01 14:27 ` [net-next RFC 09/14] openvswitch: Move dev pointer into vport itself Thomas Graf
` (6 subsequent siblings)
14 siblings, 1 reply; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Make use of the new skb tunnel metadata field by allocating an
ip_tunnel_info per OVS tunnel set action and then attaching that
metadata to each skb that passes through the set action.
The old egress_tun_info in OVS_CB() is left in place until all
tunnel vports have been converted to the new method.
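skb_attach_tunnel_info() itself is introduced earlier in this series
(patch 02, not shown here). Based on the skb_shinfo(skb)->tun_info
field and the ip_tunnel_info reference helpers used in this patch, a
plausible (purely hypothetical) shape of that helper would be:

  /* Hypothetical sketch only, not the actual patch 02 implementation */
  static inline void skb_attach_tunnel_info(struct sk_buff *skb,
                                            struct ip_tunnel_info *info)
  {
          /* drop a previously attached metadata reference, if any */
          if (skb_shinfo(skb)->tun_info)
                  ip_tunnel_info_put(skb_shinfo(skb)->tun_info);

          /* the skb holds its own reference on the metadata ... */
          ip_tunnel_info_get(info);
          /* ... and exposes it to skb_tunnel_info() consumers */
          skb_shinfo(skb)->tun_info = info;
  }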
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
net/openvswitch/actions.c | 8 +++++-
net/openvswitch/datapath.c | 8 +++---
net/openvswitch/flow.h | 5 ++++
net/openvswitch/flow_netlink.c | 59 +++++++++++++++++++++++++++++++++++++-----
net/openvswitch/flow_netlink.h | 1 +
5 files changed, 69 insertions(+), 12 deletions(-)
diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 34cad57..484d965 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -726,7 +726,13 @@ static int execute_set_action(struct sk_buff *skb,
{
/* Only tunnel set execution is supported without a mask. */
if (nla_type(a) == OVS_KEY_ATTR_TUNNEL_INFO) {
- OVS_CB(skb)->egress_tun_info = nla_data(a);
+ struct ovs_tunnel_info *tun = nla_data(a);
+
+ skb_attach_tunnel_info(skb, tun->info);
+
+ /* FIXME: Remove when all vports have been converted */
+ OVS_CB(skb)->egress_tun_info = tun->info;
+
return 0;
}
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3b90461..3315e3a 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -1004,7 +1004,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
}
ovs_unlock();
- ovs_nla_free_flow_actions(old_acts);
+ ovs_nla_free_flow_actions_rcu(old_acts);
ovs_flow_free(new_flow, false);
}
@@ -1016,7 +1016,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
err_kfree_acts:
- kfree(acts);
+ ovs_nla_free_flow_actions(acts);
err_kfree_flow:
ovs_flow_free(new_flow, false);
error:
@@ -1143,7 +1143,7 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct genl_info *info)
if (reply)
ovs_notify(&dp_flow_genl_family, reply, info);
if (old_acts)
- ovs_nla_free_flow_actions(old_acts);
+ ovs_nla_free_flow_actions_rcu(old_acts);
return 0;
@@ -1151,7 +1151,7 @@ err_unlock_ovs:
ovs_unlock();
kfree_skb(reply);
err_kfree_acts:
- kfree(acts);
+ ovs_nla_free_flow_actions(acts);
error:
return error;
}
diff --git a/net/openvswitch/flow.h b/net/openvswitch/flow.h
index cadc6c5..193eab9 100644
--- a/net/openvswitch/flow.h
+++ b/net/openvswitch/flow.h
@@ -45,6 +45,11 @@ struct sk_buff;
#define TUN_METADATA_OPTS(flow_key, opt_len) \
((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
+struct ovs_tunnel_info
+{
+ struct ip_tunnel_info *info;
+};
+
#define OVS_SW_FLOW_KEY_METADATA_SIZE \
(offsetof(struct sw_flow_key, recirc_id) + \
FIELD_SIZEOF(struct sw_flow_key, recirc_id))
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index ecfa530..35086c6 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1548,11 +1548,45 @@ static struct sw_flow_actions *nla_alloc_flow_actions(int size, bool log)
return sfa;
}
+static void ovs_nla_free_set_action(const struct nlattr *a)
+{
+ const struct nlattr *ovs_key = nla_data(a);
+ struct ovs_tunnel_info *ovs_tun;
+
+ switch (nla_type(ovs_key)) {
+ case OVS_KEY_ATTR_TUNNEL_INFO:
+ ovs_tun = nla_data(ovs_key);
+ ip_tunnel_info_put(ovs_tun->info);
+ break;
+ }
+}
+
+void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+{
+ const struct nlattr *a;
+ int rem;
+
+ nla_for_each_attr(a, sf_acts->actions, sf_acts->actions_len, rem) {
+ switch (nla_type(a)) {
+ case OVS_ACTION_ATTR_SET:
+ ovs_nla_free_set_action(a);
+ break;
+ }
+ }
+
+ kfree(sf_acts);
+}
+
+static void __ovs_nla_free_flow_actions(struct rcu_head *head)
+{
+ ovs_nla_free_flow_actions(container_of(head, struct sw_flow_actions, rcu));
+}
+
/* Schedules 'sf_acts' to be freed after the next RCU grace period.
* The caller must hold rcu_read_lock for this to be sensible. */
-void ovs_nla_free_flow_actions(struct sw_flow_actions *sf_acts)
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *sf_acts)
{
- kfree_rcu(sf_acts, rcu);
+ call_rcu(&sf_acts->rcu, __ovs_nla_free_flow_actions);
}
static struct nlattr *reserve_sfa_size(struct sw_flow_actions **sfa,
@@ -1747,6 +1781,7 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
struct sw_flow_match match;
struct sw_flow_key key;
struct ip_tunnel_info *tun_info;
+ struct ovs_tunnel_info *ovs_tun;
struct nlattr *a;
int err = 0, start, opts_type;
@@ -1771,12 +1806,21 @@ static int validate_and_copy_set_tun(const struct nlattr *attr,
if (start < 0)
return start;
+ tun_info = ip_tunnel_info_alloc(key.tun_opts_len, GFP_KERNEL);
+ if (!tun_info)
+ return -ENOMEM;
+
+ ip_tunnel_info_get(tun_info);
a = __add_action(sfa, OVS_KEY_ATTR_TUNNEL_INFO, NULL,
- sizeof(*tun_info) + key.tun_opts_len, log);
- if (IS_ERR(a))
+ sizeof(*ovs_tun), log);
+ if (IS_ERR(a)) {
+ ip_tunnel_info_put(tun_info);
return PTR_ERR(a);
+ }
- tun_info = nla_data(a);
+ ovs_tun = nla_data(a);
+ ovs_tun->info = tun_info;
+ tun_info->mode = IP_TUNNEL_INFO_TX;
tun_info->key = key.tun_key;
tun_info->options_len = key.tun_opts_len;
@@ -2177,7 +2221,7 @@ int ovs_nla_copy_actions(const struct nlattr *attr,
err = __ovs_nla_copy_actions(attr, key, 0, sfa, key->eth.type,
key->eth.tci, log);
if (err)
- kfree(*sfa);
+ ovs_nla_free_flow_actions(*sfa);
return err;
}
@@ -2227,7 +2271,8 @@ static int set_action_to_attr(const struct nlattr *a, struct sk_buff *skb)
switch (key_type) {
case OVS_KEY_ATTR_TUNNEL_INFO: {
- struct ip_tunnel_info *tun_info = nla_data(ovs_key);
+ struct ovs_tunnel_info *ovs_tun = nla_data(ovs_key);
+ struct ip_tunnel_info *tun_info = ovs_tun->info;
start = nla_nest_start(skb, OVS_ACTION_ATTR_SET);
if (!start)
diff --git a/net/openvswitch/flow_netlink.h b/net/openvswitch/flow_netlink.h
index ec53eb6..acd0744 100644
--- a/net/openvswitch/flow_netlink.h
+++ b/net/openvswitch/flow_netlink.h
@@ -69,5 +69,6 @@ int ovs_nla_put_actions(const struct nlattr *attr,
int len, struct sk_buff *skb);
void ovs_nla_free_flow_actions(struct sw_flow_actions *);
+void ovs_nla_free_flow_actions_rcu(struct sw_flow_actions *);
#endif /* flow_netlink.h */
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 09/14] openvswitch: Move dev pointer into vport itself
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (7 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
[not found] ` <cover.1433167295.git.tgraf-G/eBtMaohhA@public.gmane.org>
` (5 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This is the first step in representing all OVS vports as regular
struct net_devices. Move the net_device pointer into the vport
structure itself to get rid of struct netdev_vport.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
net/openvswitch/datapath.c | 7 +--
net/openvswitch/dp_notify.c | 5 +--
net/openvswitch/vport-internal_dev.c | 37 +++++++---------
net/openvswitch/vport-netdev.c | 84 ++++++++++++++++--------------------
net/openvswitch/vport-netdev.h | 12 ------
net/openvswitch/vport.h | 3 +-
6 files changed, 58 insertions(+), 90 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 3315e3a..c3ecfd4 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -188,7 +188,7 @@ static int get_dpifindex(const struct datapath *dp)
local = ovs_vport_rcu(dp, OVSP_LOCAL);
if (local)
- ifindex = netdev_vport_priv(local)->dev->ifindex;
+ ifindex = local->dev->ifindex;
else
ifindex = 0;
@@ -2205,13 +2205,10 @@ static void __net_exit list_vports_from_net(struct net *net, struct net *dnet,
struct vport *vport;
hlist_for_each_entry(vport, &dp->ports[i], dp_hash_node) {
- struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_INTERNAL)
continue;
- netdev_vport = netdev_vport_priv(vport);
- if (dev_net(netdev_vport->dev) == dnet)
+ if (dev_net(vport->dev) == dnet)
list_add(&vport->detach_list, head);
}
}
diff --git a/net/openvswitch/dp_notify.c b/net/openvswitch/dp_notify.c
index 2c631fe..a7a80a6 100644
--- a/net/openvswitch/dp_notify.c
+++ b/net/openvswitch/dp_notify.c
@@ -58,13 +58,10 @@ void ovs_dp_notify_wq(struct work_struct *work)
struct hlist_node *n;
hlist_for_each_entry_safe(vport, n, &dp->ports[i], dp_hash_node) {
- struct netdev_vport *netdev_vport;
-
if (vport->ops->type != OVS_VPORT_TYPE_NETDEV)
continue;
- netdev_vport = netdev_vport_priv(vport);
- if (!(netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH))
+ if (!(vport->dev->priv_flags & IFF_OVS_DATAPATH))
dp_detach_port_notify(vport);
}
}
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index 6a55f71..a2c205d 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -156,49 +156,44 @@ static void do_setup(struct net_device *netdev)
static struct vport *internal_dev_create(const struct vport_parms *parms)
{
struct vport *vport;
- struct netdev_vport *netdev_vport;
struct internal_dev *internal_dev;
int err;
- vport = ovs_vport_alloc(sizeof(struct netdev_vport),
- &ovs_internal_vport_ops, parms);
+ vport = ovs_vport_alloc(0, &ovs_internal_vport_ops, parms);
if (IS_ERR(vport)) {
err = PTR_ERR(vport);
goto error;
}
- netdev_vport = netdev_vport_priv(vport);
-
- netdev_vport->dev = alloc_netdev(sizeof(struct internal_dev),
- parms->name, NET_NAME_UNKNOWN,
- do_setup);
- if (!netdev_vport->dev) {
+ vport->dev = alloc_netdev(sizeof(struct internal_dev),
+ parms->name, NET_NAME_UNKNOWN, do_setup);
+ if (!vport->dev) {
err = -ENOMEM;
goto error_free_vport;
}
- dev_net_set(netdev_vport->dev, ovs_dp_get_net(vport->dp));
- internal_dev = internal_dev_priv(netdev_vport->dev);
+ dev_net_set(vport->dev, ovs_dp_get_net(vport->dp));
+ internal_dev = internal_dev_priv(vport->dev);
internal_dev->vport = vport;
/* Restrict bridge port to current netns. */
if (vport->port_no == OVSP_LOCAL)
- netdev_vport->dev->features |= NETIF_F_NETNS_LOCAL;
+ vport->dev->features |= NETIF_F_NETNS_LOCAL;
rtnl_lock();
- err = register_netdevice(netdev_vport->dev);
+ err = register_netdevice(vport->dev);
if (err)
goto error_free_netdev;
- dev_set_promiscuity(netdev_vport->dev, 1);
+ dev_set_promiscuity(vport->dev, 1);
rtnl_unlock();
- netif_start_queue(netdev_vport->dev);
+ netif_start_queue(vport->dev);
return vport;
error_free_netdev:
rtnl_unlock();
- free_netdev(netdev_vport->dev);
+ free_netdev(vport->dev);
error_free_vport:
ovs_vport_free(vport);
error:
@@ -207,21 +202,19 @@ error:
static void internal_dev_destroy(struct vport *vport)
{
- struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
-
- netif_stop_queue(netdev_vport->dev);
+ netif_stop_queue(vport->dev);
rtnl_lock();
- dev_set_promiscuity(netdev_vport->dev, -1);
+ dev_set_promiscuity(vport->dev, -1);
/* unregister_netdevice() waits for an RCU grace period. */
- unregister_netdevice(netdev_vport->dev);
+ unregister_netdevice(vport->dev);
rtnl_unlock();
}
static int internal_dev_recv(struct vport *vport, struct sk_buff *skb)
{
- struct net_device *netdev = netdev_vport_priv(vport)->dev;
+ struct net_device *netdev = vport->dev;
int len;
if (unlikely(!(netdev->flags & IFF_UP))) {
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 4776282..cb22051 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -83,103 +83,96 @@ static struct net_device *get_dpdev(const struct datapath *dp)
local = ovs_vport_ovsl(dp, OVSP_LOCAL);
BUG_ON(!local);
- return netdev_vport_priv(local)->dev;
+ return local->dev;
}
-static struct vport *netdev_create(const struct vport_parms *parms)
+static struct vport *netdev_link(struct vport *vport, const char *name)
{
- struct vport *vport;
- struct netdev_vport *netdev_vport;
int err;
- vport = ovs_vport_alloc(sizeof(struct netdev_vport),
- &ovs_netdev_vport_ops, parms);
- if (IS_ERR(vport)) {
- err = PTR_ERR(vport);
- goto error;
- }
-
- netdev_vport = netdev_vport_priv(vport);
-
- netdev_vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), parms->name);
- if (!netdev_vport->dev) {
+ vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), name);
+ if (!vport->dev) {
err = -ENODEV;
goto error_free_vport;
}
- if (netdev_vport->dev->flags & IFF_LOOPBACK ||
- netdev_vport->dev->type != ARPHRD_ETHER ||
- ovs_is_internal_dev(netdev_vport->dev)) {
+ if (vport->dev->flags & IFF_LOOPBACK ||
+ vport->dev->type != ARPHRD_ETHER ||
+ ovs_is_internal_dev(vport->dev)) {
err = -EINVAL;
goto error_put;
}
rtnl_lock();
- err = netdev_master_upper_dev_link(netdev_vport->dev,
+ err = netdev_master_upper_dev_link(vport->dev,
get_dpdev(vport->dp));
if (err)
goto error_unlock;
- err = netdev_rx_handler_register(netdev_vport->dev, netdev_frame_hook,
+ err = netdev_rx_handler_register(vport->dev, netdev_frame_hook,
vport);
if (err)
goto error_master_upper_dev_unlink;
- dev_set_promiscuity(netdev_vport->dev, 1);
- netdev_vport->dev->priv_flags |= IFF_OVS_DATAPATH;
+ dev_set_promiscuity(vport->dev, 1);
+ vport->dev->priv_flags |= IFF_OVS_DATAPATH;
rtnl_unlock();
return vport;
error_master_upper_dev_unlink:
- netdev_upper_dev_unlink(netdev_vport->dev, get_dpdev(vport->dp));
+ netdev_upper_dev_unlink(vport->dev, get_dpdev(vport->dp));
error_unlock:
rtnl_unlock();
error_put:
- dev_put(netdev_vport->dev);
+ dev_put(vport->dev);
error_free_vport:
ovs_vport_free(vport);
-error:
return ERR_PTR(err);
}
+static struct vport *netdev_create(const struct vport_parms *parms)
+{
+ struct vport *vport;
+
+ vport = ovs_vport_alloc(0, &ovs_netdev_vport_ops, parms);
+ if (IS_ERR(vport))
+ return vport;
+
+ return netdev_link(vport, parms->name);
+}
+
static void free_port_rcu(struct rcu_head *rcu)
{
- struct netdev_vport *netdev_vport = container_of(rcu,
- struct netdev_vport, rcu);
+ struct vport *vport = container_of(rcu, struct vport, rcu);
- dev_put(netdev_vport->dev);
- ovs_vport_free(vport_from_priv(netdev_vport));
+ dev_put(vport->dev);
+ ovs_vport_free(vport);
}
void ovs_netdev_detach_dev(struct vport *vport)
{
- struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
-
ASSERT_RTNL();
- netdev_vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
- netdev_rx_handler_unregister(netdev_vport->dev);
- netdev_upper_dev_unlink(netdev_vport->dev,
- netdev_master_upper_dev_get(netdev_vport->dev));
- dev_set_promiscuity(netdev_vport->dev, -1);
+ vport->dev->priv_flags &= ~IFF_OVS_DATAPATH;
+ netdev_rx_handler_unregister(vport->dev);
+ netdev_upper_dev_unlink(vport->dev,
+ netdev_master_upper_dev_get(vport->dev));
+ dev_set_promiscuity(vport->dev, -1);
}
static void netdev_destroy(struct vport *vport)
{
- struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
-
rtnl_lock();
- if (netdev_vport->dev->priv_flags & IFF_OVS_DATAPATH)
+ if (vport->dev->priv_flags & IFF_OVS_DATAPATH)
ovs_netdev_detach_dev(vport);
rtnl_unlock();
- call_rcu(&netdev_vport->rcu, free_port_rcu);
+ call_rcu(&vport->rcu, free_port_rcu);
}
const char *ovs_netdev_get_name(const struct vport *vport)
{
- const struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
- return netdev_vport->dev->name;
+ return vport->dev->name;
}
static unsigned int packet_length(const struct sk_buff *skb)
@@ -194,18 +187,17 @@ static unsigned int packet_length(const struct sk_buff *skb)
static int netdev_send(struct vport *vport, struct sk_buff *skb)
{
- struct netdev_vport *netdev_vport = netdev_vport_priv(vport);
- int mtu = netdev_vport->dev->mtu;
+ int mtu = vport->dev->mtu;
int len;
if (unlikely(packet_length(skb) > mtu && !skb_is_gso(skb))) {
net_warn_ratelimited("%s: dropped over-mtu packet: %d > %d\n",
- netdev_vport->dev->name,
+ vport->dev->name,
packet_length(skb), mtu);
goto drop;
}
- skb->dev = netdev_vport->dev;
+ skb->dev = vport->dev;
len = skb->len;
dev_queue_xmit(skb);
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 6f7038e..1c52aed 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,18 +26,6 @@
struct vport *ovs_netdev_get_vport(struct net_device *dev);
-struct netdev_vport {
- struct rcu_head rcu;
-
- struct net_device *dev;
-};
-
-static inline struct netdev_vport *
-netdev_vport_priv(const struct vport *vport)
-{
- return vport_priv(vport);
-}
-
const char *ovs_netdev_get_name(const struct vport *);
void ovs_netdev_detach_dev(struct vport *);
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index 75d6824..e05ec68 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -107,7 +107,7 @@ struct vport_portids {
* @detach_list: list used for detaching vport in net-exit call.
*/
struct vport {
- struct rcu_head rcu;
+ struct net_device *dev;
struct datapath *dp;
struct vport_portids __rcu *upcall_portids;
u16 port_no;
@@ -120,6 +120,7 @@ struct vport {
struct vport_err_stats err_stats;
struct list_head detach_list;
+ struct rcu_head rcu;
};
/**
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name()
[not found] ` <cover.1433167295.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-02 19:02 ` [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Eric W. Biederman
1 sibling, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: dev, marcelo.leitner, jiri, daniel, tom, edumazet, kaber,
stephen, hannes, davem
This makes it possible to get rid of the get_name() vport op later on.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/openvswitch/datapath.c | 4 ++--
net/openvswitch/vport-internal_dev.c | 1 -
net/openvswitch/vport-netdev.c | 6 ------
net/openvswitch/vport-netdev.h | 1 -
net/openvswitch/vport.c | 4 ++--
net/openvswitch/vport.h | 5 +++++
6 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c3ecfd4..8986558 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -176,7 +176,7 @@ static inline struct datapath *get_dp(struct net *net, int dp_ifindex)
const char *ovs_dp_name(const struct datapath *dp)
{
struct vport *vport = ovs_vport_ovsl_rcu(dp, OVSP_LOCAL);
- return vport->ops->get_name(vport);
+ return ovs_vport_name(vport);
}
static int get_dpifindex(const struct datapath *dp)
@@ -1786,7 +1786,7 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
if (nla_put_u32(skb, OVS_VPORT_ATTR_PORT_NO, vport->port_no) ||
nla_put_u32(skb, OVS_VPORT_ATTR_TYPE, vport->ops->type) ||
nla_put_string(skb, OVS_VPORT_ATTR_NAME,
- vport->ops->get_name(vport)))
+ ovs_vport_name(vport)))
goto nla_put_failure;
ovs_vport_get_stats(vport, &vport_stats);
diff --git a/net/openvswitch/vport-internal_dev.c b/net/openvswitch/vport-internal_dev.c
index a2c205d..c058bbf 100644
--- a/net/openvswitch/vport-internal_dev.c
+++ b/net/openvswitch/vport-internal_dev.c
@@ -242,7 +242,6 @@ static struct vport_ops ovs_internal_vport_ops = {
.type = OVS_VPORT_TYPE_INTERNAL,
.create = internal_dev_create,
.destroy = internal_dev_destroy,
- .get_name = ovs_netdev_get_name,
.send = internal_dev_recv,
};
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index cb22051..ef11a41 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -170,11 +170,6 @@ static void netdev_destroy(struct vport *vport)
call_rcu(&vport->rcu, free_port_rcu);
}
-const char *ovs_netdev_get_name(const struct vport *vport)
-{
- return vport->dev->name;
-}
-
static unsigned int packet_length(const struct sk_buff *skb)
{
unsigned int length = skb->len - ETH_HLEN;
@@ -222,7 +217,6 @@ static struct vport_ops ovs_netdev_vport_ops = {
.type = OVS_VPORT_TYPE_NETDEV,
.create = netdev_create,
.destroy = netdev_destroy,
- .get_name = ovs_netdev_get_name,
.send = netdev_send,
};
diff --git a/net/openvswitch/vport-netdev.h b/net/openvswitch/vport-netdev.h
index 1c52aed..684fb88 100644
--- a/net/openvswitch/vport-netdev.h
+++ b/net/openvswitch/vport-netdev.h
@@ -26,7 +26,6 @@
struct vport *ovs_netdev_get_vport(struct net_device *dev);
-const char *ovs_netdev_get_name(const struct vport *);
void ovs_netdev_detach_dev(struct vport *);
int __init ovs_netdev_init(void);
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index af23ba0..d14f594 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -113,7 +113,7 @@ struct vport *ovs_vport_locate(const struct net *net, const char *name)
struct vport *vport;
hlist_for_each_entry_rcu(vport, bucket, hash_node)
- if (!strcmp(name, vport->ops->get_name(vport)) &&
+ if (!strcmp(name, ovs_vport_name(vport)) &&
net_eq(ovs_dp_get_net(vport->dp), net))
return vport;
@@ -226,7 +226,7 @@ struct vport *ovs_vport_add(const struct vport_parms *parms)
}
bucket = hash_bucket(ovs_dp_get_net(vport->dp),
- vport->ops->get_name(vport));
+ ovs_vport_name(vport));
hlist_add_head_rcu(&vport->hash_node, bucket);
return vport;
}
diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
index e05ec68..1a689c2 100644
--- a/net/openvswitch/vport.h
+++ b/net/openvswitch/vport.h
@@ -237,6 +237,11 @@ static inline void ovs_skb_postpush_rcsum(struct sk_buff *skb,
skb->csum = csum_add(skb->csum, csum_partial(start, len, 0));
}
+static inline const char *ovs_vport_name(struct vport *vport)
+{
+ return vport->dev ? vport->dev->name : vport->ops->get_name(vport);
+}
+
int ovs_vport_ops_register(struct vport_ops *ops);
void ovs_vport_ops_unregister(struct vport_ops *ops);
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (9 preceding siblings ...)
[not found] ` <cover.1433167295.git.tgraf-G/eBtMaohhA@public.gmane.org>
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member Thomas Graf
` (3 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
This gets rid of all OVS-specific VXLAN code in the receive and
transmit path by using a VXLAN net_device to represent the vport.
Only a small shim layer remains which handles the VXLAN-specific
OVS Netlink configuration.
vxlan_sock_add(), vxlan_sock_release() and vxlan_xmit_skb() are no
longer needed outside the driver and are therefore unexported.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
drivers/net/vxlan.c | 23 +--
include/net/vxlan.h | 14 +-
net/openvswitch/Kconfig | 12 --
net/openvswitch/Makefile | 1 -
net/openvswitch/flow_netlink.c | 5 +-
net/openvswitch/vport-netdev.c | 176 +++++++++++++++++++++-
net/openvswitch/vport-vxlan.c | 322 -----------------------------------------
7 files changed, 193 insertions(+), 360 deletions(-)
delete mode 100644 net/openvswitch/vport-vxlan.c
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 3acab95..b696871 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -74,6 +74,10 @@ static struct rtnl_link_ops vxlan_link_ops;
static const u8 all_zeros_mac[ETH_ALEN];
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+ vxlan_rcv_t *rcv, void *data,
+ bool no_share, u32 flags);
+
/* per-network namespace private data for this module */
struct vxlan_net {
struct list_head vxlan_list;
@@ -1020,7 +1024,7 @@ static bool vxlan_group_used(struct vxlan_net *vn, struct vxlan_dev *dev)
return false;
}
-void vxlan_sock_release(struct vxlan_sock *vs)
+static void vxlan_sock_release(struct vxlan_sock *vs)
{
struct sock *sk = vs->sock->sk;
struct net *net = sock_net(sk);
@@ -1036,7 +1040,6 @@ void vxlan_sock_release(struct vxlan_sock *vs)
queue_work(vxlan_wq, &vs->del_work);
}
-EXPORT_SYMBOL_GPL(vxlan_sock_release);
/* Update multicast group membership when first VNI on
* multicast address is brought up
@@ -1761,10 +1764,10 @@ err:
}
#endif
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
- __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
- __be16 src_port, __be16 dst_port,
- struct vxlan_metadata *md, bool xnet, u32 vxflags)
+static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
+ __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
+ __be16 src_port, __be16 dst_port,
+ struct vxlan_metadata *md, bool xnet, u32 vxflags)
{
struct vxlanhdr *vxh;
int min_headroom;
@@ -1834,7 +1837,6 @@ int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
ttl, df, src_port, dst_port, xnet,
!(vxflags & VXLAN_F_UDP_CSUM));
}
-EXPORT_SYMBOL_GPL(vxlan_xmit_skb);
/* Bypass encapsulation if the destination is local */
static void vxlan_encap_bypass(struct sk_buff *skb, struct vxlan_dev *src_vxlan,
@@ -2609,9 +2611,9 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
return vs;
}
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags)
+static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
+ vxlan_rcv_t *rcv, void *data,
+ bool no_share, u32 flags)
{
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
struct vxlan_sock *vs;
@@ -2632,7 +2634,6 @@ struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
return vxlan_socket_create(net, port, rcv, data, flags);
}
-EXPORT_SYMBOL_GPL(vxlan_sock_add);
static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
struct vxlan_config *conf)
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index c037b27..d3ce81f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -197,19 +197,13 @@ struct vxlan_dev {
VXLAN_F_REMCSUM_NOPARTIAL | \
VXLAN_F_FLOW_BASED)
-struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
- bool no_share, u32 flags);
-
struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config *conf);
-void vxlan_sock_release(struct vxlan_sock *vs);
-
-int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
- __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
- __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
- bool xnet, u32 vxflags);
+static inline __be16 vxlan_dev_dst_port(struct vxlan_dev *vxlan)
+{
+ return inet_sk(vxlan->vn_sock->sock->sk)->inet_sport;
+}
static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
netdev_features_t features)
diff --git a/net/openvswitch/Kconfig b/net/openvswitch/Kconfig
index 1584040..1119f46 100644
--- a/net/openvswitch/Kconfig
+++ b/net/openvswitch/Kconfig
@@ -44,18 +44,6 @@ config OPENVSWITCH_GRE
If unsure, say Y.
-config OPENVSWITCH_VXLAN
- tristate "Open vSwitch VXLAN tunneling support"
- depends on OPENVSWITCH
- depends on VXLAN
- default OPENVSWITCH
- ---help---
- If you say Y here, then the Open vSwitch will be able create vxlan vport.
-
- Say N to exclude this support and reduce the binary size.
-
- If unsure, say Y.
-
config OPENVSWITCH_GENEVE
tristate "Open vSwitch Geneve tunneling support"
depends on OPENVSWITCH
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 91b9478..38e0e14 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -16,5 +16,4 @@ openvswitch-y := \
vport-netdev.o
obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
-obj-$(CONFIG_OPENVSWITCH_VXLAN) += vport-vxlan.o
obj-$(CONFIG_OPENVSWITCH_GRE) += vport-gre.o
diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 35086c6..f43ce04 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -47,6 +47,7 @@
#include <net/ipv6.h>
#include <net/ndisc.h>
#include <net/mpls.h>
+#include <net/vxlan.h>
#include "flow_netlink.h"
#include "vport-vxlan.h"
@@ -475,7 +476,7 @@ static int vxlan_tun_opt_from_nlattr(const struct nlattr *a,
{
struct nlattr *tb[OVS_VXLAN_EXT_MAX+1];
unsigned long opt_key_offset;
- struct ovs_vxlan_opts opts;
+ struct vxlan_metadata opts;
int err;
BUILD_BUG_ON(sizeof(opts) > sizeof(match->key->tun_opts));
@@ -626,7 +627,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
static int vxlan_opt_to_nlattr(struct sk_buff *skb,
const void *tun_opts, int swkey_tun_opts_len)
{
- const struct ovs_vxlan_opts *opts = tun_opts;
+ const struct vxlan_metadata *opts = tun_opts;
struct nlattr *nla;
nla = nla_nest_start(skb, OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS);
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index ef11a41..77703ee 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -27,9 +27,14 @@
#include <linux/skbuff.h>
#include <linux/openvswitch.h>
-#include <net/llc.h>
+#include <net/udp.h>
+#include <net/ip_tunnels.h>
+#include <net/rtnetlink.h>
+#include <net/vxlan.h>
#include "datapath.h"
+#include "vport.h"
+#include "vport-vxlan.h"
#include "vport-internal_dev.h"
#include "vport-netdev.h"
@@ -220,12 +225,179 @@ static struct vport_ops ovs_netdev_vport_ops = {
.send = netdev_send,
};
+/* Compat code for old userspace. */
+#if IS_ENABLED(CONFIG_VXLAN)
+static struct vport_ops ovs_vxlan_netdev_vport_ops;
+
+static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb)
+{
+ struct vxlan_dev *vxlan = netdev_priv(vport->dev);
+ __be16 dst_port = vxlan->cfg.dst_port;
+
+ if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(dst_port)))
+ return -EMSGSIZE;
+
+ if (vxlan->flags & VXLAN_F_GBP) {
+ struct nlattr *exts;
+
+ exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION);
+ if (!exts)
+ return -EMSGSIZE;
+
+ if (vxlan->flags & VXLAN_F_GBP &&
+ nla_put_flag(skb, OVS_VXLAN_EXT_GBP))
+ return -EMSGSIZE;
+
+ nla_nest_end(skb, exts);
+ }
+
+ return 0;
+}
+
+static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX + 1] = {
+ [OVS_VXLAN_EXT_GBP] = { .type = NLA_FLAG, },
+};
+
+static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr,
+ struct vxlan_config *conf)
+{
+ struct nlattr *exts[OVS_VXLAN_EXT_MAX + 1];
+ int err;
+
+ if (nla_len(attr) < sizeof(struct nlattr))
+ return -EINVAL;
+
+ err = nla_parse_nested(exts, OVS_VXLAN_EXT_MAX, attr, exts_policy);
+ if (err < 0)
+ return err;
+
+ if (exts[OVS_VXLAN_EXT_GBP])
+ conf->flags |= VXLAN_F_GBP;
+
+ return 0;
+}
+
+static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
+{
+ struct net *net = ovs_dp_get_net(parms->dp);
+ struct nlattr *options = parms->options;
+ struct vport *vport;
+ struct nlattr *a;
+ int err;
+ struct vxlan_config conf = {
+ .no_share = true,
+ .flags = VXLAN_F_FLOW_BASED,
+ };
+
+ if (!options) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT);
+ if (a && nla_len(a) == sizeof(u16)) {
+ conf.dst_port = htons(nla_get_u16(a));
+ } else {
+ /* Require destination port from userspace. */
+ err = -EINVAL;
+ goto error;
+ }
+
+ vport = ovs_vport_alloc(0, &ovs_vxlan_netdev_vport_ops, parms);
+ if (IS_ERR(vport))
+ return vport;
+
+ a = nla_find_nested(options, OVS_TUNNEL_ATTR_EXTENSION);
+ if (a) {
+ err = vxlan_configure_exts(vport, a, &conf);
+ if (err) {
+ ovs_vport_free(vport);
+ goto error;
+ }
+ }
+
+ rtnl_lock();
+ vxlan_dev_create(net, parms->name, NET_NAME_USER, &conf);
+ rtnl_unlock();
+ return vport;
+error:
+ return ERR_PTR(err);
+}
+
+static struct vport *vxlan_create(const struct vport_parms *parms)
+{
+ struct vport *vport;
+
+ vport = vxlan_tnl_create(parms);
+ if (IS_ERR(vport))
+ return vport;
+
+ return netdev_link(vport, parms->name);
+}
+
+static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
+ struct ip_tunnel_info *egress_tun_info)
+{
+ struct vxlan_dev *vxlan = netdev_priv(vport->dev);
+ struct net *net = ovs_dp_get_net(vport->dp);
+ __be16 dst_port = vxlan_dev_dst_port(vxlan);
+ __be16 src_port;
+ int port_min;
+ int port_max;
+
+ inet_get_local_port_range(net, &port_min, &port_max);
+ src_port = udp_flow_src_port(net, skb, 0, 0, true);
+
+ return ovs_tunnel_get_egress_info(egress_tun_info, net,
+ OVS_CB(skb)->egress_tun_info,
+ IPPROTO_UDP, skb->mark,
+ src_port, dst_port);
+}
+
+static struct vport_ops ovs_vxlan_netdev_vport_ops = {
+ .type = OVS_VPORT_TYPE_VXLAN,
+ .create = vxlan_create,
+ .destroy = netdev_destroy,
+ .get_options = vxlan_get_options,
+ .send = netdev_send,
+ .get_egress_tun_info = vxlan_get_egress_tun_info,
+};
+
+static int vxlan_compat_init(void)
+{
+ return ovs_vport_ops_register(&ovs_vxlan_netdev_vport_ops);
+}
+
+static void vxlan_compat_exit(void)
+{
+ ovs_vport_ops_unregister(&ovs_vxlan_netdev_vport_ops);
+}
+#else
+static int vxlan_compat_init(void)
+{
+ return 0;
+}
+
+static void vxlan_compat_exit(void)
+{
+}
+#endif
+
int __init ovs_netdev_init(void)
{
- return ovs_vport_ops_register(&ovs_netdev_vport_ops);
+ int err;
+
+ err = ovs_vport_ops_register(&ovs_netdev_vport_ops);
+ if (err)
+ return err;
+ err = vxlan_compat_init();
+ if (err)
+ ovs_vport_ops_unregister(&ovs_netdev_vport_ops);
+ return err;
}
void ovs_netdev_exit(void)
{
ovs_vport_ops_unregister(&ovs_netdev_vport_ops);
+ vxlan_compat_exit();
}
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
deleted file mode 100644
index 6f7986f..0000000
--- a/net/openvswitch/vport-vxlan.c
+++ /dev/null
@@ -1,322 +0,0 @@
-/*
- * Copyright (c) 2014 Nicira, Inc.
- * Copyright (c) 2013 Cisco Systems, Inc.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
- * 02110-1301, USA
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/in.h>
-#include <linux/ip.h>
-#include <linux/net.h>
-#include <linux/rculist.h>
-#include <linux/udp.h>
-#include <linux/module.h>
-
-#include <net/icmp.h>
-#include <net/ip.h>
-#include <net/udp.h>
-#include <net/ip_tunnels.h>
-#include <net/rtnetlink.h>
-#include <net/route.h>
-#include <net/dsfield.h>
-#include <net/inet_ecn.h>
-#include <net/net_namespace.h>
-#include <net/netns/generic.h>
-#include <net/vxlan.h>
-
-#include "datapath.h"
-#include "vport.h"
-#include "vport-vxlan.h"
-
-/**
- * struct vxlan_port - Keeps track of open UDP ports
- * @vs: vxlan_sock created for the port.
- * @name: vport name.
- */
-struct vxlan_port {
- struct vxlan_sock *vs;
- char name[IFNAMSIZ];
- u32 exts; /* VXLAN_F_* in <net/vxlan.h> */
-};
-
-static struct vport_ops ovs_vxlan_vport_ops;
-
-static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
-{
- return vport_priv(vport);
-}
-
-/* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
- struct vxlan_metadata *md)
-{
- struct ip_tunnel_info tun_info;
- struct vxlan_port *vxlan_port;
- struct vport *vport = vs->data;
- struct iphdr *iph;
- struct ovs_vxlan_opts opts = {
- .gbp = md->gbp,
- };
- __be64 key;
- __be16 flags;
-
- flags = TUNNEL_KEY | (udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0);
- vxlan_port = vxlan_vport(vport);
- if (vxlan_port->exts & VXLAN_F_GBP && md->gbp)
- flags |= TUNNEL_VXLAN_OPT;
-
- /* Save outer tunnel values */
- iph = ip_hdr(skb);
- key = cpu_to_be64(ntohl(md->vni) >> 8);
- ip_tunnel_info_init(&tun_info, iph,
- udp_hdr(skb)->source, udp_hdr(skb)->dest,
- key, flags, &opts, sizeof(opts));
-
- ovs_vport_receive(vport, skb, &tun_info);
-}
-
-static int vxlan_get_options(const struct vport *vport, struct sk_buff *skb)
-{
- struct vxlan_port *vxlan_port = vxlan_vport(vport);
- __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
-
- if (nla_put_u16(skb, OVS_TUNNEL_ATTR_DST_PORT, ntohs(dst_port)))
- return -EMSGSIZE;
-
- if (vxlan_port->exts) {
- struct nlattr *exts;
-
- exts = nla_nest_start(skb, OVS_TUNNEL_ATTR_EXTENSION);
- if (!exts)
- return -EMSGSIZE;
-
- if (vxlan_port->exts & VXLAN_F_GBP &&
- nla_put_flag(skb, OVS_VXLAN_EXT_GBP))
- return -EMSGSIZE;
-
- nla_nest_end(skb, exts);
- }
-
- return 0;
-}
-
-static void vxlan_tnl_destroy(struct vport *vport)
-{
- struct vxlan_port *vxlan_port = vxlan_vport(vport);
-
- vxlan_sock_release(vxlan_port->vs);
-
- ovs_vport_deferred_free(vport);
-}
-
-static const struct nla_policy exts_policy[OVS_VXLAN_EXT_MAX+1] = {
- [OVS_VXLAN_EXT_GBP] = { .type = NLA_FLAG, },
-};
-
-static int vxlan_configure_exts(struct vport *vport, struct nlattr *attr)
-{
- struct nlattr *exts[OVS_VXLAN_EXT_MAX+1];
- struct vxlan_port *vxlan_port;
- int err;
-
- if (nla_len(attr) < sizeof(struct nlattr))
- return -EINVAL;
-
- err = nla_parse_nested(exts, OVS_VXLAN_EXT_MAX, attr, exts_policy);
- if (err < 0)
- return err;
-
- vxlan_port = vxlan_vport(vport);
-
- if (exts[OVS_VXLAN_EXT_GBP])
- vxlan_port->exts |= VXLAN_F_GBP;
-
- return 0;
-}
-
-static struct vport *vxlan_tnl_create(const struct vport_parms *parms)
-{
- struct net *net = ovs_dp_get_net(parms->dp);
- struct nlattr *options = parms->options;
- struct vxlan_port *vxlan_port;
- struct vxlan_sock *vs;
- struct vport *vport;
- struct nlattr *a;
- u16 dst_port;
- int err;
-
- if (!options) {
- err = -EINVAL;
- goto error;
- }
- a = nla_find_nested(options, OVS_TUNNEL_ATTR_DST_PORT);
- if (a && nla_len(a) == sizeof(u16)) {
- dst_port = nla_get_u16(a);
- } else {
- /* Require destination port from userspace. */
- err = -EINVAL;
- goto error;
- }
-
- vport = ovs_vport_alloc(sizeof(struct vxlan_port),
- &ovs_vxlan_vport_ops, parms);
- if (IS_ERR(vport))
- return vport;
-
- vxlan_port = vxlan_vport(vport);
- strncpy(vxlan_port->name, parms->name, IFNAMSIZ);
-
- a = nla_find_nested(options, OVS_TUNNEL_ATTR_EXTENSION);
- if (a) {
- err = vxlan_configure_exts(vport, a);
- if (err) {
- ovs_vport_free(vport);
- goto error;
- }
- }
-
- vs = vxlan_sock_add(net, htons(dst_port), vxlan_rcv, vport, true,
- vxlan_port->exts);
- if (IS_ERR(vs)) {
- ovs_vport_free(vport);
- return (void *)vs;
- }
- vxlan_port->vs = vs;
-
- return vport;
-
-error:
- return ERR_PTR(err);
-}
-
-static int vxlan_ext_gbp(struct sk_buff *skb)
-{
- const struct ip_tunnel_info *tun_info;
- const struct ovs_vxlan_opts *opts;
-
- tun_info = OVS_CB(skb)->egress_tun_info;
- opts = tun_info->options;
-
- if (tun_info->key.tun_flags & TUNNEL_VXLAN_OPT &&
- tun_info->options_len >= sizeof(*opts))
- return opts->gbp;
- else
- return 0;
-}
-
-static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
-{
- struct net *net = ovs_dp_get_net(vport->dp);
- struct vxlan_port *vxlan_port = vxlan_vport(vport);
- struct sock *sk = vxlan_port->vs->sock->sk;
- __be16 dst_port = inet_sk(sk)->inet_sport;
- const struct ip_tunnel_key *tun_key;
- struct vxlan_metadata md = {0};
- struct rtable *rt;
- struct flowi4 fl;
- __be16 src_port;
- __be16 df;
- int err;
- u32 vxflags;
-
- if (unlikely(!OVS_CB(skb)->egress_tun_info)) {
- err = -EINVAL;
- goto error;
- }
-
- tun_key = &OVS_CB(skb)->egress_tun_info->key;
- rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_UDP);
- if (IS_ERR(rt)) {
- err = PTR_ERR(rt);
- goto error;
- }
-
- df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ?
- htons(IP_DF) : 0;
-
- skb->ignore_df = 1;
-
- src_port = udp_flow_src_port(net, skb, 0, 0, true);
- md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
- md.gbp = vxlan_ext_gbp(skb);
- vxflags = vxlan_port->exts |
- (tun_key->tun_flags & TUNNEL_CSUM ? VXLAN_F_UDP_CSUM : 0);
-
- err = vxlan_xmit_skb(rt, sk, skb, fl.saddr, tun_key->ipv4_dst,
- tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
- src_port, dst_port,
- &md, false, vxflags);
- if (err < 0)
- ip_rt_put(rt);
- return err;
-error:
- kfree_skb(skb);
- return err;
-}
-
-static int vxlan_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ip_tunnel_info *egress_tun_info)
-{
- struct net *net = ovs_dp_get_net(vport->dp);
- struct vxlan_port *vxlan_port = vxlan_vport(vport);
- __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
- __be16 src_port;
- int port_min;
- int port_max;
-
- inet_get_local_port_range(net, &port_min, &port_max);
- src_port = udp_flow_src_port(net, skb, 0, 0, true);
-
- return ovs_tunnel_get_egress_info(egress_tun_info, net,
- OVS_CB(skb)->egress_tun_info,
- IPPROTO_UDP, skb->mark,
- src_port, dst_port);
-}
-
-static const char *vxlan_get_name(const struct vport *vport)
-{
- struct vxlan_port *vxlan_port = vxlan_vport(vport);
- return vxlan_port->name;
-}
-
-static struct vport_ops ovs_vxlan_vport_ops = {
- .type = OVS_VPORT_TYPE_VXLAN,
- .create = vxlan_tnl_create,
- .destroy = vxlan_tnl_destroy,
- .get_name = vxlan_get_name,
- .get_options = vxlan_get_options,
- .send = vxlan_tnl_send,
- .get_egress_tun_info = vxlan_get_egress_tun_info,
- .owner = THIS_MODULE,
-};
-
-static int __init ovs_vxlan_tnl_init(void)
-{
- return ovs_vport_ops_register(&ovs_vxlan_vport_ops);
-}
-
-static void __exit ovs_vxlan_tnl_exit(void)
-{
- ovs_vport_ops_unregister(&ovs_vxlan_vport_ops);
-}
-
-module_init(ovs_vxlan_tnl_init);
-module_exit(ovs_vxlan_tnl_exit);
-
-MODULE_DESCRIPTION("OVS: VXLAN switching port");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("vport-type-4");
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (10 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport Thomas Graf
` (2 subsequent siblings)
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
With the removal of the special treatment of OVS VXLAN vports, the
indirect call to vxlan_rcv() can be avoided and the VNI member
in vxlan_metadata can be removed.
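The net effect on the receive path is a direct call with the VNI
passed explicitly instead of an indirect callback hung off the
socket; a condensed sketch of the before/after from the diff below:

	/* before: per-socket callback, VNI carried in vxlan_metadata */
	md->vni = vxh->vx_vni;
	vs->rcv(vs, skb, md);

	/* after: direct call, VNI passed as an argument */
	vxlan_rcv(vs, skb, md, vni >> 8);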
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
drivers/net/vxlan.c | 225 +++++++++++++++++++++++++---------------------------
include/net/vxlan.h | 7 --
2 files changed, 107 insertions(+), 125 deletions(-)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index b696871..9cc7d5a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -75,7 +75,6 @@ static struct rtnl_link_ops vxlan_link_ops;
static const u8 all_zeros_mac[ETH_ALEN];
static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
bool no_share, u32 flags);
/* per-network namespace private data for this module */
@@ -1122,6 +1121,102 @@ static struct vxlanhdr *vxlan_remcsum(struct sk_buff *skb, struct vxlanhdr *vh,
return vh;
}
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+ struct vxlan_metadata *md, __u32 vni)
+{
+ struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
+ struct iphdr *oip = NULL;
+ struct ipv6hdr *oip6 = NULL;
+ struct vxlan_dev *vxlan;
+ struct pcpu_sw_netstats *stats;
+ union vxlan_addr saddr;
+ int err = 0;
+ union vxlan_addr *remote_ip;
+
+ /* For flow based devices, map all packets to VNI 0 */
+ if (vs->flags & VXLAN_F_FLOW_BASED)
+ vni = 0;
+
+ /* Is this VNI defined? */
+ vxlan = vxlan_vs_find_vni(vs, vni);
+ if (!vxlan)
+ goto drop;
+
+ remote_ip = &vxlan->default_dst.remote_ip;
+ skb_reset_mac_header(skb);
+ skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
+ skb->protocol = eth_type_trans(skb, vxlan->dev);
+ skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
+
+ /* Ignore packet loops (and multicast echo) */
+ if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
+ goto drop;
+
+ /* Re-examine inner Ethernet packet */
+ if (remote_ip->sa.sa_family == AF_INET) {
+ oip = ip_hdr(skb);
+ saddr.sin.sin_addr.s_addr = oip->saddr;
+ saddr.sa.sa_family = AF_INET;
+
+ if (tun_info) {
+ tun_info->key.ipv4_src = oip->saddr;
+ tun_info->key.ipv4_dst = oip->daddr;
+ tun_info->key.ipv4_tos = oip->tos;
+ tun_info->key.ipv4_ttl = oip->ttl;
+ }
+#if IS_ENABLED(CONFIG_IPV6)
+ } else {
+ oip6 = ipv6_hdr(skb);
+ saddr.sin6.sin6_addr = oip6->saddr;
+ saddr.sa.sa_family = AF_INET6;
+
+ /* TODO : Fill IPv6 tunnel info */
+#endif
+ }
+
+ if ((vxlan->flags & VXLAN_F_LEARN) &&
+ vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
+ goto drop;
+
+ skb_reset_network_header(skb);
+ if (!(vs->flags & VXLAN_F_FLOW_BASED))
+ skb->mark = md->gbp;
+
+ if (oip6)
+ err = IP6_ECN_decapsulate(oip6, skb);
+ if (oip)
+ err = IP_ECN_decapsulate(oip, skb);
+
+ if (unlikely(err)) {
+ if (log_ecn_error) {
+ if (oip6)
+ net_info_ratelimited("non-ECT from %pI6\n",
+ &oip6->saddr);
+ if (oip)
+ net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
+ &oip->saddr, oip->tos);
+ }
+ if (err > 1) {
+ ++vxlan->dev->stats.rx_frame_errors;
+ ++vxlan->dev->stats.rx_errors;
+ goto drop;
+ }
+ }
+
+ stats = this_cpu_ptr(vxlan->dev->tstats);
+ u64_stats_update_begin(&stats->syncp);
+ stats->rx_packets++;
+ stats->rx_bytes += skb->len;
+ u64_stats_update_end(&stats->syncp);
+
+ netif_rx(skb);
+
+ return;
+drop:
+ /* Consume bad packet */
+ kfree_skb(skb);
+}
+
/* Callback from net/ipv4/udp.c to receive packets */
static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
{
@@ -1226,8 +1321,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
goto bad_flags;
}
- md->vni = vxh->vx_vni;
- vs->rcv(vs, skb, md);
+ vxlan_rcv(vs, skb, md, vni >> 8);
return 0;
drop:
@@ -1244,105 +1338,6 @@ error:
return 1;
}
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
- struct vxlan_metadata *md)
-{
- struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
- struct iphdr *oip = NULL;
- struct ipv6hdr *oip6 = NULL;
- struct vxlan_dev *vxlan;
- struct pcpu_sw_netstats *stats;
- union vxlan_addr saddr;
- __u32 vni;
- int err = 0;
- union vxlan_addr *remote_ip;
-
- /* For flow based devices, map all packets to VNI 0 */
- if (vs->flags & VXLAN_F_FLOW_BASED)
- vni = 0;
- else
- vni = ntohl(md->vni) >> 8;
-
- /* Is this VNI defined? */
- vxlan = vxlan_vs_find_vni(vs, vni);
- if (!vxlan)
- goto drop;
-
- remote_ip = &vxlan->default_dst.remote_ip;
- skb_reset_mac_header(skb);
- skb_scrub_packet(skb, !net_eq(vxlan->net, dev_net(vxlan->dev)));
- skb->protocol = eth_type_trans(skb, vxlan->dev);
- skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
-
- /* Ignore packet loops (and multicast echo) */
- if (ether_addr_equal(eth_hdr(skb)->h_source, vxlan->dev->dev_addr))
- goto drop;
-
- /* Re-examine inner Ethernet packet */
- if (remote_ip->sa.sa_family == AF_INET) {
- oip = ip_hdr(skb);
- saddr.sin.sin_addr.s_addr = oip->saddr;
- saddr.sa.sa_family = AF_INET;
-
- if (tun_info) {
- tun_info->key.ipv4_src = oip->saddr;
- tun_info->key.ipv4_dst = oip->daddr;
- tun_info->key.ipv4_tos = oip->tos;
- tun_info->key.ipv4_ttl = oip->ttl;
- }
-#if IS_ENABLED(CONFIG_IPV6)
- } else {
- oip6 = ipv6_hdr(skb);
- saddr.sin6.sin6_addr = oip6->saddr;
- saddr.sa.sa_family = AF_INET6;
-
- /* TODO : Fill IPv6 tunnel info */
-#endif
- }
-
- if ((vxlan->flags & VXLAN_F_LEARN) &&
- vxlan_snoop(skb->dev, &saddr, eth_hdr(skb)->h_source))
- goto drop;
-
- skb_reset_network_header(skb);
- if (!(vs->flags & VXLAN_F_FLOW_BASED))
- skb->mark = md->gbp;
-
- if (oip6)
- err = IP6_ECN_decapsulate(oip6, skb);
- if (oip)
- err = IP_ECN_decapsulate(oip, skb);
-
- if (unlikely(err)) {
- if (log_ecn_error) {
- if (oip6)
- net_info_ratelimited("non-ECT from %pI6\n",
- &oip6->saddr);
- if (oip)
- net_info_ratelimited("non-ECT from %pI4 with TOS=%#x\n",
- &oip->saddr, oip->tos);
- }
- if (err > 1) {
- ++vxlan->dev->stats.rx_frame_errors;
- ++vxlan->dev->stats.rx_errors;
- goto drop;
- }
- }
-
- stats = this_cpu_ptr(vxlan->dev->tstats);
- u64_stats_update_begin(&stats->syncp);
- stats->rx_packets++;
- stats->rx_bytes += skb->len;
- u64_stats_update_end(&stats->syncp);
-
- netif_rx(skb);
-
- return;
-drop:
- /* Consume bad packet */
- kfree_skb(skb);
-}
-
static int arp_reduce(struct net_device *dev, struct sk_buff *skb)
{
struct vxlan_dev *vxlan = netdev_priv(dev);
@@ -1681,7 +1676,7 @@ static int vxlan6_xmit_skb(struct dst_entry *dst, struct sock *sk,
struct sk_buff *skb,
struct net_device *dev, struct in6_addr *saddr,
struct in6_addr *daddr, __u8 prio, __u8 ttl,
- __be16 src_port, __be16 dst_port,
+ __be16 src_port, __be16 dst_port, __u32 vni,
struct vxlan_metadata *md, bool xnet, u32 vxflags)
{
struct vxlanhdr *vxh;
@@ -1731,7 +1726,7 @@ static int vxlan6_xmit_skb(struct dst_entry *dst, struct sock *sk,
vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
vxh->vx_flags = htonl(VXLAN_HF_VNI);
- vxh->vx_vni = md->vni;
+ vxh->vx_vni = vni;
if (type & SKB_GSO_TUNNEL_REMCSUM) {
u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
@@ -1766,7 +1761,7 @@ err:
static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *skb,
__be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
- __be16 src_port, __be16 dst_port,
+ __be16 src_port, __be16 dst_port, __u32 vni,
struct vxlan_metadata *md, bool xnet, u32 vxflags)
{
struct vxlanhdr *vxh;
@@ -1810,7 +1805,7 @@ static int vxlan_xmit_skb(struct rtable *rt, struct sock *sk, struct sk_buff *sk
vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
vxh->vx_flags = htonl(VXLAN_HF_VNI);
- vxh->vx_vni = md->vni;
+ vxh->vx_vni = vni;
if (type & SKB_GSO_TUNNEL_REMCSUM) {
u32 data = (skb_checksum_start_offset(skb) - hdrlen) >>
@@ -2002,10 +1997,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
- md->vni = htonl(vni << 8);
err = vxlan_xmit_skb(rt, sk, skb, fl4.saddr,
dst->sin.sin_addr.s_addr, tos, ttl, df,
- src_port, dst_port, md,
+ src_port, dst_port, htonl(vni << 8), md,
!net_eq(vxlan->net, dev_net(vxlan->dev)),
flags);
if (err < 0) {
@@ -2060,11 +2054,10 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
}
ttl = ttl ? : ip6_dst_hoplimit(ndst);
- md->vni = htonl(vni << 8);
md->gbp = skb->mark;
err = vxlan6_xmit_skb(ndst, sk, skb, dev, &fl6.saddr, &fl6.daddr,
- 0, ttl, src_port, dst_port, md,
+ 0, ttl, src_port, dst_port, htonl(vni << 8), md,
!net_eq(vxlan->net, dev_net(vxlan->dev)),
vxlan->flags);
#endif
@@ -2257,8 +2250,8 @@ static int vxlan_open(struct net_device *dev)
struct vxlan_sock *vs;
int ret = 0;
- vs = vxlan_sock_add(vxlan->net, vxlan->cfg.dst_port, vxlan_rcv,
- NULL, vxlan->cfg.no_share, vxlan->flags);
+ vs = vxlan_sock_add(vxlan->net, vxlan->cfg.dst_port,
+ vxlan->cfg.no_share, vxlan->flags);
if (IS_ERR(vs))
return PTR_ERR(vs);
@@ -2557,7 +2550,6 @@ static struct socket *vxlan_create_sock(struct net *net, bool ipv6,
/* Create new listen socket if needed */
static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
u32 flags)
{
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -2586,8 +2578,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
vs->sock = sock;
atomic_set(&vs->refcnt, 1);
- vs->rcv = rcv;
- vs->data = data;
vs->flags = (flags & VXLAN_F_RCV_FLAGS);
/* Initialize the vxlan udp offloads structure */
@@ -2612,7 +2602,6 @@ static struct vxlan_sock *vxlan_socket_create(struct net *net, __be16 port,
}
static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
- vxlan_rcv_t *rcv, void *data,
bool no_share, u32 flags)
{
struct vxlan_net *vn = net_generic(net, vxlan_net_id);
@@ -2623,7 +2612,7 @@ static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
spin_lock(&vn->sock_lock);
vs = vxlan_find_sock(net, ipv6 ? AF_INET6 : AF_INET, port,
flags);
- if (vs && vs->rcv == rcv) {
+ if (vs) {
if (!atomic_add_unless(&vs->refcnt, 1, 0))
vs = ERR_PTR(-EBUSY);
spin_unlock(&vn->sock_lock);
@@ -2632,7 +2621,7 @@ static struct vxlan_sock *vxlan_sock_add(struct net *net, __be16 port,
spin_unlock(&vn->sock_lock);
}
- return vxlan_socket_create(net, port, rcv, data, flags);
+ return vxlan_socket_create(net, port, flags);
}
static int vxlan_dev_configure(struct net *src_net, struct net_device *dev,
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index d3ce81f..c157c5d 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -100,19 +100,12 @@ struct vxlanhdr {
#define FDB_HASH_SIZE (1<<FDB_HASH_BITS)
struct vxlan_metadata {
- __be32 vni;
u32 gbp;
};
-struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
- struct vxlan_metadata *md);
-
/* per UDP socket information */
struct vxlan_sock {
struct hlist_node hlist;
- vxlan_rcv_t *rcv;
- void *data;
struct work_struct del_work;
struct socket *sock;
struct rcu_head rcu;
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (11 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 14/14] arp: Associate ARP requests with tunnel info Thomas Graf
2015-06-02 17:52 ` [ovs-dev] [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Flavio Leitner
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
From: Pravin Shelar <pshelar@nicira.com>
Removes all of the OVS-specific GRE code and makes OVS use a
regular GRE net_device.
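The interesting part is the new fallback transmit path, which derives
all encapsulation parameters from per-skb metadata instead of from OVS
vport state. A condensed sketch of gre_fb_xmit() from the diff below,
with error handling and headroom expansion elided:

	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
	const struct ip_tunnel_key *key = &tun_info->key;
	__be16 df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;

	rt  = tunnel_route_lookup(net, key, skb->mark, &fl, IPPROTO_GRE);
	skb = __build_header(skb, tun_info, ip_gre_calc_hlen(key->tun_flags));
	err = iptunnel_xmit(skb->sk, rt, skb, fl.saddr, key->ipv4_dst,
			    IPPROTO_GRE, key->ipv4_tos, key->ipv4_ttl, df, false);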
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
net/core/dev.c | 5 +-
net/ipv4/ip_gre.c | 161 ++++++++++++++++++++-
net/openvswitch/Makefile | 1 -
net/openvswitch/vport-gre.c | 313 -----------------------------------------
net/openvswitch/vport-netdev.c | 7 +-
5 files changed, 168 insertions(+), 319 deletions(-)
delete mode 100644 net/openvswitch/vport-gre.c
diff --git a/net/core/dev.c b/net/core/dev.c
index 594163d..656f3b4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6969,6 +6969,9 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
INIT_LIST_HEAD(&dev->ptype_all);
INIT_LIST_HEAD(&dev->ptype_specific);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
+
+ strcpy(dev->name, name);
+ dev->name_assign_type = name_assign_type;
setup(dev);
dev->num_tx_queues = txqs;
@@ -6983,8 +6986,6 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
goto free_all;
#endif
- strcpy(dev->name, name);
- dev->name_assign_type = name_assign_type;
dev->group = INIT_NETDEV_GROUP;
if (!dev->ethtool_ops)
dev->ethtool_ops = &default_ethtool_ops;
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 5fd7064..b37515e 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -25,6 +25,7 @@
#include <linux/udp.h>
#include <linux/if_arp.h>
#include <linux/mroute.h>
+#include <linux/if_vlan.h>
#include <linux/init.h>
#include <linux/in6.h>
#include <linux/inetdevice.h>
@@ -115,6 +116,8 @@ static bool log_ecn_error = true;
module_param(log_ecn_error, bool, 0644);
MODULE_PARM_DESC(log_ecn_error, "Log packets received with corrupted ECN");
+#define GRE_TAP_FB_NAME "gretap0"
+
static struct rtnl_link_ops ipgre_link_ops __read_mostly;
static int ipgre_tunnel_init(struct net_device *dev);
@@ -217,7 +220,17 @@ static int ipgre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi)
iph->saddr, iph->daddr, tpi->key);
if (tunnel) {
+
skb_pop_mac_header(skb);
+ if (tunnel->dev == itn->fb_tunnel_dev) {
+ struct ip_tunnel_info *tun_info;
+
+ tun_info = ip_tunnel_info_alloc(0, GFP_ATOMIC);
+
+ /* TODO: setup tun info from tpi */
+ skb_attach_tunnel_info(skb, tun_info);
+ }
+
ip_tunnel_rcv(tunnel, skb, tpi, log_ecn_error);
return PACKET_RCVD;
}
@@ -287,6 +300,135 @@ out:
return NETDEV_TX_OK;
}
+/* TODO: share xmit code */
+static inline struct rtable *tunnel_route_lookup(struct net *net,
+ const struct ip_tunnel_key *key,
+ u32 mark,
+ struct flowi4 *fl,
+ u8 protocol)
+{
+ struct rtable *rt;
+
+ memset(fl, 0, sizeof(*fl));
+ fl->daddr = key->ipv4_dst;
+ fl->saddr = key->ipv4_src;
+ fl->flowi4_tos = RT_TOS(key->ipv4_tos);
+ fl->flowi4_mark = mark;
+ fl->flowi4_proto = protocol;
+
+ rt = ip_route_output_key(net, fl);
+ return rt;
+}
+
+
+/* Returns the least-significant 32 bits of a __be64. */
+static __be32 be64_get_low32(__be64 x)
+{
+#ifdef __BIG_ENDIAN
+ return (__force __be32)x;
+#else
+ return (__force __be32)((__force u64)x >> 32);
+#endif
+}
+
+static __be16 filter_tnl_flags(__be16 flags)
+{
+ return flags & (TUNNEL_CSUM | TUNNEL_KEY);
+}
+
+
+static struct sk_buff *__build_header(struct sk_buff *skb,
+ const struct ip_tunnel_info *tun_info,
+ int tunnel_hlen)
+{
+ struct tnl_ptk_info tpi;
+
+ skb = gre_handle_offloads(skb, !!(tun_info->key.tun_flags & TUNNEL_CSUM));
+ if (IS_ERR(skb))
+ return skb;
+
+ tpi.flags = filter_tnl_flags(tun_info->key.tun_flags);
+ tpi.proto = htons(ETH_P_TEB);
+ tpi.key = be64_get_low32(tun_info->key.tun_id);
+ tpi.seq = 0;
+ gre_build_header(skb, &tpi, tunnel_hlen);
+
+ return skb;
+}
+
+static netdev_tx_t gre_fb_xmit(struct sk_buff *skb,
+ struct net_device *dev)
+{
+ struct net *net = dev_net(dev);
+ struct ip_tunnel_info *tun_info;
+ const struct ip_tunnel_key *key;
+ struct flowi4 fl;
+ struct rtable *rt;
+ int min_headroom;
+ int tunnel_hlen;
+ __be16 df;
+ int err;
+
+ tun_info = skb_shinfo(skb)->tun_info;
+ if (unlikely(!tun_info)) {
+ err = -EINVAL;
+ goto err_free_skb;
+ }
+
+ key = &tun_info->key;
+
+ rt = tunnel_route_lookup(net, key, skb->mark, &fl, IPPROTO_GRE);
+ if (IS_ERR(rt)) {
+ err = PTR_ERR(rt);
+ goto err_free_skb;
+ }
+
+ tunnel_hlen = ip_gre_calc_hlen(key->tun_flags);
+
+ min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
+ + tunnel_hlen + sizeof(struct iphdr)
+ + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
+ if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
+ int head_delta = SKB_DATA_ALIGN(min_headroom -
+ skb_headroom(skb) +
+ 16);
+ err = pskb_expand_head(skb, max_t(int, head_delta, 0),
+ 0, GFP_ATOMIC);
+ if (unlikely(err))
+ goto err_free_rt;
+ }
+
+ skb = vlan_hwaccel_push_inside(skb);
+ if (unlikely(!skb)) {
+ err = -ENOMEM;
+ goto err_free_rt;
+ }
+
+ /* Push Tunnel header. */
+ skb = __build_header(skb, tun_info, tunnel_hlen);
+ if (IS_ERR(skb)) {
+ err = PTR_ERR(skb);
+ skb = NULL;
+ goto err_free_rt;
+ }
+
+ df = key->tun_flags & TUNNEL_DONT_FRAGMENT ? htons(IP_DF) : 0;
+
+ skb->ignore_df = 1;
+
+ err = iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
+ key->ipv4_dst, IPPROTO_GRE,
+ key->ipv4_tos, key->ipv4_ttl, df, false);
+ skb_release_tunnel_info(skb);
+ return err;
+
+err_free_rt:
+ ip_rt_put(rt);
+err_free_skb:
+ kfree_skb(skb);
+ return err;
+}
+
static netdev_tx_t gre_tap_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -690,12 +832,27 @@ static const struct net_device_ops gre_tap_netdev_ops = {
.ndo_get_iflink = ip_tunnel_get_iflink,
};
+static const struct net_device_ops gre_fb_netdev_ops = {
+ .ndo_init = gre_tap_init,
+ .ndo_uninit = ip_tunnel_uninit,
+ .ndo_start_xmit = gre_fb_xmit,
+ .ndo_set_mac_address = eth_mac_addr,
+ .ndo_validate_addr = eth_validate_addr,
+ .ndo_change_mtu = ip_tunnel_change_mtu,
+ .ndo_get_stats64 = ip_tunnel_get_stats64,
+ .ndo_get_iflink = ip_tunnel_get_iflink,
+};
+
static void ipgre_tap_setup(struct net_device *dev)
{
ether_setup(dev);
- dev->netdev_ops = &gre_tap_netdev_ops;
dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
ip_tunnel_setup(dev, gre_tap_net_id);
+
+ if (!strcmp(dev->name, GRE_TAP_FB_NAME))
+ dev->netdev_ops = &gre_fb_netdev_ops;
+ else
+ dev->netdev_ops = &gre_tap_netdev_ops;
}
static int ipgre_newlink(struct net *src_net, struct net_device *dev,
@@ -851,7 +1008,7 @@ static struct rtnl_link_ops ipgre_tap_ops __read_mostly = {
static int __net_init ipgre_tap_init_net(struct net *net)
{
- return ip_tunnel_init_net(net, gre_tap_net_id, &ipgre_tap_ops, NULL);
+ return ip_tunnel_init_net(net, gre_tap_net_id, &ipgre_tap_ops, GRE_TAP_FB_NAME);
}
static void __net_exit ipgre_tap_exit_net(struct net *net)
diff --git a/net/openvswitch/Makefile b/net/openvswitch/Makefile
index 38e0e14..7153c6e 100644
--- a/net/openvswitch/Makefile
+++ b/net/openvswitch/Makefile
@@ -16,4 +16,3 @@ openvswitch-y := \
vport-netdev.o
obj-$(CONFIG_OPENVSWITCH_GENEVE)+= vport-geneve.o
-obj-$(CONFIG_OPENVSWITCH_GRE) += vport-gre.o
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
deleted file mode 100644
index b87656c..0000000
--- a/net/openvswitch/vport-gre.c
+++ /dev/null
@@ -1,313 +0,0 @@
-/*
- * Copyright (c) 2007-2014 Nicira, Inc.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
- * 02110-1301, USA
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/if.h>
-#include <linux/skbuff.h>
-#include <linux/ip.h>
-#include <linux/if_tunnel.h>
-#include <linux/if_vlan.h>
-#include <linux/in.h>
-#include <linux/in_route.h>
-#include <linux/inetdevice.h>
-#include <linux/jhash.h>
-#include <linux/list.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/workqueue.h>
-#include <linux/rculist.h>
-#include <net/route.h>
-#include <net/xfrm.h>
-
-#include <net/icmp.h>
-#include <net/ip.h>
-#include <net/ip_tunnels.h>
-#include <net/gre.h>
-#include <net/net_namespace.h>
-#include <net/netns/generic.h>
-#include <net/protocol.h>
-
-#include "datapath.h"
-#include "vport.h"
-
-static struct vport_ops ovs_gre_vport_ops;
-
-/* Returns the least-significant 32 bits of a __be64. */
-static __be32 be64_get_low32(__be64 x)
-{
-#ifdef __BIG_ENDIAN
- return (__force __be32)x;
-#else
- return (__force __be32)((__force u64)x >> 32);
-#endif
-}
-
-static __be16 filter_tnl_flags(__be16 flags)
-{
- return flags & (TUNNEL_CSUM | TUNNEL_KEY);
-}
-
-static struct sk_buff *__build_header(struct sk_buff *skb,
- int tunnel_hlen)
-{
- struct tnl_ptk_info tpi;
- const struct ip_tunnel_key *tun_key;
-
- tun_key = &OVS_CB(skb)->egress_tun_info->key;
-
- skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
- if (IS_ERR(skb))
- return skb;
-
- tpi.flags = filter_tnl_flags(tun_key->tun_flags);
- tpi.proto = htons(ETH_P_TEB);
- tpi.key = be64_get_low32(tun_key->tun_id);
- tpi.seq = 0;
- gre_build_header(skb, &tpi, tunnel_hlen);
-
- return skb;
-}
-
-static __be64 key_to_tunnel_id(__be32 key, __be32 seq)
-{
-#ifdef __BIG_ENDIAN
- return (__force __be64)((__force u64)seq << 32 | (__force u32)key);
-#else
- return (__force __be64)((__force u64)key << 32 | (__force u32)seq);
-#endif
-}
-
-/* Called with rcu_read_lock and BH disabled. */
-static int gre_rcv(struct sk_buff *skb,
- const struct tnl_ptk_info *tpi)
-{
- struct ip_tunnel_info tun_info;
- struct ovs_net *ovs_net;
- struct vport *vport;
- __be64 key;
-
- ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
- vport = rcu_dereference(ovs_net->vport_net.gre_vport);
- if (unlikely(!vport))
- return PACKET_REJECT;
-
- key = key_to_tunnel_id(tpi->key, tpi->seq);
- ip_tunnel_info_init(&tun_info, ip_hdr(skb), 0, 0, key,
- filter_tnl_flags(tpi->flags), NULL, 0);
-
- ovs_vport_receive(vport, skb, &tun_info);
- return PACKET_RCVD;
-}
-
-/* Called with rcu_read_lock and BH disabled. */
-static int gre_err(struct sk_buff *skb, u32 info,
- const struct tnl_ptk_info *tpi)
-{
- struct ovs_net *ovs_net;
- struct vport *vport;
-
- ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
- vport = rcu_dereference(ovs_net->vport_net.gre_vport);
-
- if (unlikely(!vport))
- return PACKET_REJECT;
- else
- return PACKET_RCVD;
-}
-
-static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
-{
- struct net *net = ovs_dp_get_net(vport->dp);
- const struct ip_tunnel_key *tun_key;
- struct flowi4 fl;
- struct rtable *rt;
- int min_headroom;
- int tunnel_hlen;
- __be16 df;
- int err;
-
- if (unlikely(!OVS_CB(skb)->egress_tun_info)) {
- err = -EINVAL;
- goto err_free_skb;
- }
-
- tun_key = &OVS_CB(skb)->egress_tun_info->key;
- rt = ovs_tunnel_route_lookup(net, tun_key, skb->mark, &fl, IPPROTO_GRE);
- if (IS_ERR(rt)) {
- err = PTR_ERR(rt);
- goto err_free_skb;
- }
-
- tunnel_hlen = ip_gre_calc_hlen(tun_key->tun_flags);
-
- min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
- + tunnel_hlen + sizeof(struct iphdr)
- + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
- if (skb_headroom(skb) < min_headroom || skb_header_cloned(skb)) {
- int head_delta = SKB_DATA_ALIGN(min_headroom -
- skb_headroom(skb) +
- 16);
- err = pskb_expand_head(skb, max_t(int, head_delta, 0),
- 0, GFP_ATOMIC);
- if (unlikely(err))
- goto err_free_rt;
- }
-
- skb = vlan_hwaccel_push_inside(skb);
- if (unlikely(!skb)) {
- err = -ENOMEM;
- goto err_free_rt;
- }
-
- /* Push Tunnel header. */
- skb = __build_header(skb, tunnel_hlen);
- if (IS_ERR(skb)) {
- err = PTR_ERR(skb);
- skb = NULL;
- goto err_free_rt;
- }
-
- df = tun_key->tun_flags & TUNNEL_DONT_FRAGMENT ?
- htons(IP_DF) : 0;
-
- skb->ignore_df = 1;
-
- return iptunnel_xmit(skb->sk, rt, skb, fl.saddr,
- tun_key->ipv4_dst, IPPROTO_GRE,
- tun_key->ipv4_tos, tun_key->ipv4_ttl, df, false);
-err_free_rt:
- ip_rt_put(rt);
-err_free_skb:
- kfree_skb(skb);
- return err;
-}
-
-static struct gre_cisco_protocol gre_protocol = {
- .handler = gre_rcv,
- .err_handler = gre_err,
- .priority = 1,
-};
-
-static int gre_ports;
-static int gre_init(void)
-{
- int err;
-
- gre_ports++;
- if (gre_ports > 1)
- return 0;
-
- err = gre_cisco_register(&gre_protocol);
- if (err)
- pr_warn("cannot register gre protocol handler\n");
-
- return err;
-}
-
-static void gre_exit(void)
-{
- gre_ports--;
- if (gre_ports > 0)
- return;
-
- gre_cisco_unregister(&gre_protocol);
-}
-
-static const char *gre_get_name(const struct vport *vport)
-{
- return vport_priv(vport);
-}
-
-static struct vport *gre_create(const struct vport_parms *parms)
-{
- struct net *net = ovs_dp_get_net(parms->dp);
- struct ovs_net *ovs_net;
- struct vport *vport;
- int err;
-
- err = gre_init();
- if (err)
- return ERR_PTR(err);
-
- ovs_net = net_generic(net, ovs_net_id);
- if (ovsl_dereference(ovs_net->vport_net.gre_vport)) {
- vport = ERR_PTR(-EEXIST);
- goto error;
- }
-
- vport = ovs_vport_alloc(IFNAMSIZ, &ovs_gre_vport_ops, parms);
- if (IS_ERR(vport))
- goto error;
-
- strncpy(vport_priv(vport), parms->name, IFNAMSIZ);
- rcu_assign_pointer(ovs_net->vport_net.gre_vport, vport);
- return vport;
-
-error:
- gre_exit();
- return vport;
-}
-
-static void gre_tnl_destroy(struct vport *vport)
-{
- struct net *net = ovs_dp_get_net(vport->dp);
- struct ovs_net *ovs_net;
-
- ovs_net = net_generic(net, ovs_net_id);
-
- RCU_INIT_POINTER(ovs_net->vport_net.gre_vport, NULL);
- ovs_vport_deferred_free(vport);
- gre_exit();
-}
-
-static int gre_get_egress_tun_info(struct vport *vport, struct sk_buff *skb,
- struct ip_tunnel_info *egress_tun_info)
-{
- return ovs_tunnel_get_egress_info(egress_tun_info,
- ovs_dp_get_net(vport->dp),
- OVS_CB(skb)->egress_tun_info,
- IPPROTO_GRE, skb->mark, 0, 0);
-}
-
-static struct vport_ops ovs_gre_vport_ops = {
- .type = OVS_VPORT_TYPE_GRE,
- .create = gre_create,
- .destroy = gre_tnl_destroy,
- .get_name = gre_get_name,
- .send = gre_tnl_send,
- .get_egress_tun_info = gre_get_egress_tun_info,
- .owner = THIS_MODULE,
-};
-
-static int __init ovs_gre_tnl_init(void)
-{
- return ovs_vport_ops_register(&ovs_gre_vport_ops);
-}
-
-static void __exit ovs_gre_tnl_exit(void)
-{
- ovs_vport_ops_unregister(&ovs_gre_vport_ops);
-}
-
-module_init(ovs_gre_tnl_init);
-module_exit(ovs_gre_tnl_exit);
-
-MODULE_DESCRIPTION("OVS: GRE switching port");
-MODULE_LICENSE("GPL");
-MODULE_ALIAS("vport-type-3");
diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 77703ee..a8b4360 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -43,6 +43,8 @@ static struct vport_ops ovs_netdev_vport_ops;
/* Must be called with rcu_read_lock. */
static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
{
+ struct ip_tunnel_info *tun_info;
+
if (unlikely(!vport))
goto error;
@@ -59,7 +61,10 @@ static void netdev_port_receive(struct vport *vport, struct sk_buff *skb)
skb_push(skb, ETH_HLEN);
ovs_skb_postpush_rcsum(skb, skb->data, ETH_HLEN);
- ovs_vport_receive(vport, skb, NULL);
+ tun_info = skb_shinfo(skb)->tun_info;
+ skb_shinfo(skb)->tun_info = NULL;
+ ovs_vport_receive(vport, skb, tun_info);
+ ip_tunnel_info_put(tun_info);
return;
error:
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [net-next RFC 14/14] arp: Associate ARP requests with tunnel info
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (12 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport Thomas Graf
@ 2015-06-01 14:27 ` Thomas Graf
2015-06-02 17:52 ` [ovs-dev] [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Flavio Leitner
14 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 14:27 UTC (permalink / raw)
To: netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
Since ARP performs its own route lookup, any tunnel metadata returned
by that lookup must be attached to the skb manually.
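For context, the metadata attached here is consumed like any other
flow based transmit: the tunnel net_device reads it back from the skb
when building the encapsulation header. A sketch of the consumer side,
reusing names from patch 13 (not part of this patch):

	/* e.g. gre_fb_xmit(): the encapsulation parameters for the ARP
	 * packet come from the attached metadata, not from the device */
	struct ip_tunnel_info *tun_info = skb_shinfo(skb)->tun_info;
	const struct ip_tunnel_key *key = tun_info ? &tun_info->key : NULL;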
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
net/ipv4/arp.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
index 933a928..6cf0502 100644
--- a/net/ipv4/arp.c
+++ b/net/ipv4/arp.c
@@ -489,6 +489,7 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
unsigned char *arp_ptr;
int hlen = LL_RESERVED_SPACE(dev);
int tlen = dev->needed_tailroom;
+ struct rtable *rt;
/*
* Allocate a buffer
@@ -577,6 +578,13 @@ struct sk_buff *arp_create(int type, int ptype, __be32 dest_ip,
}
memcpy(arp_ptr, &dest_ip, 4);
+ rt = ip_route_output(dev_net(dev), dest_ip, src_ip, 0, dev->ifindex);
+ if (!IS_ERR(rt)) {
+ if (rt->rt_tun_info)
+ skb_attach_tunnel_info(skb, rt->rt_tun_info);
+ ip_rt_put(rt);
+ }
+
return skb;
out:
--
2.3.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
2015-06-01 14:27 ` [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL Thomas Graf
@ 2015-06-01 16:51 ` Robert Shearman
[not found] ` <556C8D95.7030008-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org>
0 siblings, 1 reply; 21+ messages in thread
From: Robert Shearman @ 2015-06-01 16:51 UTC (permalink / raw)
To: Thomas Graf, netdev
Cc: pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri, hannes,
marcelo.leitner, stephen, jpettit, kaber
On 01/06/15 15:27, Thomas Graf wrote:
> Introduces a new Netlink attribute RTA_TUNNEL which allows routes
> to set tunnel transmit metadata and specify the tunnel endpoint or
> tunnel id on a per route basis. The route must point to a tunnel
> device which understands per skb tunnel metadata and has been put
> into the respective mode.
We've been discussing something similar for the purposes of IP over
MPLS, but most of the attributes for IP tunnels aren't relevant for
MPLS. It would be great if we could come up with something general enough that
can serve both purposes. I've just sent a patch series ("[RFC net-next
0/3] IP imposition of per-nh MPLS encap") which I believe would allow this.
Thanks,
Rob
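To make the comparison concrete, a rough userspace sketch of filling
the proposed RTA_TUNNEL nest with libmnl; the attribute names come from
the patch below, while nlh (an RTM_NEWROUTE request under construction)
and all values are illustrative only:

	struct nlattr *tun = mnl_attr_nest_start(nlh, RTA_TUNNEL);

	mnl_attr_put_u64(nlh, RTA_TUN_ID, 42);			/* example key/VNI */
	mnl_attr_put_u32(nlh, RTA_TUN_DST, inet_addr("192.0.2.1")); /* remote endpoint */
	mnl_attr_put_u8(nlh, RTA_TUN_TTL, 64);
	mnl_attr_nest_end(nlh, tun);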
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> include/net/ip_fib.h | 3 +++
> include/net/ip_tunnels.h | 1 -
> include/net/route.h | 10 ++++++++
> include/uapi/linux/rtnetlink.h | 16 ++++++++++++
> net/ipv4/fib_frontend.c | 57 ++++++++++++++++++++++++++++++++++++++++++
> net/ipv4/fib_semantics.c | 45 +++++++++++++++++++++++++++++++++
> net/ipv4/route.c | 30 +++++++++++++++++++++-
> net/openvswitch/vport.h | 1 +
> 8 files changed, 161 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 54271ed..1cd7cf8 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,7 @@
> #include <net/fib_rules.h>
> #include <net/inetpeer.h>
> #include <linux/percpu.h>
> +#include <net/ip_tunnels.h>
>
> struct fib_config {
> u8 fc_dst_len;
> @@ -44,6 +45,7 @@ struct fib_config {
> u32 fc_flow;
> u32 fc_nlflags;
> struct nl_info fc_nlinfo;
> + struct ip_tunnel_info fc_tunnel;
> };
>
> struct fib_info;
> @@ -117,6 +119,7 @@ struct fib_info {
> #ifdef CONFIG_IP_ROUTE_MULTIPATH
> int fib_power;
> #endif
> + struct ip_tunnel_info *fib_tunnel;
> struct rcu_head rcu;
> struct fib_nh fib_nh[0];
> #define fib_dev fib_nh[0].nh_dev
> diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
> index df8cfd3..b4ab930 100644
> --- a/include/net/ip_tunnels.h
> +++ b/include/net/ip_tunnels.h
> @@ -9,7 +9,6 @@
> #include <net/dsfield.h>
> #include <net/gro_cells.h>
> #include <net/inet_ecn.h>
> -#include <net/ip.h>
> #include <net/netns/generic.h>
> #include <net/rtnetlink.h>
> #include <net/flow.h>
> diff --git a/include/net/route.h b/include/net/route.h
> index 6ede321..dbda603 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -28,6 +28,7 @@
> #include <net/inetpeer.h>
> #include <net/flow.h>
> #include <net/inet_sock.h>
> +#include <net/ip_tunnels.h>
> #include <linux/in_route.h>
> #include <linux/rtnetlink.h>
> #include <linux/rcupdate.h>
> @@ -66,6 +67,7 @@ struct rtable {
>
> struct list_head rt_uncached;
> struct uncached_list *rt_uncached_list;
> + struct ip_tunnel_info *rt_tun_info;
> };
>
> static inline bool rt_is_input_route(const struct rtable *rt)
> @@ -198,6 +200,8 @@ struct in_ifaddr;
> void fib_add_ifaddr(struct in_ifaddr *);
> void fib_del_ifaddr(struct in_ifaddr *, struct in_ifaddr *);
>
> +int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info);
> +
> static inline void ip_rt_put(struct rtable *rt)
> {
> /* dst_release() accepts a NULL parameter.
> @@ -317,9 +321,15 @@ static inline int ip4_dst_hoplimit(const struct dst_entry *dst)
>
> static inline struct ip_tunnel_info *skb_tunnel_info(struct sk_buff *skb)
> {
> + struct rtable *rt;
> +
> if (skb_shinfo(skb)->tun_info)
> return skb_shinfo(skb)->tun_info;
>
> + rt = skb_rtable(skb);
> + if (rt)
> + return rt->rt_tun_info;
> +
> return NULL;
> }
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 17fb02f..1f7aa68 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -286,6 +286,21 @@ enum rt_class_t {
>
> /* Routing message attributes */
>
> +enum rta_tunnel_t {
> + RTA_TUN_UNSPEC,
> + RTA_TUN_ID,
> + RTA_TUN_DST,
> + RTA_TUN_SRC,
> + RTA_TUN_TTL,
> + RTA_TUN_TOS,
> + RTA_TUN_SPORT,
> + RTA_TUN_DPORT,
> + RTA_TUN_FLAGS,
> + __RTA_TUN_MAX,
> +};
> +
> +#define RTA_TUN_MAX (__RTA_TUN_MAX - 1)
> +
> enum rtattr_type_t {
> RTA_UNSPEC,
> RTA_DST,
> @@ -308,6 +323,7 @@ enum rtattr_type_t {
> RTA_VIA,
> RTA_NEWDST,
> RTA_PREF,
> + RTA_TUNNEL, /* destination VTEP */
> __RTA_MAX
> };
>
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 872494e..bfa77a6 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -580,6 +580,57 @@ int ip_rt_ioctl(struct net *net, unsigned int cmd, void __user *arg)
> return -EINVAL;
> }
>
> +static const struct nla_policy tunnel_policy[RTA_TUN_MAX + 1] = {
> + [RTA_TUN_ID] = { .type = NLA_U64 },
> + [RTA_TUN_DST] = { .type = NLA_U32 },
> + [RTA_TUN_SRC] = { .type = NLA_U32 },
> + [RTA_TUN_TTL] = { .type = NLA_U8 },
> + [RTA_TUN_TOS] = { .type = NLA_U8 },
> + [RTA_TUN_SPORT] = { .type = NLA_U16 },
> + [RTA_TUN_DPORT] = { .type = NLA_U16 },
> + [RTA_TUN_FLAGS] = { .type = NLA_U16 },
> +};
> +
> +static int parse_rta_tunnel(struct fib_config *cfg, struct nlattr *attr)
> +{
> + struct nlattr *tb[RTA_TUN_MAX+1];
> + int err;
> +
> + err = nla_parse_nested(tb, RTA_TUN_MAX, attr, tunnel_policy);
> + if (err < 0)
> + return err;
> +
> + if (tb[RTA_TUN_ID])
> + cfg->fc_tunnel.key.tun_id = nla_get_u64(tb[RTA_TUN_ID]);
> +
> + if (tb[RTA_TUN_DST])
> + cfg->fc_tunnel.key.ipv4_dst = nla_get_be32(tb[RTA_TUN_DST]);
> +
> + if (tb[RTA_TUN_SRC])
> + cfg->fc_tunnel.key.ipv4_src = nla_get_be32(tb[RTA_TUN_SRC]);
> +
> + if (tb[RTA_TUN_TTL])
> + cfg->fc_tunnel.key.ipv4_ttl = nla_get_u8(tb[RTA_TUN_TTL]);
> +
> + if (tb[RTA_TUN_TOS])
> + cfg->fc_tunnel.key.ipv4_tos = nla_get_u8(tb[RTA_TUN_TOS]);
> +
> + if (tb[RTA_TUN_SPORT])
> + cfg->fc_tunnel.key.tp_src = nla_get_be16(tb[RTA_TUN_SPORT]);
> +
> + if (tb[RTA_TUN_DPORT])
> + cfg->fc_tunnel.key.tp_dst = nla_get_be16(tb[RTA_TUN_DPORT]);
> +
> + if (tb[RTA_TUN_FLAGS])
> + cfg->fc_tunnel.key.tun_flags = nla_get_u16(tb[RTA_TUN_FLAGS]);
> +
> + cfg->fc_tunnel.mode = IP_TUNNEL_INFO_TX;
> + cfg->fc_tunnel.options = NULL;
> + cfg->fc_tunnel.options_len = 0;
> +
> + return 0;
> +}
> +
> const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
> [RTA_DST] = { .type = NLA_U32 },
> [RTA_SRC] = { .type = NLA_U32 },
> @@ -591,6 +642,7 @@ const struct nla_policy rtm_ipv4_policy[RTA_MAX + 1] = {
> [RTA_METRICS] = { .type = NLA_NESTED },
> [RTA_MULTIPATH] = { .len = sizeof(struct rtnexthop) },
> [RTA_FLOW] = { .type = NLA_U32 },
> + [RTA_TUNNEL] = { .type = NLA_NESTED },
> };
>
> static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
> @@ -656,6 +708,11 @@ static int rtm_to_fib_config(struct net *net, struct sk_buff *skb,
> case RTA_TABLE:
> cfg->fc_table = nla_get_u32(attr);
> break;
> + case RTA_TUNNEL:
> + err = parse_rta_tunnel(cfg, attr);
> + if (err < 0)
> + goto errout;
> + break;
> }
> }
>
> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 28ec3c1..1e94c81 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c
> @@ -215,6 +215,9 @@ static void free_fib_info_rcu(struct rcu_head *head)
>
> if (fi->fib_metrics != (u32 *) dst_default_metrics)
> kfree(fi->fib_metrics);
> +
> + ip_tunnel_info_put(fi->fib_tunnel);
> +
> kfree(fi);
> }
>
> @@ -760,6 +763,7 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
> struct fib_info *ofi;
> int nhs = 1;
> struct net *net = cfg->fc_nlinfo.nl_net;
> + struct ip_tunnel_info *tun_info = NULL;
>
> if (cfg->fc_type > RTN_MAX)
> goto err_inval;
> @@ -856,6 +860,19 @@ struct fib_info *fib_create_info(struct fib_config *cfg)
> }
> }
>
> + if (cfg->fc_tunnel.mode) {
> + /* TODO: Allow specification of options */
> + tun_info = ip_tunnel_info_alloc(0, GFP_KERNEL);
> + if (!tun_info) {
> + err = -ENOMEM;
> + goto failure;
> + }
> +
> + memcpy(tun_info, &cfg->fc_tunnel, sizeof(*tun_info));
> + ip_tunnel_info_get(tun_info);
> + fi->fib_tunnel = tun_info;
> + }
> +
> if (cfg->fc_mp) {
> #ifdef CONFIG_IP_ROUTE_MULTIPATH
> err = fib_get_nhs(fi, cfg->fc_mp, cfg->fc_mp_len, cfg);
> @@ -975,6 +992,8 @@ err_inval:
> err = -EINVAL;
>
> failure:
> + kfree(tun_info);
> +
> if (fi) {
> fi->fib_dead = 1;
> free_fib_info(fi);
> @@ -983,6 +1002,29 @@ failure:
> return ERR_PTR(err);
> }
>
> +int fib_dump_tun_info(struct sk_buff *skb, struct ip_tunnel_info *tun_info)
> +{
> + struct nlattr *tun_attr;
> +
> + tun_attr = nla_nest_start(skb, RTA_TUNNEL);
> + if (!tun_attr)
> + return -ENOMEM;
> +
> + if (nla_put_u64(skb, RTA_TUN_ID, tun_info->key.tun_id) ||
> + nla_put_be32(skb, RTA_TUN_DST, tun_info->key.ipv4_dst) ||
> + nla_put_be32(skb, RTA_TUN_SRC, tun_info->key.ipv4_src) ||
> + nla_put_u8(skb, RTA_TUN_TOS, tun_info->key.ipv4_tos) ||
> + nla_put_u8(skb, RTA_TUN_TTL, tun_info->key.ipv4_ttl) ||
> + nla_put_u16(skb, RTA_TUN_SPORT, tun_info->key.tp_src) ||
> + nla_put_u16(skb, RTA_TUN_DPORT, tun_info->key.tp_dst) ||
> + nla_put_u16(skb, RTA_TUN_FLAGS, tun_info->key.tun_flags))
> + return -ENOMEM;
> +
> + nla_nest_end(skb, tun_attr);
> +
> + return 0;
> +}
> +
> int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
> u32 tb_id, u8 type, __be32 dst, int dst_len, u8 tos,
> struct fib_info *fi, unsigned int flags)
> @@ -1068,6 +1110,9 @@ int fib_dump_info(struct sk_buff *skb, u32 portid, u32 seq, int event,
> nla_nest_end(skb, mp);
> }
> #endif
> + if (fi->fib_tunnel && fib_dump_tun_info(skb, fi->fib_tunnel))
> + goto nla_put_failure;
> +
> nlmsg_end(skb, nlh);
> return 0;
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6e8e1be..f53c62f 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1356,6 +1356,8 @@ static void ipv4_dst_destroy(struct dst_entry *dst)
> list_del(&rt->rt_uncached);
> spin_unlock_bh(&ul->lock);
> }
> +
> + ip_tunnel_info_put(rt->rt_tun_info);
> }
>
> void rt_flush_dev(struct net_device *dev)
> @@ -1489,6 +1491,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
> rth->rt_gateway = 0;
> rth->rt_uses_gateway = 0;
> INIT_LIST_HEAD(&rth->rt_uncached);
> + rth->rt_tun_info = NULL;
> if (our) {
> rth->dst.input= ip_local_deliver;
> rth->rt_flags |= RTCF_LOCAL;
> @@ -1543,6 +1546,7 @@ static int __mkroute_input(struct sk_buff *skb,
> struct in_device *in_dev,
> __be32 daddr, __be32 saddr, u32 tos)
> {
> + struct fib_info *fi = res->fi;
> struct fib_nh_exception *fnhe;
> struct rtable *rth;
> int err;
> @@ -1590,7 +1594,7 @@ static int __mkroute_input(struct sk_buff *skb,
> }
>
> fnhe = find_exception(&FIB_RES_NH(*res), daddr);
> - if (do_cache) {
> + if (do_cache && !(fi && fi->fib_tunnel)) {
> if (fnhe)
> rth = rcu_dereference(fnhe->fnhe_rth_input);
> else
> @@ -1621,6 +1625,13 @@ static int __mkroute_input(struct sk_buff *skb,
> INIT_LIST_HEAD(&rth->rt_uncached);
> RT_CACHE_STAT_INC(in_slow_tot);
>
> + if (fi && fi->fib_tunnel) {
> + ip_tunnel_info_get(fi->fib_tunnel);
> + rth->rt_tun_info = fi->fib_tunnel;
> + } else {
> + rth->rt_tun_info = NULL;
> + }
> +
> rth->dst.input = ip_forward;
> rth->dst.output = ip_output;
>
> @@ -1794,6 +1805,7 @@ local_input:
> rth->rt_gateway = 0;
> rth->rt_uses_gateway = 0;
> INIT_LIST_HEAD(&rth->rt_uncached);
> + rth->rt_tun_info = NULL;
> RT_CACHE_STAT_INC(in_slow_tot);
> if (res.type == RTN_UNREACHABLE) {
> rth->dst.input= ip_error;
> @@ -1940,6 +1952,11 @@ static struct rtable *__mkroute_output(const struct fib_result *res,
>
> fnhe = NULL;
> do_cache &= fi != NULL;
> +
> + /* Force dst for flows with tunnel encapsulation */
> + if (fi && fi->fib_tunnel)
> + goto add;
> +
> if (do_cache) {
> struct rtable __rcu **prth;
> struct fib_nh *nh = &FIB_RES_NH(*res);
> @@ -1984,6 +2001,13 @@ add:
> rth->rt_uses_gateway = 0;
> INIT_LIST_HEAD(&rth->rt_uncached);
>
> + if (fi && fi->fib_tunnel) {
> + ip_tunnel_info_get(fi->fib_tunnel);
> + rth->rt_tun_info = fi->fib_tunnel;
> + } else {
> + rth->rt_tun_info = NULL;
> + }
> +
> RT_CACHE_STAT_INC(out_slow_tot);
>
> if (flags & RTCF_LOCAL)
> @@ -2263,6 +2287,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
> rt->rt_uses_gateway = ort->rt_uses_gateway;
>
> INIT_LIST_HEAD(&rt->rt_uncached);
> + rt->rt_tun_info = NULL;
>
> dst_free(new);
> }
> @@ -2394,6 +2419,9 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
> if (rtnl_put_cacheinfo(skb, &rt->dst, 0, expires, error) < 0)
> goto nla_put_failure;
>
> + if (rt->rt_tun_info && fib_dump_tun_info(skb, rt->rt_tun_info))
> + goto nla_put_failure;
> +
> nlmsg_end(skb, nlh);
> return 0;
>
> diff --git a/net/openvswitch/vport.h b/net/openvswitch/vport.h
> index 4750fb6..75d6824 100644
> --- a/net/openvswitch/vport.h
> +++ b/net/openvswitch/vport.h
> @@ -27,6 +27,7 @@
> #include <linux/skbuff.h>
> #include <linux/spinlock.h>
> #include <linux/u64_stats_sync.h>
> +#include <net/route.h>
>
> #include "datapath.h"
>
>
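The hunks above rely on ip_tunnel_info_get()/ip_tunnel_info_put() to manage
the lifetime of the metadata hung off fi->fib_tunnel and rt->rt_tun_info;
the helpers themselves are introduced earlier in the series and are not
visible in this excerpt. A minimal sketch of the reference-counting pattern
they imply, assuming an embedded atomic refcnt (the refcnt field name and
placement are assumptions; the series' actual definitions may differ):

/* Sketch only: assumes struct ip_tunnel_info carries an atomic_t refcnt;
 * the real layout comes from earlier patches in the series.
 */
#include <linux/slab.h>
#include <linux/atomic.h>
#include <net/ip_tunnels.h>

static inline void ip_tunnel_info_get(struct ip_tunnel_info *info)
{
	/* taken when an rtable caches the pointer, see __mkroute_input() */
	atomic_inc(&info->refcnt);
}

static inline void ip_tunnel_info_put(struct ip_tunnel_info *info)
{
	/* NULL-safe, since ipv4_dst_destroy() calls this unconditionally */
	if (info && atomic_dec_and_test(&info->refcnt))
		kfree(info);
}

This would make the pairing explicit: __mkroute_input()/__mkroute_output()
take a reference when the rtable stores the pointer, and
ipv4_dst_destroy()/free_fib_info_rcu() drop it when the last user goes away.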
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL
[not found] ` <556C8D95.7030008-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org>
@ 2015-06-01 23:26 ` Thomas Graf
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-01 23:26 UTC (permalink / raw)
To: Robert Shearman
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w,
jiri-rHqAuBHg3fBzbRFIqnYvSA, daniel-FeC+5ew28dpmcu3hnIyYJQ,
netdev-u79uwXL29TY76Z2rM5mHXA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
kaber-dcUjhNyLwpNeoWH0uzbU5w,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
tom-BjP2VixgY4xUbtYUoyoikg, davem-fT/PcQaiUtIeIZ0/mPfg9Q
On 06/01/15 at 05:51pm, Robert Shearman wrote:
> On 01/06/15 15:27, Thomas Graf wrote:
> >Introduces a new Netlink attribute RTA_TUNNEL which allows routes
> >to set tunnel transmit metadata and specify the tunnel endpoint or
> >tunnel id on a per route basis. The route must point to a tunnel
> >device which understands per skb tunnel metadata and has been put
> >into the respective mode.
>
> We've been discussing something similar for the purposes of IP over MPLS,
> but most of the attributes for IP tunnels aren't relevant for MPLS. It'd be
> great if we could come up with something general enough that can serve both
> purposes. I've just sent a patch series ("[RFC net-next 0/3] IP imposition
> of per-nh MPLS encap") which I believe would allow this.
Nice! At first glance, your series looks like an excellent complement
to this one. I'll comment directly in your series.
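To make the quoted description above a bit more concrete, here is a minimal
userspace sketch that attaches the nested RTA_TUNNEL attribute to an
RTM_NEWROUTE request. The RTA_TUN_* names are the ones dumped by
fib_dump_tun_info() in the patch and would come from the patched
<linux/rtnetlink.h>; the exact attribute policy parse_rta_tunnel() enforces
is not visible in this thread, so treat the layout (and the libmnl plumbing)
as an assumption rather than a tested recipe:

#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <libmnl/libmnl.h>
#include <linux/rtnetlink.h>	/* patched headers providing RTA_TUNNEL/RTA_TUN_* */

/* Fill the nested RTA_TUNNEL attribute; addresses are passed in network
 * byte order, matching the nla_put_be32() calls in fib_dump_tun_info(). */
static void put_rta_tunnel(struct nlmsghdr *nlh, uint64_t tun_id,
			   uint32_t dst, uint32_t src)
{
	struct nlattr *nest = mnl_attr_nest_start(nlh, RTA_TUNNEL);

	mnl_attr_put_u64(nlh, RTA_TUN_ID, tun_id);
	mnl_attr_put_u32(nlh, RTA_TUN_DST, dst);	/* tunnel endpoint */
	mnl_attr_put_u32(nlh, RTA_TUN_SRC, src);
	mnl_attr_put_u8(nlh, RTA_TUN_TOS, 0);
	mnl_attr_put_u8(nlh, RTA_TUN_TTL, 64);
	mnl_attr_put_u16(nlh, RTA_TUN_FLAGS, 0);
	mnl_attr_nest_end(nlh, nest);
}

int main(void)
{
	char buf[MNL_SOCKET_BUFFER_SIZE];
	struct nlmsghdr *nlh = mnl_nlmsg_put_header(buf);
	struct rtmsg *rtm;

	nlh->nlmsg_type = RTM_NEWROUTE;
	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;

	rtm = mnl_nlmsg_put_extra_header(nlh, sizeof(*rtm));
	rtm->rtm_family = AF_INET;
	rtm->rtm_dst_len = 24;
	rtm->rtm_table = RT_TABLE_MAIN;
	rtm->rtm_protocol = RTPROT_STATIC;
	rtm->rtm_scope = RT_SCOPE_UNIVERSE;
	rtm->rtm_type = RTN_UNICAST;

	/* 192.168.100.0/24 via tunnel id 42, outer 10.0.0.1 -> 10.0.0.2 */
	mnl_attr_put_u32(nlh, RTA_DST, inet_addr("192.168.100.0"));
	put_rta_tunnel(nlh, 42, inet_addr("10.0.0.2"), inet_addr("10.0.0.1"));

	/* ... open a NETLINK_ROUTE socket, mnl_socket_sendto() and check the
	 * ACK; omitted to keep the sketch short ... */
	return 0;
}

The route then only has to point at a tunnel net_device that has been put
into the per-skb metadata mode described in the patch description.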
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [ovs-dev] [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
` (13 preceding siblings ...)
2015-06-01 14:27 ` [net-next RFC 14/14] arp: Associate ARP requests with tunnel info Thomas Graf
@ 2015-06-02 17:52 ` Flavio Leitner
14 siblings, 0 replies; 21+ messages in thread
From: Flavio Leitner @ 2015-06-02 17:52 UTC (permalink / raw)
To: Thomas Graf
Cc: netdev, dev, marcelo.leitner, jiri, daniel, tom, edumazet, kaber,
stephen, hannes, davem
It seems patch 01 didn't make it to the ovs-dev mailing list,
but it is available on the netdev mailing list.
fbl
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices
[not found] ` <cover.1433167295.git.tgraf-G/eBtMaohhA@public.gmane.org>
2015-06-01 14:27 ` [net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name() Thomas Graf
@ 2015-06-02 19:02 ` Eric W. Biederman
1 sibling, 0 replies; 21+ messages in thread
From: Eric W. Biederman @ 2015-06-02 19:02 UTC (permalink / raw)
To: Thomas Graf
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w,
jiri-rHqAuBHg3fBzbRFIqnYvSA, daniel-FeC+5ew28dpmcu3hnIyYJQ,
netdev-u79uwXL29TY76Z2rM5mHXA, roopa,
edumazet-hpIqsD4AKlfQT0dZR+AlfA, kaber-dcUjhNyLwpNeoWH0uzbU5w,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
tom-BjP2VixgY4xUbtYUoyoikg, Robert Shearman,
davem-fT/PcQaiUtIeIZ0/mPfg9Q
Thomas Graf <tgraf@suug.ch> writes:
> This is the first series in a greater effort to bring the scalability
> and programmability advantages of OVS to the rest of the network
> stack and to get rid of as much OVS specific code as possible.
>
> This first series focuses on getting rid of OVS tunnel vports and use
> regular tunnel net_devices instead. As part of this effort, the
> routing subsystem is extended with support for flow based tunneling.
> In this new tunneling mode, the route is able to match on tunnel
> information as well as set tunnel encapsulation parameters per route.
> This allows to perform L3 forwarding for a large number of tunnel
> endpoints and virtual networks using a single tunnel net_device.
This is a different direction than I was imagining things evolving when
I was looking at MPLS. However, there is a lot of overlap.
I get the impression there are two directions you are looking at:
- Allowing more configurable keys in route-based lookup.
- Reducing the costs of the tunnels.
We already have a similar subsystem: xfrm.
If we are going to use more flexible keys when looking up routes, and if
it is reasonably possible (while maintaining performance), I suggest we
use the xfrm data structures, or more likely rework xfrm on top of the
new data structures. That way there is less code to maintain overall.
Certainly any work that adds a new way to do tunnels in the kernel needs
to answer the question: why not xfrm? xfrm already exists to do exactly
that job.
I think a clumsy API and excess flexibility start to be an answer for
MPLS ingress: just using the existing routing table can result in
cleaner, faster code with a better userspace API. But I still think the
MPLS case, where we attach labels, needs to answer that question.
If you are using flow-based flexibility from openvswitch, I think "why
not use xfrm" becomes a more challenging question to answer.
Eric
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
2015-06-01 14:27 ` [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action Thomas Graf
@ 2015-06-03 15:29 ` Jiri Benc
2015-06-03 22:07 ` Thomas Graf
0 siblings, 1 reply; 21+ messages in thread
From: Jiri Benc @ 2015-06-03 15:29 UTC (permalink / raw)
To: Thomas Graf
Cc: netdev, pshelar, jesse, davem, daniel, dev, tom, edumazet, jiri,
hannes, marcelo.leitner, stephen, jpettit, kaber
On Mon, 1 Jun 2015 16:27:32 +0200, Thomas Graf wrote:
> --- a/net/openvswitch/flow.h
> +++ b/net/openvswitch/flow.h
> @@ -45,6 +45,11 @@ struct sk_buff;
> #define TUN_METADATA_OPTS(flow_key, opt_len) \
> ((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
>
> +struct ovs_tunnel_info
> +{
> + struct ip_tunnel_info *info;
> +};
Why do you keep this structure? It doesn't seem useful.
Jiri
--
Jiri Benc
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action
2015-06-03 15:29 ` Jiri Benc
@ 2015-06-03 22:07 ` Thomas Graf
0 siblings, 0 replies; 21+ messages in thread
From: Thomas Graf @ 2015-06-03 22:07 UTC (permalink / raw)
To: Jiri Benc
Cc: dev-yBygre7rU0TnMu66kgdUjQ,
marcelo.leitner-Re5JQEeQqe8AvxtiuMwx3w,
jiri-rHqAuBHg3fBzbRFIqnYvSA, daniel-FeC+5ew28dpmcu3hnIyYJQ,
netdev-u79uwXL29TY76Z2rM5mHXA, edumazet-hpIqsD4AKlfQT0dZR+AlfA,
kaber-dcUjhNyLwpNeoWH0uzbU5w,
stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
hannes-tFNcAqjVMyqKXQKiL6tip0B+6BGkLq7r,
tom-BjP2VixgY4xUbtYUoyoikg, davem-fT/PcQaiUtIeIZ0/mPfg9Q
On 06/03/15 at 05:29pm, Jiri Benc wrote:
> On Mon, 1 Jun 2015 16:27:32 +0200, Thomas Graf wrote:
> > --- a/net/openvswitch/flow.h
> > +++ b/net/openvswitch/flow.h
> > @@ -45,6 +45,11 @@ struct sk_buff;
> > #define TUN_METADATA_OPTS(flow_key, opt_len) \
> > ((void *)((flow_key)->tun_opts + TUN_METADATA_OFFSET(opt_len)))
> >
> > +struct ovs_tunnel_info
> > +{
> > + struct ip_tunnel_info *info;
> > +};
>
> Why do you keep this structure? It doesn't seem it's useful.
It's the structure which defines the payload of the OVS set
tunnel action. Those actions are configured as Netlink attributes
that are only visible to the kernel. Since we allocate the
ip_tunnel_info and attach it to the packets passing by, we only
need to keep a pointer as the config of the set action. We could
also do (struct ip_tunnel_info *) nla_data(...) and store the
pointer without defining a struct. I found the struct more
readable. Happy to change it if you like it better without a
struct around it.
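For the record, a compressed sketch of the two options; only struct
ovs_tunnel_info and nla_data() are taken from the patch, the helper names
and surrounding context are made up for illustration:

#include <net/netlink.h>	/* nla_data() */
#include <net/ip_tunnels.h>	/* struct ip_tunnel_info (per this series) */
#include "flow.h"		/* struct ovs_tunnel_info from the patch */

/* Option 1 (as in the patch): the action's attribute payload is a
 * struct ovs_tunnel_info holding one pointer. */
static struct ip_tunnel_info *set_action_tun_info(const struct nlattr *a)
{
	const struct ovs_tunnel_info *ovs_info = nla_data(a);

	return ovs_info->info;
}

/* Option 2 (no wrapper): the payload is the raw pointer itself. */
static struct ip_tunnel_info *
set_action_tun_info_nowrap(const struct nlattr *a)
{
	return *(struct ip_tunnel_info **)nla_data(a);
}

Either way the same pointer comes out; the wrapper just gives the payload a
name and a place to grow.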
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread (newest: 2015-06-03 22:07 UTC)
Thread overview: 21+ messages
2015-06-01 14:27 [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Thomas Graf
2015-06-01 14:27 ` [net-next RFC 01/14] ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic Thomas Graf
2015-06-01 14:27 ` [net-next RFC 02/14] ip_tunnel: support per packet tunnel metadata Thomas Graf
2015-06-01 14:27 ` [net-next RFC 03/14] vxlan: Flow based tunneling Thomas Graf
2015-06-01 14:27 ` [net-next RFC 04/14] route: Extend flow representation with tunnel key Thomas Graf
2015-06-01 14:27 ` [net-next RFC 05/14] route: Per route tunnel metadata with RTA_TUNNEL Thomas Graf
2015-06-01 16:51 ` Robert Shearman
[not found] ` <556C8D95.7030008-43mecJUBy8ZBDgjK7y7TUQ@public.gmane.org>
2015-06-01 23:26 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 06/14] fib: Add fib rule match on tunnel id Thomas Graf
2015-06-01 14:27 ` [net-next RFC 07/14] vxlan: Factor out device configuration Thomas Graf
2015-06-01 14:27 ` [net-next RFC 08/14] openvswitch: Allocate & attach ip_tunnel_info for tunnel set action Thomas Graf
2015-06-03 15:29 ` Jiri Benc
2015-06-03 22:07 ` Thomas Graf
2015-06-01 14:27 ` [net-next RFC 09/14] openvswitch: Move dev pointer into vport itself Thomas Graf
[not found] ` <cover.1433167295.git.tgraf-G/eBtMaohhA@public.gmane.org>
2015-06-01 14:27 ` [net-next RFC 10/14] openvswitch: Abstract vport name through ovs_vport_name() Thomas Graf
2015-06-02 19:02 ` [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Eric W. Biederman
2015-06-01 14:27 ` [net-next RFC 11/14] openvswitch: Use regular VXLAN net_device device Thomas Graf
2015-06-01 14:27 ` [net-next RFC 12/14] vxlan: remove indirect call to vxlan_rcv() and vni member Thomas Graf
2015-06-01 14:27 ` [net-next RFC 13/14] openvswitch: Use regular GRE net_device instead of vport Thomas Graf
2015-06-01 14:27 ` [net-next RFC 14/14] arp: Associate ARP requests with tunnel info Thomas Graf
2015-06-02 17:52 ` [ovs-dev] [net-next RFC 00/14] Convert OVS tunnel vports to use regular net_devices Flavio Leitner