* [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Amir Vadai @ 2016-08-14 14:06 UTC
To: Jamal Hadi Salim, Jiri Pirko
Cc: netdev, Or Gerlitz, Hadar Har-Zion, Oded Shanoon, Amir Vadai
From: Amir Vadai <amirva@mellanox.com>
Hi,
I would like to make it possible to manage VXLAN encap/decap using the flower
classifier, the mirred action and a vxlan device.
To make the solution scalable, I'm using a shared vxlan device, with the
encapsulation information carried in the packet metadata: it is set by the
mirred action in the encap flow and matched by the flower classifier in the
decap flow.
For example, for a virtualization use case:
# [uplink NIC] --{cls_flower & mirred}--> [vxlan dev] --{udp/ip stack}--> [tap]
# [tap dev] --{udp/ip stack}--> [vxlan dev] --{cls_flower & mirred}--> [uplink NIC]
# In this example, the vxlan tunnel IPs are 11.11.11.* and the real (outer)
# device IPs are 11.11.0.*
ip link add $VXLAN type vxlan dstport 4789 external
ifconfig $VXLAN up
tc qdisc add dev $ETH ingress
# ENCAP rule for ARP
tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
flower \
action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
# ENCAP rule for ICMP
tc filter add dev $ETH protocol ip parent ffff: prio 10 \
flower ip_proto 1 \
action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
tc qdisc add dev $VXLAN ingress
# DECAP rule for ARP
tc filter add dev $VXLAN protocol 0x806 parent ffff: prio 11 \
flower enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
action mirred egress redirect dev $ETH
# DECAP rule for ICMP
tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
flower enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
action mirred egress redirect dev $ETH
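The scaling argument can be modelled in a few lines. The sketch below is plain user-space C, not kernel code, and every name in it (struct pkt, struct tun_key, attach_tunnel_md(), shared_vxlan_xmit() and so on) is a hypothetical stand-in for the skb, its ip_tunnel_info metadata and a vxlan device created with "external". It shows the key point: the encap rule attaches a per-packet tunnel key, the single shared device builds the outer header from that key rather than from its own configuration, and on receive the device exposes the outer header as metadata for the decap classifier to match.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical stand-in for the tunnel key carried in skb metadata
 * (ip_tunnel_info in the kernel). Host byte order for simplicity. */
struct tun_key {
	uint32_t src_ip;   /* outer IPv4 source */
	uint32_t dst_ip;   /* outer IPv4 destination */
	uint32_t vni;      /* 24-bit VXLAN network identifier */
};

/* Hypothetical packet: the payload is elided, only metadata is modelled. */
struct pkt {
	bool has_tun_md;       /* is tunnel metadata attached? */
	struct tun_key md;     /* the attached metadata */
	struct tun_key outer;  /* outer header as it would appear on the wire */
	bool encapped;
};

/* Encap rule: attach per-flow tunnel metadata to the packet. */
static void attach_tunnel_md(struct pkt *p, const struct tun_key *k)
{
	p->md = *k;
	p->has_tun_md = true;
}

/* One shared device in metadata mode: the outer header comes from the
 * packet, not from per-device configuration, so one device can serve
 * every tunnel. Returns false (drop) if no metadata is attached. */
static bool shared_vxlan_xmit(struct pkt *p)
{
	if (!p->has_tun_md)
		return false;
	p->outer = p->md;
	p->encapped = true;
	p->has_tun_md = false;   /* metadata consumed by the encap */
	return true;
}

/* Receive side: strip the outer header and expose its fields as
 * metadata, which the decap classifier then matches. */
static void shared_vxlan_rcv(struct pkt *p)
{
	p->md = p->outer;
	p->has_tun_md = true;
	p->encapped = false;
}

/* Decap match, like "flower enc_src_ip ... enc_dst_ip ... enc_key_id ...". */
static bool decap_match(const struct pkt *p, const struct tun_key *want)
{
	return p->has_tun_md &&
	       p->md.src_ip == want->src_ip &&
	       p->md.dst_ip == want->dst_ip &&
	       p->md.vni == want->vni;
}
```

A packet encapsulated with {11.11.0.1, 11.11.0.2, 11} arrives at the peer with the same outer fields, so the peer's decap rule matches them directly; the local decap rule above has source and destination swapped because it matches traffic coming from the peer.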
The next step will be to enable offloading of these rules.
The following two patches to cls_flower and act_mirred were used to validate
and test this approach. They are supplied to make things clearer and will be
modified before the actual submission.
Thanks,
Amir
Amir Vadai (2):
net/sched: cls_flower: Introduce classify by vxlan outer headers
net/sched: act_mirred: Introduce vxlan support
include/net/tc_act/tc_mirred.h | 5 +++
include/uapi/linux/pkt_cls.h | 11 +++++
include/uapi/linux/tc_act/tc_mirred.h | 7 ++++
net/sched/act_mirred.c | 79 +++++++++++++++++++++++++++++++++++
net/sched/cls_flower.c | 53 +++++++++++++++++++++++
5 files changed, 155 insertions(+)
--
2.9.0
* [RFC net-next 1/2] net/sched: cls_flower: Introduce classify by vxlan outer headers
From: Amir Vadai @ 2016-08-14 14:06 UTC
To: Jamal Hadi Salim, Jiri Pirko
Cc: netdev, Or Gerlitz, Hadar Har-Zion, Oded Shanoon, Amir Vadai
From: Amir Vadai <amirva@mellanox.com>
Signed-off-by: Amir Vadai <amirva@mellanox.com>
---
include/uapi/linux/pkt_cls.h | 11 +++++++++
net/sched/cls_flower.c | 53 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 64 insertions(+)
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index d1c1ccaba787..a192195a5516 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -428,6 +428,17 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
TCA_FLOWER_FLAGS,
+
+ TCA_FLOWER_KEY_ENC_IPV4_SRC, /* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,/* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_DST, /* be32 */
+ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,/* be32 */
+ TCA_FLOWER_KEY_ENC_IPV6_SRC, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_DST, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK, /* struct in6_addr */
+ TCA_FLOWER_KEY_ENC_KEY_ID, /* be32 */
+
__TCA_FLOWER_MAX,
};
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 5060801a2f6d..26436dd34e21 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -23,12 +23,18 @@
#include <net/ip.h>
#include <net/flow_dissector.h>
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+#include <net/vxlan.h>
+
struct fl_flow_key {
int indev_ifindex;
struct flow_dissector_key_control control;
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_addrs ipaddrs;
+ struct flow_dissector_key_ipv4_addrs enc_ipv4;
+ struct flow_dissector_key_keyid enc_key_id;
union {
struct flow_dissector_key_ipv4_addrs ipv4;
struct flow_dissector_key_ipv6_addrs ipv6;
@@ -123,11 +129,27 @@ static int fl_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct cls_fl_filter *f;
struct fl_flow_key skb_key;
struct fl_flow_key skb_mkey;
+ struct ip_tunnel_info *info;
if (!atomic_read(&head->ht.nelems))
return -1;
fl_clear_masked_range(&skb_key, &head->mask);
+
+ info = skb_tunnel_info(skb);
+ if (info) {
+ struct ip_tunnel_key *key = &info->key;
+ netdev_err(skb->dev, "%s:%d saddr: %pI4, daddr: %pI4 vni: %d tos: %#x ttl: %#x src_port: %d dst_port: %d\n",
+ __func__, __LINE__,
+ &key->u.ipv4.src, &key->u.ipv4.dst,
+ be32_to_cpu(vxlan_tun_id_to_vni(key->tun_id)),
+ key->tos, key->ttl,
+ ntohs(key->tp_src), ntohs(key->tp_dst));
+ skb_key.enc_ipv4.src = key->u.ipv4.src;
+ skb_key.enc_ipv4.dst = key->u.ipv4.dst;
+ skb_key.enc_key_id.keyid = vxlan_tun_id_to_vni(key->tun_id);
+ }
+
skb_key.indev_ifindex = skb->skb_iif;
/* skb_flow_dissect() does not set n_proto in case an unknown protocol,
* so do it rather here.
@@ -293,6 +315,12 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
[TCA_FLOWER_KEY_TCP_DST] = { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_SRC] = { .type = NLA_U16 },
[TCA_FLOWER_KEY_UDP_DST] = { .type = NLA_U16 },
+
+ [TCA_FLOWER_KEY_ENC_IPV4_SRC] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_DST] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] = { .type = NLA_U32 },
+ [TCA_FLOWER_KEY_ENC_KEY_ID] = { .type = NLA_U32 },
};
static void fl_set_key_val(struct nlattr **tb,
@@ -373,6 +401,20 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
sizeof(key->tp.dst));
}
+ if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
+ tb[TCA_FLOWER_KEY_ENC_IPV4_DST] ||
+ tb[TCA_FLOWER_KEY_ENC_KEY_ID]) {
+ fl_set_key_val(tb, &key->enc_ipv4.src, TCA_FLOWER_KEY_ENC_IPV4_SRC,
+ &mask->enc_ipv4.src, TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+ sizeof(key->enc_ipv4.src));
+ fl_set_key_val(tb, &key->enc_ipv4.dst, TCA_FLOWER_KEY_ENC_IPV4_DST,
+ &mask->enc_ipv4.dst, TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+ sizeof(key->enc_ipv4.dst));
+ fl_set_key_val(tb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+ &mask->enc_key_id, TCA_FLOWER_UNSPEC,
+ sizeof(key->enc_key_id));
+ }
+
return 0;
}
@@ -753,6 +795,17 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
sizeof(key->tp.dst))))
goto nla_put_failure;
+ if (fl_dump_key_val(skb, &key->enc_ipv4.src, TCA_FLOWER_KEY_ENC_IPV4_SRC,
+ &mask->enc_ipv4.src, TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+ sizeof(key->enc_ipv4.src)) ||
+ fl_dump_key_val(skb, &key->enc_ipv4.dst, TCA_FLOWER_KEY_ENC_IPV4_DST,
+ &mask->enc_ipv4.dst, TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+ sizeof(key->enc_ipv4.dst)) ||
+ fl_dump_key_val(skb, &key->enc_key_id, TCA_FLOWER_KEY_ENC_KEY_ID,
+ &mask->enc_key_id, TCA_FLOWER_UNSPEC,
+ sizeof(key->enc_key_id)))
+ goto nla_put_failure;
+
nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
if (tcf_exts_dump(skb, &f->exts))
--
2.9.0
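Patch 1/2 copies the outer addresses and VNI from skb_tunnel_info() into the flow key, after which flower's existing masked matching applies unchanged. A simplified user-space sketch of that matching, assuming a hypothetical struct enc_key holding only the enc_* fields. Flower itself masks the whole struct fl_flow_key bytewise with a single per-classifier mask and looks the result up in a hash table; this sketch masks field by field and scans a small array instead.

```c
#include <stdint.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical subset of fl_flow_key: only the outer (enc_*) fields. */
struct enc_key {
	uint32_t enc_src;   /* outer IPv4 source */
	uint32_t enc_dst;   /* outer IPv4 destination */
	uint32_t key_id;    /* VNI */
};

struct filter {
	struct enc_key key;   /* stored already masked */
	int classid;          /* result returned on a match */
};

/* Apply the mask field by field. */
static struct enc_key enc_key_mask(const struct enc_key *k,
				   const struct enc_key *m)
{
	struct enc_key r = {
		.enc_src = k->enc_src & m->enc_src,
		.enc_dst = k->enc_dst & m->enc_dst,
		.key_id  = k->key_id & m->key_id,
	};
	return r;
}

/* Return the classid of the first filter whose key equals the packet's
 * masked key, or -1 when nothing matches (as fl_classify() does). */
static int classify(const struct enc_key *pkt, const struct enc_key *mask,
		    const struct filter *f, size_t n)
{
	struct enc_key mk = enc_key_mask(pkt, mask);

	for (size_t i = 0; i < n; i++)
		if (mk.enc_src == f[i].key.enc_src &&
		    mk.enc_dst == f[i].key.enc_dst &&
		    mk.key_id == f[i].key.key_id)
			return f[i].classid;
	return -1;
}
```

With an all-ones mask this is an exact match on the tunnel endpoints and VNI; a narrower mask on the outer source address would let a single rule cover a whole subnet of peers.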
* [RFC net-next 2/2] net/sched: act_mirred: Introduce vxlan support
From: Amir Vadai @ 2016-08-14 14:06 UTC
To: Jamal Hadi Salim, Jiri Pirko
Cc: netdev, Or Gerlitz, Hadar Har-Zion, Oded Shanoon, Amir Vadai
From: Amir Vadai <amirva@mellanox.com>
Signed-off-by: Amir Vadai <amirva@mellanox.com>
---
include/net/tc_act/tc_mirred.h | 5 +++
include/uapi/linux/tc_act/tc_mirred.h | 7 ++++
net/sched/act_mirred.c | 79 +++++++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+)
diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 62770add15bd..43704c5550ab 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -11,6 +11,11 @@ struct tcf_mirred {
int tcfm_ok_push;
struct net_device __rcu *tcfm_dev;
struct list_head tcfm_list;
+ struct metadata_dst *tun_dst;
+ __be32 tcf_enc_saddr;
+ __be32 tcf_enc_daddr;
+ __be32 tcf_enc_key_id;
+ __be16 tcf_enc_port;
};
#define to_mirred(a) ((struct tcf_mirred *)a)
diff --git a/include/uapi/linux/tc_act/tc_mirred.h b/include/uapi/linux/tc_act/tc_mirred.h
index 3d7a2b352a62..89ae754d8f5e 100644
--- a/include/uapi/linux/tc_act/tc_mirred.h
+++ b/include/uapi/linux/tc_act/tc_mirred.h
@@ -21,6 +21,13 @@ enum {
TCA_MIRRED_TM,
TCA_MIRRED_PARMS,
TCA_MIRRED_PAD,
+
+ TCA_MIRRED_ENC_IPV4_SRC, /* be32 */
+ TCA_MIRRED_ENC_IPV4_DST, /* be32 */
+ TCA_MIRRED_ENC_IPV6_SRC, /* struct in6_addr */
+ TCA_MIRRED_ENC_IPV6_DST, /* struct in6_addr */
+ TCA_MIRRED_ENC_KEY_ID, /* be32 */
+ TCA_MIRRED_ENC_DST_PORT, /* be16 */
__TCA_MIRRED_MAX
};
#define TCA_MIRRED_MAX (__TCA_MIRRED_MAX - 1)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 6038c85d92f5..3aff8d8b2744 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -26,6 +26,9 @@
#include <net/pkt_sched.h>
#include <linux/tc_act/tc_mirred.h>
#include <net/tc_act/tc_mirred.h>
+#include <net/dst.h>
+#include <net/dst_metadata.h>
+#include <net/vxlan.h>
#include <linux/if_arp.h>
@@ -38,6 +41,11 @@ static void tcf_mirred_release(struct tc_action *a, int bind)
struct tcf_mirred *m = to_mirred(a);
struct net_device *dev;
+ if (m->tun_dst) {
+ printk("%s:%d - releasing dst: %p\n", __func__, __LINE__, m->tun_dst);
+ dst_release((struct dst_entry *)m->tun_dst);
+ }
+
/* We could be called either in a RCU callback or with RTNL lock held. */
spin_lock_bh(&mirred_list_lock);
list_del(&m->tcfm_list);
@@ -49,11 +57,67 @@ static void tcf_mirred_release(struct tc_action *a, int bind)
static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {
[TCA_MIRRED_PARMS] = { .len = sizeof(struct tc_mirred) },
+ [TCA_MIRRED_ENC_IPV4_SRC] = { .type = NLA_U32 },
+ [TCA_MIRRED_ENC_IPV4_DST] = { .type = NLA_U32 },
+ [TCA_MIRRED_ENC_KEY_ID] = { .type = NLA_U32 },
+ [TCA_MIRRED_ENC_DST_PORT] = { .type = NLA_U16 },
};
static int mirred_net_id;
static struct tc_action_ops act_mirred_ops;
+static int tunnel_alloc(struct tcf_mirred *m, struct nlattr **tb)
+{
+ struct ip_tunnel_info *tun_info;
+ struct metadata_dst *tun_dst;
+ struct vxlan_metadata md = { 0 };
+ u8 tos = 0;
+ u8 ttl = 0;
+ __be16 tun_flags = TUNNEL_VXLAN_OPT;
+ int err;
+
+ m->tcf_enc_saddr = nla_get_be32(tb[TCA_MIRRED_ENC_IPV4_SRC]);
+ m->tcf_enc_daddr = nla_get_be32(tb[TCA_MIRRED_ENC_IPV4_DST]);
+ m->tcf_enc_key_id = nla_get_be32(tb[TCA_MIRRED_ENC_KEY_ID]);
+ m->tcf_enc_port = nla_get_be16(tb[TCA_MIRRED_ENC_DST_PORT]);
+
+ if (!m->tcf_enc_saddr || !m->tcf_enc_daddr ||
+ !m->tcf_enc_key_id || !m->tcf_enc_port)
+ return 0;
+
+ tun_dst = metadata_dst_alloc(sizeof(md), GFP_KERNEL);
+ if (!tun_dst)
+ return -ENOMEM;
+ printk("%s:%d allocated dst: %p\n", __func__, __LINE__, tun_dst);
+
+ printk("%s:%d mirred vxlan saddr: %pI4 daddr: %pI4 key_id: %d port: %d\n",
+ __func__, __LINE__,
+ &m->tcf_enc_saddr, &m->tcf_enc_daddr,
+ be32_to_cpu(m->tcf_enc_key_id), be16_to_cpu(m->tcf_enc_port));
+
+ err = dst_cache_init(&tun_dst->u.tun_info.dst_cache, GFP_KERNEL);
+ if (err) {
+ dst_release((struct dst_entry *)tun_dst);
+ return err;
+ }
+
+ tun_info = &tun_dst->u.tun_info;
+ tun_info->mode = IP_TUNNEL_INFO_TX;
+
+ ip_tunnel_key_init(&tun_info->key,
+ m->tcf_enc_saddr, m->tcf_enc_daddr,
+ tos, ttl,
+ 0, 0,
+ m->tcf_enc_port,
+ vxlan_vni_to_tun_id(m->tcf_enc_key_id),
+ tun_flags);
+ ip_tunnel_info_opts_set(tun_info, &md, sizeof(md));
+
+ m->tun_dst = tun_dst;
+
+ return 0;
+}
+
static int tcf_mirred_init(struct net *net, struct nlattr *nla,
struct nlattr *est, struct tc_action **a, int ovr,
int bind)
@@ -139,6 +203,13 @@ static int tcf_mirred_init(struct net *net, struct nlattr *nla,
m->tcfm_ok_push = ok_push;
}
+ /* Should not use ret here !!! */
+ if (tunnel_alloc(m, tb)) {
+ printk("%s:%d - error allocating tunnel info\n",
+ __func__, __LINE__);
+ }
+
+
if (ret == ACT_P_CREATED) {
spin_lock_bh(&mirred_list_lock);
list_add(&m->tcfm_list, &mirred_list);
@@ -180,6 +251,9 @@ static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
if (!skb2)
goto out;
+ if (m->tun_dst)
+ skb_dst_set_noref(skb2, &m->tun_dst->dst);
+
if (!(at & AT_EGRESS)) {
if (m->tcfm_ok_push)
skb_push_rcsum(skb2, skb->mac_len);
@@ -221,6 +295,11 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind, i
if (nla_put(skb, TCA_MIRRED_PARMS, sizeof(opt), &opt))
goto nla_put_failure;
+ nla_put_be32(skb, TCA_MIRRED_ENC_IPV4_SRC, m->tcf_enc_saddr);
+ nla_put_be32(skb, TCA_MIRRED_ENC_IPV4_DST, m->tcf_enc_daddr);
+ nla_put_be32(skb, TCA_MIRRED_ENC_KEY_ID, m->tcf_enc_key_id);
+ nla_put_be16(skb, TCA_MIRRED_ENC_DST_PORT, m->tcf_enc_port);
+
tcf_tm_dump(&t, &m->tcf_tm);
if (nla_put_64bit(skb, TCA_MIRRED_TM, sizeof(t), &t, TCA_MIRRED_PAD))
goto nla_put_failure;
--
2.9.0
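As posted, tunnel_alloc() reads all four TCA_MIRRED_ENC_* attributes unconditionally, so a mirred rule created without tunnel parameters would pass NULL attribute pointers to the nla_get_*() helpers; the destination port is also declared NLA_U16 in mirred_policy, so it is a 16-bit value. A user-space sketch of all-or-nothing validation, assuming a hypothetical struct attr (presence flag, length, payload) in place of struct nlattr and hypothetical helper names. In the kernel the equivalent checks would be tests on tb[] before calling nla_get_be32()/nla_get_be16().

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <errno.h>
#include <assert.h>

/* Hypothetical stand-in for a parsed netlink attribute. */
struct attr {
	bool present;
	uint16_t len;        /* payload length in bytes */
	uint8_t data[4];
};

struct tun_params {
	uint32_t saddr, daddr, key_id;  /* network byte order, copied as-is */
	uint16_t dst_port;              /* 16 bits, network byte order */
	bool valid;
};

/* Copy a payload of exactly 'len' bytes; reject absence or wrong width. */
static int attr_get(const struct attr *a, void *out, uint16_t len)
{
	if (!a->present)
		return -ENOENT;
	if (a->len != len)
		return -EINVAL;
	memcpy(out, a->data, len);
	return 0;
}

/* None present: plain mirred. All four present and well-formed: tunnel
 * mode. Anything in between: -EINVAL. */
static int parse_tun_params(const struct attr *src, const struct attr *dst,
			    const struct attr *key, const struct attr *port,
			    struct tun_params *p)
{
	int n = src->present + dst->present + key->present + port->present;

	memset(p, 0, sizeof(*p));
	if (n == 0)
		return 0;
	if (n != 4)
		return -EINVAL;
	if (attr_get(src, &p->saddr, 4) || attr_get(dst, &p->daddr, 4) ||
	    attr_get(key, &p->key_id, 4) || attr_get(port, &p->dst_port, 2))
		return -EINVAL;
	p->valid = true;
	return 0;
}
```

Rejecting a partial set with -EINVAL, rather than silently falling back to a plain redirect, makes a mistake on the tc command line visible to the user.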
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Cong Wang @ 2016-08-14 17:53 UTC
To: Amir Vadai
Cc: Jamal Hadi Salim, Jiri Pirko, Linux Kernel Network Developers,
Or Gerlitz, Hadar Har-Zion, Oded Shanoon, Amir Vadai
On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
> tc qdisc add dev $ETH ingress
>
> # ENCAP rule for ARP
> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
> flower \
> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>
> # ENCAP rule for ICMP
> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> flower ip_proto 1 \
> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>
I don't like this. This makes the mirred action unnecessarily
complex; it should really just mirror or redirect packets as
they are. Why should it be aware of tunnel information?
I think you probably need to introduce a new tc action
for this tunnel information and pipe it to mirred.
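Cong's suggestion relies on how tc chains actions: they run in order, and an action that returns the "pipe" verdict (TC_ACT_PIPE) passes the packet on to the next one. A user-space sketch of that chaining, assuming hypothetical types and a local verdict enum in place of struct tc_action and the TC_ACT_* constants, with a metadata-only set_tunnel_key step followed by a mirred step that knows nothing about tunnels:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Local model of tc verdicts; values are illustrative, not the uapi ones. */
enum verdict { V_PIPE, V_STOLEN, V_SHOT };

struct pkt {
	bool has_tun_md;
	uint32_t vni;
	int out_ifindex;   /* 0 = not redirected */
};

struct action {
	enum verdict (*act)(struct pkt *p, const void *priv);
	const void *priv;
};

/* Hypothetical set_tunnel_key: only attaches metadata, then pipes on. */
static enum verdict set_tunnel_key(struct pkt *p, const void *priv)
{
	p->vni = *(const uint32_t *)priv;
	p->has_tun_md = true;
	return V_PIPE;
}

/* Mirred stays tunnel-agnostic: it just redirects to a device. */
static enum verdict mirred_redirect(struct pkt *p, const void *priv)
{
	p->out_ifindex = *(const int *)priv;
	return V_STOLEN;
}

/* Run actions in order until one returns something other than pipe,
 * mirroring the loop in the kernel's tcf_action_exec(). */
static enum verdict run_actions(struct pkt *p, const struct action *a,
				size_t n)
{
	enum verdict v = V_PIPE;

	for (size_t i = 0; i < n; i++) {
		v = a[i].act(p, a[i].priv);
		if (v != V_PIPE)
			break;
	}
	return v;
}
```

Because the tunnel step only annotates the packet, the same mirred code serves tunnelled and plain redirects; whether the packet is actually encapsulated is decided by the device it is redirected to.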
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: John Fastabend @ 2016-08-15 5:05 UTC
To: Cong Wang, Amir Vadai
Cc: Jamal Hadi Salim, Jiri Pirko, Linux Kernel Network Developers,
Or Gerlitz, Hadar Har-Zion, Oded Shanoon, Amir Vadai
On 16-08-14 10:53 AM, Cong Wang wrote:
> On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
>> tc qdisc add dev $ETH ingress
>>
>> # ENCAP rule for ARP
>> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
>> flower \
>> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>>
>> # ENCAP rule for ICMP
>> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
>> flower ip_proto 1 \
>> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>>
>
> I don't like this. This makes the mirred action unnecessarily
> complex; it should really just mirror or redirect packets as
> they are. Why should it be aware of tunnel information?
>
> I think you probably need to introduce a new tc action
> for this tunnel information and pipe it to mirred.
>
I agree. How about a set_tunnel_key() action? It could be very
similar to the bpf helper routine. Then you can string it
together with other actions easily.
.John
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Jiri Pirko @ 2016-08-15 7:11 UTC
To: Cong Wang
Cc: Amir Vadai, Jamal Hadi Salim, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
>On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
>> tc qdisc add dev $ETH ingress
>>
>> # ENCAP rule for ARP
>> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
>> flower \
>> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>>
>> # ENCAP rule for ICMP
>> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
>> flower ip_proto 1 \
>> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>>
>
>I don't like this. This makes the mirred action unnecessarily
>complex; it should really just mirror or redirect packets as
>they are. Why should it be aware of tunnel information?
>
>I think you probably need to introduce a new tc action
>for this tunnel information and pipe it to mirred.
That is the first thing I thought of when I saw the patch. I think
you can introduce act_vxlan, similar to act_vlan.
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Amir Vadai @ 2016-08-15 8:17 UTC
To: Jiri Pirko
Cc: Cong Wang, John Fastabend, Jamal Hadi Salim, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
> Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
> >On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
> >> tc qdisc add dev $ETH ingress
> >>
> >> # ENCAP rule for ARP
> >> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
> >> flower \
> >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
> >>
> >> # ENCAP rule for ICMP
> >> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> >> flower ip_proto 1 \
> >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
> >>
> >
> >I don't like this. This makes the mirred action unnecessarily
> >complex; it should really just mirror or redirect packets as
> >they are. Why should it be aware of tunnel information?
> >
> >I think you probably need to introduce a new tc action
> >for this tunnel information and pipe it to mirred.
>
> That is the first thing I thought of when I saw the patch. I think
> you can introduce act_vxlan, similar to act_vlan.
Introducing a new action was the first thing I thought of, but it felt
problematic because the actual encap is done by the redirection to the
vxlan device. This action would only be responsible for supplying the
metadata and would work tightly with mirred. It is not like vlan, where
the push/pop actions can live without mirroring/redirecting.
But still, as all of you said, it makes mirred complex with stuff that
shouldn't be there. And between the two options, it is better to
introduce a new action.
I will go in this direction.
Thanks,
Amir
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Amir Vadai @ 2016-08-15 9:08 UTC
To: Jiri Pirko
Cc: Cong Wang, John Fastabend, Jamal Hadi Salim, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
> On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
> > Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
> > >On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
> > >> tc qdisc add dev $ETH ingress
> > >>
> > >> # ENCAP rule for ARP
> > >> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
> > >> flower \
> > >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
> > >>
> > >> # ENCAP rule for ICMP
> > >> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> > >> flower ip_proto 1 \
> > >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
> > >>
> > >
> > >I don't like this. This makes the mirred action unnecessarily
> > >complex; it should really just mirror or redirect packets as
> > >they are. Why should it be aware of tunnel information?
> > >
> > >I think you probably need to introduce a new tc action
> > >for this tunnel information and pipe it to mirred.
> >
> > That is the first thing I thought of when I saw the patch. I think
> > you can introduce act_vxlan, similar to act_vlan.
>
> Introducing a new action was the first thing I thought of, but it felt
> problematic because the actual encap is done by the redirection to the
> vxlan device. This action would only be responsible for supplying the
> metadata and would work tightly with mirred. It is not like vlan, where
> the push/pop actions can live without mirroring/redirecting.
> But still, as all of you said, it makes mirred complex with stuff that
> shouldn't be there. And between the two options, it is better to
> introduce a new action.
>
> I will go in this direction.
>
> Thanks,
> Amir
Any objection to the following?
# ENCAP rule
tc filter add dev $ETH protocol ip parent ffff: prio 10 \
flower ip_proto 1 \
action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
action mirred egress redirect dev $VXLAN
# DECAP rule
tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
flower \
enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
ip_proto 1 \
action mirred egress redirect dev $ETH
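The proposed set_tunnel_key syntax takes four parameters. A sketch of how a user-space front end might validate them and convert them to network byte order before packing them into netlink attributes; the struct and function names are hypothetical, this is not iproute2 code, and the 24-bit limit on key_id assumes the key is a VXLAN VNI.

```c
#define _POSIX_C_SOURCE 200112L
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdlib.h>
#include <errno.h>
#include <assert.h>

/* Hypothetical parsed form of
 * "set_tunnel_key src_ip A dst_ip B key_id N dst_port P".
 * All fields are kept in network byte order. */
struct tunnel_key_opts {
	struct in_addr src, dst;
	uint32_t key_id;
	uint16_t dst_port;
};

/* Parse a decimal number in [0, max]; reject junk and overflow. */
static int parse_uint(const char *s, unsigned long max, unsigned long *out)
{
	char *end;
	unsigned long v;

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || end == s || *end != '\0' || v > max)
		return -EINVAL;
	*out = v;
	return 0;
}

static int parse_tunnel_key(const char *src, const char *dst,
			    const char *key_id, const char *port,
			    struct tunnel_key_opts *o)
{
	unsigned long v;

	if (inet_pton(AF_INET, src, &o->src) != 1 ||
	    inet_pton(AF_INET, dst, &o->dst) != 1)
		return -EINVAL;
	if (parse_uint(key_id, 0xFFFFFF, &v))   /* VNI is 24 bits */
		return -EINVAL;
	o->key_id = htonl((uint32_t)v);
	if (parse_uint(port, 0xFFFF, &v) || v == 0)
		return -EINVAL;
	o->dst_port = htons((uint16_t)v);
	return 0;
}
```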
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Jiri Pirko @ 2016-08-15 9:48 UTC
To: Amir Vadai
Cc: Cong Wang, John Fastabend, Jamal Hadi Salim, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
Mon, Aug 15, 2016 at 11:08:04AM CEST, amir@vadai.me wrote:
>On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
>> On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
>> > Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
>> > >On Sun, Aug 14, 2016 at 7:06 AM, Amir Vadai <amir@vadai.me> wrote:
>> > >> tc qdisc add dev $ETH ingress
>> > >>
>> > >> # ENCAP rule for ARP
>> > >> tc filter add dev $ETH protocol 0x806 parent ffff: prio 11 \
>> > >> flower \
>> > >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>> > >>
>> > >> # ENCAP rule for ICMP
>> > >> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
>> > >> flower ip_proto 1 \
>> > >> action mirred egress redirect dev $VXLAN enc_src_ip 11.11.0.1 enc_dst_ip 11.11.0.2 enc_key_id 11 enc_dst_port 4789
>> > >>
>> > >
>> >I don't like this. This makes the mirred action unnecessarily
>> >complex; it should really just mirror or redirect packets as
>> >they are. Why should it be aware of tunnel information?
>> >
>> >I think you probably need to introduce a new tc action
>> >for this tunnel information and pipe it to mirred.
>> >
>> > That is the first thing I thought of when I saw the patch. I think
>> > you can introduce act_vxlan, similar to act_vlan.
>>
>> Introducing a new action was the first thing I thought of, but it felt
>> problematic because the actual encap is done by the redirection to the
>> vxlan device. This action would only be responsible for supplying the
>> metadata and would work tightly with mirred. It is not like vlan, where
>> the push/pop actions can live without mirroring/redirecting.
>> But still, as all of you said, it makes mirred complex with stuff that
>> shouldn't be there. And between the two options, it is better to
>> introduce a new action.
>>
>> I will go in this direction.
>>
>> Thanks,
>> Amir
>
>Any objection to the following?
>
># ENCAP rule
>tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> flower ip_proto 1 \
> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
Looks fine to me.
> action mirred egress redirect dev $VXLAN
>
># DECAP rule
>tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> flower \
> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> ip_proto 1 \
> action mirred egress redirect dev $ETH
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Shmulik Ladkani @ 2016-08-15 9:50 UTC
To: Amir Vadai
Cc: Jiri Pirko, Cong Wang, John Fastabend, Jamal Hadi Salim,
Jiri Pirko, Linux Kernel Network Developers, Or Gerlitz,
Hadar Har-Zion, Oded Shanoon, Amir Vadai
On Mon, 15 Aug 2016 12:08:04 +0300, amir@vadai.me wrote:
>
> Any objection to the following?
>
> # ENCAP rule
> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> flower ip_proto 1 \
> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
The ability to control a few tun_flags (e.g. TUNNEL_CSUM, TUNNEL_DONT_FRAGMENT)
might be useful too.
> # DECAP rule
> tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> flower \
> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> ip_proto 1 \
You might want to match the tunnel's udp port as well, for symmetry.
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Amir Vadai @ 2016-08-15 9:58 UTC
To: Shmulik Ladkani
Cc: Jiri Pirko, Cong Wang, John Fastabend, Jamal Hadi Salim,
Jiri Pirko, Linux Kernel Network Developers, Or Gerlitz,
Hadar Har-Zion, Oded Shanoon, Amir Vadai
On Mon, Aug 15, 2016 at 12:50:39PM +0300, Shmulik Ladkani wrote:
> On Mon, 15 Aug 2016 12:08:04 +0300, amir@vadai.me wrote:
> >
> > Any objection to the following?
> >
> > # ENCAP rule
> > tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> > flower ip_proto 1 \
> > action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
>
> The ability to control a few tun_flags (e.g. TUNNEL_CSUM, TUNNEL_DONT_FRAGMENT)
> might be useful too.
I guess it should be added when needed. Currently I don't have a use case
for that.
>
> > # DECAP rule
> > tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> > flower \
> > enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> > ip_proto 1 \
>
> You might want to match the tunnel's udp port as well, for symmetry.
Actually, now that you raise it, the udp port is already an attribute of
the vxlan device, so I think it should be omitted in both encap and
decap. The udp port will be selected when creating the vxlan
device.
Thanks,
Amir
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Jamal Hadi Salim @ 2016-08-15 10:08 UTC
To: Amir Vadai, Jiri Pirko
Cc: Cong Wang, John Fastabend, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On 16-08-15 05:08 AM, Amir Vadai wrote:
> On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
>> On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
>>> Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
>>
>> Thanks,
>> Amir
>
> Any objection to the following?
>
> # ENCAP rule
> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> flower ip_proto 1 \
> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
> action mirred egress redirect dev $VXLAN
I assume $VXLAN is actually not a Linux netdev of type vxlan?
Then the action does the vxlan encap, and the redirect sends it to
the $VXLAN dev with the encapsulation in place.
It sounds to me like a name such as "vxlan" would be more usable. Example:
tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
action vxlan encap src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 ....
action mirred egress redirect dev eth0
>
> # DECAP rule
> tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> flower \
> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> ip_proto 1 \
> action mirred egress redirect dev $ETH
>
And a decap would be of the form:
tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
action vxlan decap
i.e. there is no redirect needed here, no?
cheers,
jamal
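For contrast with the netdev-based design, an action that did the encapsulation itself, as in the "action vxlan encap" example, would at minimum have to build the 8-byte VXLAN header defined in RFC 7348: a flags byte with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI and 8 more reserved bits. A sketch of building and parsing just that header; the function names are hypothetical, and the outer Ethernet, IP and UDP headers are omitted.

```c
#include <stdint.h>
#include <stddef.h>
#include <errno.h>
#include <assert.h>

#define VXLAN_HLEN   8
#define VXLAN_FLAG_I 0x08   /* "VNI is valid" flag from RFC 7348 */

/* Write the VXLAN header for 'vni' into buf, in network byte order.
 * Returns the header length or -EINVAL. */
static int vxlan_hdr_build(uint8_t *buf, size_t len, uint32_t vni)
{
	if (len < VXLAN_HLEN || vni > 0xFFFFFF)
		return -EINVAL;
	buf[0] = VXLAN_FLAG_I;
	buf[1] = buf[2] = buf[3] = 0;   /* reserved */
	buf[4] = (uint8_t)(vni >> 16);
	buf[5] = (uint8_t)(vni >> 8);
	buf[6] = (uint8_t)vni;
	buf[7] = 0;                     /* reserved */
	return VXLAN_HLEN;
}

/* Parse a VXLAN header; returns the VNI or -EINVAL. */
static int32_t vxlan_hdr_parse(const uint8_t *buf, size_t len)
{
	if (len < VXLAN_HLEN || !(buf[0] & VXLAN_FLAG_I))
		return -EINVAL;
	return (int32_t)((uint32_t)buf[4] << 16 | (uint32_t)buf[5] << 8 | buf[6]);
}
```

The outer headers, route lookup and UDP source-port selection would also have to be handled, work that the vxlan driver already does.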
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
From: Shmulik Ladkani @ 2016-08-15 10:24 UTC
To: Jamal Hadi Salim
Cc: Amir Vadai, Jiri Pirko, Cong Wang, John Fastabend, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On Mon, 15 Aug 2016 06:08:10 -0400, jhs@mojatatu.com wrote:
> On 16-08-15 05:08 AM, Amir Vadai wrote:
> > # ENCAP rule
> > tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> > flower ip_proto 1 \
> > action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
> > action mirred egress redirect dev $VXLAN
>
> Assuming $VXLAN is actually not a Linux netdev of type vxlan?
> Then the action does the vxlan encap, and the redirect sends it to
> the $VXLAN dev with the encapsulation in place.
I assume Amir refers to a vxlan netdev in VXLAN_F_COLLECT_METADATA mode,
using the tun_info metadata found in skb_metadata_dst.
The action is supposed to assign the tun metadata.
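The standalone C model below may help picture what Shmulik describes. A single shared device takes the outer-header parameters from per-packet metadata rather than from its own configuration, so one device can serve many tunnels. All struct and function names are hypothetical stand-ins. They only loosely mirror the kernel's tun_info and skb_metadata_dst and are not those APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet tunnel metadata (loosely like tun_info). */
struct tun_md {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint32_t key_id;
};

/* Hypothetical packet with optional attached metadata and outer fields. */
struct pkt {
	const struct tun_md *md;
	uint32_t outer_src_ip;
	uint32_t outer_dst_ip;
	uint32_t outer_vni;
};

/* What a "set tunnel key" action would do: attach metadata to the packet. */
void act_set_tunnel_key(struct pkt *p, const struct tun_md *md)
{
	p->md = md;
}

/* Shared device in metadata mode: the outer header comes entirely from
 * the packet's metadata. In this model, a packet without metadata is
 * dropped (returns -1). */
int shared_dev_xmit(struct pkt *p)
{
	if (p->md == NULL)
		return -1;
	p->outer_src_ip = p->md->src_ip;
	p->outer_dst_ip = p->md->dst_ip;
	p->outer_vni = p->md->key_id;
	return 0;
}
```

Two packets sent through the same shared_dev_xmit() can leave with different outer destinations and VNIs. That is what lets one device replace a device per tunnel.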
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 10:24 ` Shmulik Ladkani
@ 2016-08-15 10:41 ` Jamal Hadi Salim
2016-08-15 11:36 ` Amir Vadai
0 siblings, 1 reply; 20+ messages in thread
From: Jamal Hadi Salim @ 2016-08-15 10:41 UTC
To: Shmulik Ladkani
Cc: Amir Vadai, Jiri Pirko, Cong Wang, John Fastabend, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On 16-08-15 06:24 AM, Shmulik Ladkani wrote:
> On Mon, 15 Aug 2016 06:08:10 -0400, jhs@mojatatu.com wrote:
>> Assuming $VXLAN is actually not a Linux netdev of type vxlan?
>> Then the action does the vxlan encap, and the redirect sends it to
>> the $VXLAN dev with the encapsulation in place.
>
> I assume Amir refers to a vxlan netdev in VXLAN_F_COLLECT_METADATA mode,
> using the tun_info metadata found in skb_metadata_dst.
> The action is supposed to assign the tun metadata.
>
I see - so you let the vxlan netdev do the encap?
Would it still scale to a _very large_ number of tunnels?
How many netdevs are you going to use? I am assuming you will hit
a nasty lock somewhere (qdisc?) if you use only one.
cheers,
jamal
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 9:58 ` Amir Vadai
@ 2016-08-15 10:42 ` Shmulik Ladkani
0 siblings, 0 replies; 20+ messages in thread
From: Shmulik Ladkani @ 2016-08-15 10:42 UTC
To: Amir Vadai
Cc: Jiri Pirko, Cong Wang, John Fastabend, Jamal Hadi Salim,
Jiri Pirko, Linux Kernel Network Developers, Or Gerlitz,
Hadar Har-Zion, Oded Shanoon, Amir Vadai
On Mon, 15 Aug 2016 12:58:09 +0300, amir@vadai.me wrote:
> On Mon, Aug 15, 2016 at 12:50:39PM +0300, Shmulik Ladkani wrote:
> > On Mon, 15 Aug 2016 12:08:04 +0300, amir@vadai.me wrote:
> > >
> > > Any objection to the following?
> > >
> > > # ENCAP rule
> > > tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> > > flower ip_proto 1 \
> > > action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
> >
> > The ability to control a few tun_flags (e.g. TUNNEL_CSUM,
> > TUNNEL_DONT_FRAGMENT) might be useful too.
>
> I guess it should be added when needed. Currently I don't have a use case
> for that.
Sure.
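To illustrate Shmulik's suggestion (which Amir defers until there is a use case): per-tunnel flags could travel with the key in the same metadata and be applied when the outer header is built. The sketch below maps a don't-fragment flag onto the IPv4 DF bit (0x4000 in the flags/fragment-offset field, per RFC 791) and exposes a checksum flag as a query. The flag names and values are hypothetical and are not the kernel's TUNNEL_* definitions.

```c
#include <stdint.h>

/* Hypothetical tunnel option flags (not the kernel's TUNNEL_* values). */
#define TUN_OPT_CSUM      (1u << 0)
#define TUN_OPT_DONT_FRAG (1u << 1)

#define IPV4_DF 0x4000u /* Don't Fragment bit in the IPv4 flags/frag_off field */

/* Hypothetical outer IPv4 header field touched by the flags. */
struct outer_ip {
	uint16_t frag_off; /* host byte order in this sketch */
};

/* Apply per-tunnel flags to the outer IPv4 header being built. */
void apply_tun_flags(struct outer_ip *ip, uint32_t flags)
{
	ip->frag_off = (flags & TUN_OPT_DONT_FRAG) ? IPV4_DF : 0;
}

/* Whether the outer UDP checksum should be computed for this tunnel. */
int want_udp_csum(uint32_t flags)
{
	return (flags & TUN_OPT_CSUM) != 0;
}
```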
> > > # DECAP rule
> > > tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> > > flower \
> > > enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> > > ip_proto 1 \
> >
> > You might want to match the tunnel's UDP port as well, for symmetry.
>
> Actually, now that you raise it, the UDP port is already an attribute of
> the vxlan device. So I think it should be omitted in both encap and
> decap. Selecting the UDP port will be done when creating the vxlan
> device.
Sounds better. Manual port override can be added if needed.
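The outcome of this exchange takes the UDP port from the vxlan device (as with `dstport 4789` in the original `ip link add` command), with a per-packet override possibly added later. It could look like the sketch below. The struct and function names are hypothetical, and treating a port of 0 as "not set" is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared vxlan device configuration. */
struct vxlan_dev_cfg {
	uint16_t dst_port; /* chosen once, when the device is created */
};

/* Hypothetical per-packet metadata; dst_port == 0 means "no override". */
struct tun_md_port {
	uint16_t dst_port;
};

/* Pick the outer UDP destination port: use the device's port unless the
 * metadata carries an explicit override. */
uint16_t outer_dst_port(const struct vxlan_dev_cfg *dev,
			const struct tun_md_port *md)
{
	if (md != NULL && md->dst_port != 0)
		return md->dst_port;
	return dev->dst_port;
}
```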
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 10:41 ` Jamal Hadi Salim
@ 2016-08-15 11:36 ` Amir Vadai
2016-08-15 16:35 ` John Fastabend
0 siblings, 1 reply; 20+ messages in thread
From: Amir Vadai @ 2016-08-15 11:36 UTC
To: Jamal Hadi Salim
Cc: Shmulik Ladkani, Jiri Pirko, Cong Wang, John Fastabend,
Jiri Pirko, Linux Kernel Network Developers, Or Gerlitz,
Hadar Har-Zion, Oded Shanoon, Amir Vadai
On Mon, Aug 15, 2016 at 06:41:14AM -0400, Jamal Hadi Salim wrote:
> On 16-08-15 06:24 AM, Shmulik Ladkani wrote:
> > On Mon, 15 Aug 2016 06:08:10 -0400, jhs@mojatatu.com wrote:
>
> > > Assuming $VXLAN is actually not a Linux netdev of type vxlan?
> > > Then the action does the vxlan encap, and the redirect sends it to
> > > the $VXLAN dev with the encapsulation in place.
> >
> > I assume Amir refers to a vxlan netdev in VXLAN_F_COLLECT_METADATA mode,
> > using the tun_info metadata found in skb_metadata_dst.
> > The action is supposed to assign the tun metadata.
> >
>
> I see - so you let the vxlan netdev do the encap?
> Would it still scale to a _very large_ number of tunnels?
> How many netdevs are you going to use? I am assuming you will hit
> a nasty lock somewhere (qdisc?) if you use only one.
Having a netdev per tunnel is problematic because of its memory use [1].
The user can take either approach: use a shared netdev and accept some
contention on the qdisc lock, or create a vxlan dev per VNI and accept
higher memory use.
Once offloading is added, the shared netdev will get the best of both
worlds: low memory use and no lock contention.
[1] - http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
>
> cheers,
> jamal
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 10:08 ` Jamal Hadi Salim
2016-08-15 10:24 ` Shmulik Ladkani
@ 2016-08-15 12:34 ` Jiri Pirko
2016-08-15 12:59 ` Amir Vadai
1 sibling, 1 reply; 20+ messages in thread
From: Jiri Pirko @ 2016-08-15 12:34 UTC
To: Jamal Hadi Salim
Cc: Amir Vadai, Cong Wang, John Fastabend, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
Mon, Aug 15, 2016 at 12:08:10PM CEST, jhs@mojatatu.com wrote:
>On 16-08-15 05:08 AM, Amir Vadai wrote:
>> On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
>> > On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
>> > > Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
>
>> >
>> > Thanks,
>> > Amir
>>
>> Any objection to the following?
>>
>> # ENCAP rule
>> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
>> flower ip_proto 1 \
>> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
>> action mirred egress redirect dev $VXLAN
>
>Assuming $VXLAN is actually not a Linux netdev of type vxlan?
>Then the action does the vxlan encap, and the redirect sends it to
>the $VXLAN dev with the encapsulation in place.
>Sounds to me like a name like "vxlan" would be more usable. Example:
I believe those are generic tunneling data.
>
>tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
>action vxlan encap src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 ....
>action mirred egress redirect dev eth0
>
>>
>> # DECAP rule
>> tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
>> flower \
>> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
>> ip_proto 1 \
>> action mirred egress redirect dev $ETH
>>
>
>And a decap would be of the form:
>tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
>action vxlan decap
That's right. Amir, don't you need decap here to drop the tunnel
metadata?
>
>i.e. there is no redirect needed here, no?
>
>cheers,
>jamal
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 12:34 ` Jiri Pirko
@ 2016-08-15 12:59 ` Amir Vadai
2016-08-15 16:37 ` John Fastabend
0 siblings, 1 reply; 20+ messages in thread
From: Amir Vadai @ 2016-08-15 12:59 UTC
To: Jiri Pirko
Cc: Jamal Hadi Salim, Cong Wang, John Fastabend, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On Mon, Aug 15, 2016 at 02:34:00PM +0200, Jiri Pirko wrote:
> Mon, Aug 15, 2016 at 12:08:10PM CEST, jhs@mojatatu.com wrote:
> >On 16-08-15 05:08 AM, Amir Vadai wrote:
> >> On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
> >> > On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
> >> > > Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
> >
> >> >
> >> > Thanks,
> >> > Amir
> >>
> >> Any objection to the following?
> >>
> >> # ENCAP rule
> >> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
> >> flower ip_proto 1 \
> >> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
> >> action mirred egress redirect dev $VXLAN
> >
> >Assuming $VXLAN is actually not a Linux netdev of type vxlan?
> >Then the action does the vxlan encap, and the redirect sends it to
> >the $VXLAN dev with the encapsulation in place.
> >Sounds to me like a name like "vxlan" would be more usable. Example:
>
> I believe those are generic tunneling data.
>
>
> >
> >tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
> >action vxlan encap src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 ....
> >action mirred egress redirect dev eth0
> >
> >>
> >> # DECAP rule
> >> tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
> >> flower \
> >> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
> >> ip_proto 1 \
> >> action mirred egress redirect dev $ETH
> >>
> >
> >And a decap would be of the form:
> >tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
> >action vxlan decap
>
> That's right. Amir, don't you need decap here to drop the tunnel
> metadata?
Right. I will add a decap that releases it.
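The standalone C sketch below models the decap step agreed on here. Once the decap rule has matched on the tunnel metadata, an action detaches and releases that metadata, so the inner packet leaves without stale tunnel information. The names and the heap-allocated metadata are assumptions of this sketch. It does not model how the kernel actually stores or frees this metadata.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tunnel metadata attached to a received packet. */
struct tun_md {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint32_t key_id;
};

struct pkt {
	struct tun_md *md; /* NULL when no tunnel metadata is attached */
};

/* Hypothetical "decap" action: release the tunnel metadata so the packet
 * redirected to the uplink carries no stale tunnel info. Calling it twice
 * is harmless, because free(NULL) is a no-op. */
void act_tunnel_decap(struct pkt *p)
{
	free(p->md);
	p->md = NULL;
}
```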
>
>
> >
> >i.e. there is no redirect needed here, no?
> >
> >cheers,
> >jamal
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 11:36 ` Amir Vadai
@ 2016-08-15 16:35 ` John Fastabend
0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2016-08-15 16:35 UTC
To: Amir Vadai, Jamal Hadi Salim
Cc: Shmulik Ladkani, Jiri Pirko, Cong Wang, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On 16-08-15 04:36 AM, Amir Vadai wrote:
> On Mon, Aug 15, 2016 at 06:41:14AM -0400, Jamal Hadi Salim wrote:
>> On 16-08-15 06:24 AM, Shmulik Ladkani wrote:
>>> On Mon, 15 Aug 2016 06:08:10 -0400, jhs@mojatatu.com wrote:
>>
>>>> Assuming $VXLAN is actually not a Linux netdev of type vxlan?
>>>> Then the action does the vxlan encap, and the redirect sends it to
>>>> the $VXLAN dev with the encapsulation in place.
>>>
>>> I assume Amir refers to a vxlan netdev in VXLAN_F_COLLECT_METADATA mode,
>>> using the tun_info metadata found in skb_metadata_dst.
>>> The action is supposed to assign the tun metadata.
>>>
>>
>> I see - so you let the vxlan netdev do the encap?
>> Would it still scale to a _very large_ number of tunnels?
>> How many netdevs are you going to use? I am assuming you will hit
>> a nasty lock somewhere (qdisc?) if you use only one.
> Having a netdev per tunnel is problematic because of its memory use [1].
> The user can take either approach: use a shared netdev and accept some
> contention on the qdisc lock, or create a vxlan dev per VNI and accept
> higher memory use.
> Once offloading is added, the shared netdev will get the best of both
> worlds: low memory use and no lock contention.
>
vxlan devices are lockless. If you're worried about many netdevs,
using a shared netdev with metadata is a good approach.
static void vxlan_setup(struct net_device *dev)
{
	struct vxlan_dev *vxlan = netdev_priv(dev);
	unsigned int h;

	eth_hw_addr_random(dev);
	ether_setup(dev);
	dev->destructor = free_netdev;
	SET_NETDEV_DEVTYPE(dev, &vxlan_type);
	dev->features |= NETIF_F_LLTX;	/* <--- ;) here */
	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM;
	dev->features |= NETIF_F_RXCSUM;
	dev->features |= NETIF_F_GSO_SOFTWARE;
	/* ... */
}
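The standalone C model below shows what John is pointing at. When a device's features include NETIF_F_LLTX, the core transmit path skips taking the per-queue TX lock around the driver's transmit routine and leaves any needed locking to the driver. The flag value, structs and counters are hypothetical. Only the "lock unless the feature bit is set" shape is meant to reflect the idea, and this TX lock is distinct from the qdisc lock Jamal mentioned.

```c
#include <stdbool.h>
#include <stdint.h>

#define FEAT_LLTX (1u << 0) /* hypothetical stand-in for NETIF_F_LLTX */

struct model_dev {
	uint32_t features;
	bool tx_locked;          /* models the per-queue TX lock */
	unsigned int lock_takes; /* how often the core took the lock */
	unsigned int sent;
};

/* Driver TX routine. With LLTX, a real driver is responsible for its own
 * synchronization; that is not modeled here. */
static void driver_xmit(struct model_dev *dev)
{
	dev->sent++;
}

/* Core transmit path: take the TX lock only if the device is not LLTX. */
void core_xmit(struct model_dev *dev)
{
	bool lock = !(dev->features & FEAT_LLTX);

	if (lock) {
		dev->tx_locked = true;
		dev->lock_takes++;
	}
	driver_xmit(dev);
	if (lock)
		dev->tx_locked = false;
}
```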
>
> [1] - http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
>
>>
>> cheers,
>> jamal
* Re: [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC
2016-08-15 12:59 ` Amir Vadai
@ 2016-08-15 16:37 ` John Fastabend
0 siblings, 0 replies; 20+ messages in thread
From: John Fastabend @ 2016-08-15 16:37 UTC
To: Amir Vadai, Jiri Pirko
Cc: Jamal Hadi Salim, Cong Wang, Jiri Pirko,
Linux Kernel Network Developers, Or Gerlitz, Hadar Har-Zion,
Oded Shanoon, Amir Vadai
On 16-08-15 05:59 AM, Amir Vadai wrote:
> On Mon, Aug 15, 2016 at 02:34:00PM +0200, Jiri Pirko wrote:
>> Mon, Aug 15, 2016 at 12:08:10PM CEST, jhs@mojatatu.com wrote:
>>> On 16-08-15 05:08 AM, Amir Vadai wrote:
>>>> On Mon, Aug 15, 2016 at 11:17:40AM +0300, Amir Vadai wrote:
>>>>> On Mon, Aug 15, 2016 at 09:11:22AM +0200, Jiri Pirko wrote:
>>>>>> Sun, Aug 14, 2016 at 07:53:30PM CEST, xiyou.wangcong@gmail.com wrote:
>>>
>>>>>
>>>>> Thanks,
>>>>> Amir
>>>>
>>>> Any objection to the following?
>>>>
>>>> # ENCAP rule
>>>> tc filter add dev $ETH protocol ip parent ffff: prio 10 \
>>>> flower ip_proto 1 \
>>>> action set_tunnel_key src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 dst_port 4789 \
>>>> action mirred egress redirect dev $VXLAN
>>>
>>> Assuming $VXLAN is actually not a Linux netdev of type vxlan?
>>> Then the action does the vxlan encap, and the redirect sends it to
>>> the $VXLAN dev with the encapsulation in place.
>>> Sounds to me like a name like "vxlan" would be more usable. Example:
>>
>> I believe those are generic tunneling data.
>>
>>
>>>
>>> tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
>>> action vxlan encap src_ip 11.11.0.1 dst_ip 11.11.0.2 key_id 11 ....
>>> action mirred egress redirect dev eth0
>>>
>>>>
>>>> # DECAP rule
>>>> tc filter add dev $VXLAN protocol ip parent ffff: prio 10 \
>>>> flower \
>>>> enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
>>>> ip_proto 1 \
>>>> action mirred egress redirect dev $ETH
>>>>
>>>
>>> And a decap would be of the form:
>>> tc filter add dev $ETH protocol ip parent ffff: prio 10 ..
>>> action vxlan decap
>>
>> That's right. Amir, don't you need decap here to drop the tunnel
>> metadata?
> Right. I will add a decap that releases it.
>
FWIW this new approach looks good to me.
Thanks,
John
Thread overview: 20+ messages
2016-08-14 14:06 [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC Amir Vadai
2016-08-14 14:06 ` [RFC net-next 1/2] net/sched: cls_flower: Introduce classify by vxlan outer headers Amir Vadai
2016-08-14 14:06 ` [RFC net-next 2/2] net/sched: act_mirred: Introduce vxlan support Amir Vadai
2016-08-14 17:53 ` [RFC net-next 0/2] net/sched: cls_flower, act_mirred: VXLAN redirect using TC Cong Wang
2016-08-15 5:05 ` John Fastabend
2016-08-15 7:11 ` Jiri Pirko
2016-08-15 8:17 ` Amir Vadai
2016-08-15 9:08 ` Amir Vadai
2016-08-15 9:48 ` Jiri Pirko
2016-08-15 9:50 ` Shmulik Ladkani
2016-08-15 9:58 ` Amir Vadai
2016-08-15 10:42 ` Shmulik Ladkani
2016-08-15 10:08 ` Jamal Hadi Salim
2016-08-15 10:24 ` Shmulik Ladkani
2016-08-15 10:41 ` Jamal Hadi Salim
2016-08-15 11:36 ` Amir Vadai
2016-08-15 16:35 ` John Fastabend
2016-08-15 12:34 ` Jiri Pirko
2016-08-15 12:59 ` Amir Vadai
2016-08-15 16:37 ` John Fastabend