* [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify
@ 2016-11-30 7:38 Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*() Amir Vadai
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Amir Vadai @ 2016-11-30 7:38 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
Jiri Benc, Amir Vadai
Hi,
This short series adds support for matching and setting metadata for ip tunnel
shared device using the TC system, introduced in kernel 4.9 [1].
Applied and tested on top of commit f3f339e9590a ("cleanup debris from revert")
Example usage:
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
dst_ip 11.11.11.1 \
action mirred egress redirect dev vnet0
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
[1] - d1ba24feb466 ("Merge branch 'act_tunnel_key'")
Thanks,
Amir
Changes from V2:
- Use const where needed
- Don't lose return value
- Introduce rta_getattr_be16() and rta_getattr_be32()
Changes from V1:
- Updated Patch 2/2 ("tc/act_tunnel: Introduce ip tunnel action") commit log
and the man page tc-tunnel_key to reflect the fact that 'unset' operation is
no mandatory.
And describe when it might be needed.
- Rename the 'release' operation to 'unset'
Amir Vadai (3):
libnetlink: Introduce rta_getattr_be*()
tc/cls_flower: Classify packet in ip tunnels
tc/act_tunnel: Introduce ip tunnel action
bridge/fdb.c | 4 +-
include/libnetlink.h | 9 ++
include/linux/tc_act/tc_tunnel_key.h | 42 ++++++
ip/iplink_geneve.c | 2 +-
ip/iplink_vxlan.c | 2 +-
man/man8/tc-flower.8 | 17 ++-
man/man8/tc-tunnel_key.8 | 113 +++++++++++++++
tc/Makefile | 1 +
tc/f_flower.c | 84 +++++++++++-
tc/m_tunnel_key.c | 258 +++++++++++++++++++++++++++++++++++
10 files changed, 523 insertions(+), 9 deletions(-)
create mode 100644 include/linux/tc_act/tc_tunnel_key.h
create mode 100644 man/man8/tc-tunnel_key.8
create mode 100644 tc/m_tunnel_key.c
--
2.10.2
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*()
2016-11-30 7:38 [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify Amir Vadai
@ 2016-11-30 7:38 ` Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 2/3] tc/cls_flower: Classify packet in ip tunnels Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action Amir Vadai
2 siblings, 0 replies; 6+ messages in thread
From: Amir Vadai @ 2016-11-30 7:38 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
Jiri Benc, Amir Vadai
Add the utility functions rta_getattr_be16() and rta_getattr_be32(), and
change existing code to use it.
Signed-off-by: Amir Vadai <amir@vadai.me>
---
bridge/fdb.c | 4 ++--
include/libnetlink.h | 9 +++++++++
ip/iplink_geneve.c | 2 +-
ip/iplink_vxlan.c | 2 +-
4 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/bridge/fdb.c b/bridge/fdb.c
index 90f4b154c5dc..a91521776e99 100644
--- a/bridge/fdb.c
+++ b/bridge/fdb.c
@@ -168,10 +168,10 @@ int print_fdb(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
if (tb[NDA_PORT]) {
if (jw_global)
jsonw_uint_field(jw_global, "port",
- ntohs(rta_getattr_u16(tb[NDA_PORT])));
+ rta_getattr_be16(tb[NDA_PORT]));
else
fprintf(fp, "port %d ",
- ntohs(rta_getattr_u16(tb[NDA_PORT])));
+ rta_getattr_be16(tb[NDA_PORT]));
}
if (tb[NDA_VNI]) {
diff --git a/include/libnetlink.h b/include/libnetlink.h
index 483509ca9635..751ebf186dd4 100644
--- a/include/libnetlink.h
+++ b/include/libnetlink.h
@@ -10,6 +10,7 @@
#include <linux/if_addr.h>
#include <linux/neighbour.h>
#include <linux/netconf.h>
+#include <arpa/inet.h>
struct rtnl_handle {
int fd;
@@ -140,10 +141,18 @@ static inline __u16 rta_getattr_u16(const struct rtattr *rta)
{
return *(__u16 *)RTA_DATA(rta);
}
+static inline __be16 rta_getattr_be16(const struct rtattr *rta)
+{
+ return ntohs(rta_getattr_u16(rta));
+}
static inline __u32 rta_getattr_u32(const struct rtattr *rta)
{
return *(__u32 *)RTA_DATA(rta);
}
+static inline __be32 rta_getattr_be32(const struct rtattr *rta)
+{
+ return ntohl(rta_getattr_u32(rta));
+}
static inline __u64 rta_getattr_u64(const struct rtattr *rta)
{
__u64 tmp;
diff --git a/ip/iplink_geneve.c b/ip/iplink_geneve.c
index 3bfba91c644c..1e6669d07d60 100644
--- a/ip/iplink_geneve.c
+++ b/ip/iplink_geneve.c
@@ -234,7 +234,7 @@ static void geneve_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_GENEVE_PORT])
fprintf(f, "dstport %u ",
- ntohs(rta_getattr_u16(tb[IFLA_GENEVE_PORT])));
+ rta_getattr_be16(tb[IFLA_GENEVE_PORT]));
if (tb[IFLA_GENEVE_COLLECT_METADATA])
fputs("external ", f);
diff --git a/ip/iplink_vxlan.c b/ip/iplink_vxlan.c
index 93af979a1e97..6d02bb47b2f0 100644
--- a/ip/iplink_vxlan.c
+++ b/ip/iplink_vxlan.c
@@ -413,7 +413,7 @@ static void vxlan_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_VXLAN_PORT])
fprintf(f, "dstport %u ",
- ntohs(rta_getattr_u16(tb[IFLA_VXLAN_PORT])));
+ rta_getattr_be16(tb[IFLA_VXLAN_PORT]));
if (tb[IFLA_VXLAN_LEARNING] &&
!rta_getattr_u8(tb[IFLA_VXLAN_LEARNING]))
--
2.10.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH iproute2 V3 2/3] tc/cls_flower: Classify packet in ip tunnels
2016-11-30 7:38 [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*() Amir Vadai
@ 2016-11-30 7:38 ` Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action Amir Vadai
2 siblings, 0 replies; 6+ messages in thread
From: Amir Vadai @ 2016-11-30 7:38 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
Jiri Benc, Amir Vadai
Introduce classifying by metadata extracted by the tunnel device.
Outer header fields - source/dest ip and tunnel id, are extracted from
the metadata when classifying.
For example, the following will add a filter on the ingress Qdisc of shared
vxlan device named 'vxlan0'. To forward packets with outer src ip
11.11.0.2, dst ip 11.11.0.1 and tunnel id 11. The packets will be
forwarded to tap device 'vnet0':
$ tc filter add dev vxlan0 protocol ip parent ffff: \
flower \
enc_src_ip 11.11.0.2 \
enc_dst_ip 11.11.0.1 \
enc_key_id 11 \
dst_ip 11.11.11.1 \
action mirred egress redirect dev vnet0
Signed-off-by: Amir Vadai <amir@vadai.me>
---
man/man8/tc-flower.8 | 17 ++++++++++-
tc/f_flower.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 96 insertions(+), 5 deletions(-)
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 74f76647753b..0e0b0cf4bb72 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -36,7 +36,11 @@ flower \- flow based traffic control filter
.BR dst_ip " | " src_ip " } { "
.IR ipv4_address " | " ipv6_address " } | { "
.BR dst_port " | " src_port " } "
-.IR port_number " }"
+.IR port_number " } | "
+.B enc_key_id
+.IR KEY-ID " | {"
+.BR enc_dst_ip " | " enc_src_ip " } { "
+.IR ipv4_address " | " ipv6_address " } | "
.SH DESCRIPTION
The
.B flower
@@ -121,6 +125,17 @@ which has to be specified in beforehand.
Match on layer 4 protocol source or destination port number. Only available for
.BR ip_proto " values " udp " and " tcp ,
which has to be specified in beforehand.
+.TP
+.BI enc_key_id " NUMBER"
+.TQ
+.BI enc_dst_ip " ADDRESS"
+.TQ
+.BI enc_src_ip " ADDRESS"
+Match on IP tunnel metadata. Key id
+.I NUMBER
+is a 32 bit tunnel key id (e.g. VNI for VXLAN tunnel).
+.I ADDRESS
+must be a valid IPv4 or IPv6 address.
.SH NOTES
As stated above where applicable, matches of a certain layer implicitly depend
on the matches of the next lower layer. Precisely, layer one and two matches (
diff --git a/tc/f_flower.c b/tc/f_flower.c
index 2d31d1aa832d..173cfc20f90b 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -41,7 +41,10 @@ static void explain(void)
fprintf(stderr, " dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
fprintf(stderr, " src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
fprintf(stderr, " dst_port PORT-NUMBER |\n");
- fprintf(stderr, " src_port PORT-NUMBER }\n");
+ fprintf(stderr, " src_port PORT-NUMBER |\n");
+ fprintf(stderr, " enc_dst_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+ fprintf(stderr, " enc_src_ip [ IPV4-ADDR | IPV6-ADDR ] |\n");
+ fprintf(stderr, " enc_key_id [ KEY-ID ] }\n");
fprintf(stderr, " FILTERID := X:Y:Z\n");
fprintf(stderr, " ACTION-SPEC := ... look at individual actions\n");
fprintf(stderr, "\n");
@@ -121,8 +124,9 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
family = AF_INET;
} else if (eth_type == htons(ETH_P_IPV6)) {
family = AF_INET6;
+ } else if (!eth_type) {
+ family = AF_UNSPEC;
} else {
- fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
return -1;
}
@@ -130,8 +134,10 @@ static int flower_parse_ip_addr(char *str, __be16 eth_type,
if (ret)
return -1;
- if (addr.family != family)
+ if (family && (addr.family != family)) {
+ fprintf(stderr, "Illegal \"eth_type\" for ip address\n");
return -1;
+ }
addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
addr.data, addr.bytelen);
@@ -181,6 +187,18 @@ static int flower_parse_port(char *str, __u8 ip_port,
return 0;
}
+static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n)
+{
+ int ret;
+ __be32 key_id;
+
+ ret = get_be32(&key_id, str, 10);
+ if (!ret)
+ addattr32(n, MAX_MSG, type, key_id);
+
+ return ret;
+}
+
static int flower_parse_opt(struct filter_util *qu, char *handle,
int argc, char **argv, struct nlmsghdr *n)
{
@@ -339,6 +357,38 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
fprintf(stderr, "Illegal \"src_port\"\n");
return -1;
}
+ } else if (matches(*argv, "enc_dst_ip") == 0) {
+ NEXT_ARG();
+ ret = flower_parse_ip_addr(*argv, 0,
+ TCA_FLOWER_KEY_ENC_IPV4_DST,
+ TCA_FLOWER_KEY_ENC_IPV4_DST_MASK,
+ TCA_FLOWER_KEY_ENC_IPV6_DST,
+ TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,
+ n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"enc_dst_ip\"\n");
+ return -1;
+ }
+ } else if (matches(*argv, "enc_src_ip") == 0) {
+ NEXT_ARG();
+ ret = flower_parse_ip_addr(*argv, 0,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC,
+ TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC,
+ TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK,
+ n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"enc_src_ip\"\n");
+ return -1;
+ }
+ } else if (matches(*argv, "enc_key_id") == 0) {
+ NEXT_ARG();
+ ret = flower_parse_key_id(*argv,
+ TCA_FLOWER_KEY_ENC_KEY_ID, n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"enc_key_id\"\n");
+ return -1;
+ }
} else if (matches(*argv, "action") == 0) {
NEXT_ARG();
ret = parse_action(&argc, &argv, TCA_FLOWER_ACT, n);
@@ -506,7 +556,14 @@ static void flower_print_port(FILE *f, char *name, __u8 ip_proto,
return;
if (!attr)
return;
- fprintf(f, "\n %s %d", name, ntohs(rta_getattr_u16(attr)));
+ fprintf(f, "\n %s %d", name, rta_getattr_be16(attr));
+}
+
+static void flower_print_key_id(FILE *f, const char *name,
+ struct rtattr *attr)
+{
+ if (attr)
+ fprintf(f, "\n %s %d", name, rta_getattr_be32(attr));
}
static int flower_print_opt(struct filter_util *qu, FILE *f,
@@ -577,6 +634,25 @@ static int flower_print_opt(struct filter_util *qu, FILE *f,
tb[TCA_FLOWER_KEY_TCP_SRC],
tb[TCA_FLOWER_KEY_UDP_SRC]);
+ flower_print_ip_addr(f, "enc_dst_ip",
+ tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK] ?
+ htons(ETH_P_IP) : htons(ETH_P_IPV6),
+ tb[TCA_FLOWER_KEY_ENC_IPV4_DST],
+ tb[TCA_FLOWER_KEY_ENC_IPV4_DST_MASK],
+ tb[TCA_FLOWER_KEY_ENC_IPV6_DST],
+ tb[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK]);
+
+ flower_print_ip_addr(f, "enc_src_ip",
+ tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK] ?
+ htons(ETH_P_IP) : htons(ETH_P_IPV6),
+ tb[TCA_FLOWER_KEY_ENC_IPV4_SRC],
+ tb[TCA_FLOWER_KEY_ENC_IPV4_SRC_MASK],
+ tb[TCA_FLOWER_KEY_ENC_IPV6_SRC],
+ tb[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK]);
+
+ flower_print_key_id(f, "enc_key_id",
+ tb[TCA_FLOWER_KEY_ENC_KEY_ID]);
+
if (tb[TCA_FLOWER_FLAGS]) {
__u32 flags = rta_getattr_u32(tb[TCA_FLOWER_FLAGS]);
--
2.10.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action
2016-11-30 7:38 [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*() Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 2/3] tc/cls_flower: Classify packet in ip tunnels Amir Vadai
@ 2016-11-30 7:38 ` Amir Vadai
2016-11-30 14:44 ` Jiri Benc
2 siblings, 1 reply; 6+ messages in thread
From: Amir Vadai @ 2016-11-30 7:38 UTC (permalink / raw)
To: Stephen Hemminger
Cc: netdev, David S. Miller, Or Gerlitz, Hadar Har-Zion, Jiri Pirko,
Jiri Benc, Amir Vadai
This action could be used before redirecting packets to a shared tunnel
device, or when redirecting packets arriving from a such a device.
The 'unset' action is optional. It is used to explicitly unset the
metadata created by the tunnel device during decap. If not used, the
metadata will be released automatically by the kernel.
The 'set' operation, will set the metadata with the specified values for
the encap.
For example, the following flower filter will forward all ICMP packets
destined to 11.11.11.2 through the shared vxlan device 'vxlan0'. Before
redirecting, a metadata for the vxlan tunnel is created using the
tunnel_key action and it's arguments:
$ tc filter add dev net0 protocol ip parent ffff: \
flower \
ip_proto 1 \
dst_ip 11.11.11.2 \
action tunnel_key set \
src_ip 11.11.0.1 \
dst_ip 11.11.0.2 \
id 11 \
action mirred egress redirect dev vxlan0
Signed-off-by: Amir Vadai <amir@vadai.me>
---
include/linux/tc_act/tc_tunnel_key.h | 42 ++++++
man/man8/tc-tunnel_key.8 | 113 +++++++++++++++
tc/Makefile | 1 +
tc/m_tunnel_key.c | 258 +++++++++++++++++++++++++++++++++++
4 files changed, 414 insertions(+)
create mode 100644 include/linux/tc_act/tc_tunnel_key.h
create mode 100644 man/man8/tc-tunnel_key.8
create mode 100644 tc/m_tunnel_key.c
diff --git a/include/linux/tc_act/tc_tunnel_key.h b/include/linux/tc_act/tc_tunnel_key.h
new file mode 100644
index 000000000000..f9ddf5369a45
--- /dev/null
+++ b/include/linux/tc_act/tc_tunnel_key.h
@@ -0,0 +1,42 @@
+/*
+ * Copyright (c) 2016, Amir Vadai <amir@vadai.me>
+ * Copyright (c) 2016, Mellanox Technologies. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#ifndef __LINUX_TC_TUNNEL_KEY_H
+#define __LINUX_TC_TUNNEL_KEY_H
+
+#include <linux/pkt_cls.h>
+
+#define TCA_ACT_TUNNEL_KEY 17
+
+#define TCA_TUNNEL_KEY_ACT_SET 1
+#define TCA_TUNNEL_KEY_ACT_RELEASE 2
+
+struct tc_tunnel_key {
+ tc_gen;
+ int t_action;
+};
+
+enum {
+ TCA_TUNNEL_KEY_UNSPEC,
+ TCA_TUNNEL_KEY_TM,
+ TCA_TUNNEL_KEY_PARMS,
+ TCA_TUNNEL_KEY_ENC_IPV4_SRC, /* be32 */
+ TCA_TUNNEL_KEY_ENC_IPV4_DST, /* be32 */
+ TCA_TUNNEL_KEY_ENC_IPV6_SRC, /* struct in6_addr */
+ TCA_TUNNEL_KEY_ENC_IPV6_DST, /* struct in6_addr */
+ TCA_TUNNEL_KEY_ENC_KEY_ID, /* be64 */
+ TCA_TUNNEL_KEY_PAD,
+ __TCA_TUNNEL_KEY_MAX,
+};
+
+#define TCA_TUNNEL_KEY_MAX (__TCA_TUNNEL_KEY_MAX - 1)
+
+#endif
+
diff --git a/man/man8/tc-tunnel_key.8 b/man/man8/tc-tunnel_key.8
new file mode 100644
index 000000000000..d0c333d27158
--- /dev/null
+++ b/man/man8/tc-tunnel_key.8
@@ -0,0 +1,113 @@
+.TH "Tunnel metadata manipulation action in tc" 8 "10 Nov 2016" "iproute2" "Linux"
+
+.SH NAME
+tunnel_key - Tunnel metadata manipulation
+.SH SYNOPSIS
+.in +8
+.ti -8
+.BR tc " ... " "action tunnel_key" " { " unset " | "
+.IR SET " }"
+
+.ti -8
+.IR SET " := "
+.BR set " " src_ip
+.IR ADDRESS
+.BR dst_ip
+.IR ADDRESS
+.BI id " KEY_ID"
+
+.SH DESCRIPTION
+The
+.B tunnel_key
+action combined with a shared IP tunnel device, allows to perform IP tunnel en-
+or decapsulation on a packet, reflected by
+the operation modes
+.IR UNSET " and " SET .
+The
+.I UNSET
+mode is optional - even without using it, the metadata information will be
+released automatically when packet processing will be finished.
+.IR UNSET
+function could be used in cases when traffic is forwarded between two tunnels,
+where the metadata from the first tunnel will be used for encapsulation done by
+the second tunnel.
+It must be used for offloaded filters, such that hardware drivers can
+realize they need to program the HW to do decapsulation.
+.IR SET
+mode requires the source and destination ip
+.I ADDRESS
+and the tunnel key id
+.I KEY_ID
+which will be used by the ip tunnel shared device to create the tunnel header. The
+.B tunnel_key
+action is useful only in combination with a
+.B mirred redirect
+action to a shared IP tunnel device which will use the metadata (for
+.I SET
+) and unset the metadata created by it (for
+.I UNSET
+).
+
+.SH OPTIONS
+.TP
+.B unset
+Decapsulation mode, no further arguments allowed. This function is not
+mandatory and might be used only in some specific use cases.
+.TP
+.B set
+Encapsulation mode. Requires
+.B id
+,
+.B src_ip
+and
+.B dst_ip
+options.
+.RS
+.TP
+.B id
+Tunnel ID (for example VNI in VXLAN tunnel)
+.TP
+.B src_ip
+Outer header source IP address (IPv4 or IPv6)
+.TP
+.B dst_ip
+Outer header destination IP address (IPv4 or IPv6)
+.RE
+.SH EXAMPLES
+The following example encapsulates incoming ICMP packets on eth0 into a vxlan
+tunnel by setting metadata to VNI 11, source IP 11.11.0.1 and destination IP
+11.11.0.2 by forwarding the skb with the metadata to device vxlan0, which will
+prepare the VXLAN headers:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev eth0 protocol ip parent ffff: \\
+ flower \\
+ ip_proto icmp \\
+ action tunnel_key set \\
+ src_ip 11.11.0.1 \\
+ dst_ip 11.11.0.2 \\
+ id 11 \\
+ action mirred egress redirect dev vxlan0
+.EE
+.RE
+
+Here is an example of the
+.B unset
+function: Incoming VXLAN packets on vxlan0 with specific outer IP's and VNI 11
+in the metadata are decapsulated and redirected to eth0:
+
+.RS
+.EX
+#tc qdisc add dev eth0 handle ffff: ingress
+#tc filter add dev vxlan0 protocol ip parent ffff: \
+ flower \\
+ enc_src_ip 11.11.0.2 enc_dst_ip 11.11.0.1 enc_key_id 11 \
+ action tunnel_key unset \
+ action mirred egress redirect dev eth0
+.EE
+.RE
+
+.SH SEE ALSO
+.BR tc (8)
diff --git a/tc/Makefile b/tc/Makefile
index dfa875b5edaf..f6f41ca2bb3d 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -50,6 +50,7 @@ TCMODULES += m_simple.o
TCMODULES += m_vlan.o
TCMODULES += m_connmark.o
TCMODULES += m_bpf.o
+TCMODULES += m_tunnel_key.o
TCMODULES += p_ip.o
TCMODULES += p_icmp.o
TCMODULES += p_tcp.o
diff --git a/tc/m_tunnel_key.c b/tc/m_tunnel_key.c
new file mode 100644
index 000000000000..f4a20e24e0bf
--- /dev/null
+++ b/tc/m_tunnel_key.c
@@ -0,0 +1,258 @@
+/*
+ * m_tunnel_key.c ip tunnel manipulation module
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors: Amir Vadai <amir@vadai.me>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include <linux/tc_act/tc_tunnel_key.h>
+
+static void explain(void)
+{
+ fprintf(stderr, "Usage: tunnel_key unset\n");
+ fprintf(stderr, " tunnel_key set id TUNNELID src_ip IP dst_ip IP\n");
+}
+
+static void usage(void)
+{
+ explain();
+ exit(-1);
+}
+
+static int tunnel_key_parse_ip_addr(const char *str, int addr4_type,
+ int addr6_type, struct nlmsghdr *n)
+{
+ inet_prefix addr;
+ int ret;
+
+ ret = get_addr(&addr, str, AF_UNSPEC);
+ if (ret)
+ return ret;
+
+ addattr_l(n, MAX_MSG, addr.family == AF_INET ? addr4_type : addr6_type,
+ addr.data, addr.bytelen);
+
+ return 0;
+}
+
+static int tunnel_key_parse_key_id(const char *str, int type,
+ struct nlmsghdr *n)
+{
+ __be32 key_id;
+ int ret;
+
+ ret = get_be32(&key_id, str, 10);
+ if (!ret)
+ addattr32(n, MAX_MSG, type, key_id);
+
+ return ret;
+}
+
+static int parse_tunnel_key(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n)
+{
+ struct tc_tunnel_key parm = { .action = TC_ACT_PIPE };
+ char **argv = *argv_p;
+ int argc = *argc_p;
+ struct rtattr *tail;
+ int action = 0;
+ int ret;
+ int has_src_ip = 0;
+ int has_dst_ip = 0;
+ int has_key_id = 0;
+
+ if (matches(*argv, "tunnel_key") != 0)
+ return -1;
+
+ tail = NLMSG_TAIL(n);
+ addattr_l(n, MAX_MSG, tca_id, NULL, 0);
+
+ NEXT_ARG();
+
+ while (argc > 0) {
+ if (matches(*argv, "unset") == 0) {
+ if (action) {
+ fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+ *argv);
+ explain();
+ return -1;
+ }
+ action = TCA_TUNNEL_KEY_ACT_RELEASE;
+ } else if (matches(*argv, "set") == 0) {
+ if (action) {
+ fprintf(stderr, "unexpected \"%s\" - action already specified\n",
+ *argv);
+ explain();
+ return -1;
+ }
+ action = TCA_TUNNEL_KEY_ACT_SET;
+ } else if (matches(*argv, "src_ip") == 0) {
+ NEXT_ARG();
+ ret = tunnel_key_parse_ip_addr(*argv,
+ TCA_TUNNEL_KEY_ENC_IPV4_SRC,
+ TCA_TUNNEL_KEY_ENC_IPV6_SRC,
+ n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"src_ip\"\n");
+ return -1;
+ }
+ has_src_ip = 1;
+ } else if (matches(*argv, "dst_ip") == 0) {
+ NEXT_ARG();
+ ret = tunnel_key_parse_ip_addr(*argv,
+ TCA_TUNNEL_KEY_ENC_IPV4_DST,
+ TCA_TUNNEL_KEY_ENC_IPV6_DST,
+ n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"dst_ip\"\n");
+ return -1;
+ }
+ has_dst_ip = 1;
+ } else if (matches(*argv, "id") == 0) {
+ NEXT_ARG();
+ ret = tunnel_key_parse_key_id(*argv, TCA_TUNNEL_KEY_ENC_KEY_ID, n);
+ if (ret < 0) {
+ fprintf(stderr, "Illegal \"id\"\n");
+ return -1;
+ }
+ has_key_id = 1;
+ } else if (matches(*argv, "help") == 0) {
+ usage();
+ } else {
+ break;
+ }
+ NEXT_ARG_FWD();
+ }
+
+ if (argc && !action_a2n(*argv, &parm.action, false))
+ NEXT_ARG_FWD();
+
+ if (argc) {
+ if (matches(*argv, "index") == 0) {
+ NEXT_ARG();
+ if (get_u32(&parm.index, *argv, 10)) {
+ fprintf(stderr, "tunnel_key: Illegal \"index\"\n");
+ return -1;
+ }
+
+ NEXT_ARG_FWD();
+ }
+ }
+
+ if (action == TCA_TUNNEL_KEY_ACT_SET &&
+ (!has_src_ip || !has_dst_ip || !has_key_id)) {
+ fprintf(stderr, "set needs tunnel_key parameters\n");
+ explain();
+ return -1;
+ }
+
+ parm.t_action = action;
+ addattr_l(n, MAX_MSG, TCA_TUNNEL_KEY_PARMS, &parm, sizeof(parm));
+ tail->rta_len = (char *)NLMSG_TAIL(n) - (char *)tail;
+
+ *argc_p = argc;
+ *argv_p = argv;
+
+ return 0;
+}
+
+static void tunnel_key_print_ip_addr(FILE *f, const char *name,
+ struct rtattr *attr)
+{
+ int family;
+ size_t len;
+
+ if (!attr)
+ return;
+
+ len = RTA_PAYLOAD(attr);
+
+ if (len == 4)
+ family = AF_INET;
+ else if (len == 16)
+ family = AF_INET6;
+ else
+ return;
+
+ fprintf(f, "\n\t%s %s", name, rt_addr_n2a_rta(family, attr));
+}
+
+static void tunnel_key_print_key_id(FILE *f, const char *name,
+ struct rtattr *attr)
+{
+ if (!attr)
+ return;
+ fprintf(f, "\n\t%s %d", name, rta_getattr_be32(attr));
+}
+
+static int print_tunnel_key(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+ struct rtattr *tb[TCA_TUNNEL_KEY_MAX + 1];
+ struct tc_tunnel_key *parm;
+
+ if (!arg)
+ return -1;
+
+ parse_rtattr_nested(tb, TCA_TUNNEL_KEY_MAX, arg);
+
+ if (!tb[TCA_TUNNEL_KEY_PARMS]) {
+ fprintf(f, "[NULL tunnel_key parameters]");
+ return -1;
+ }
+ parm = RTA_DATA(tb[TCA_TUNNEL_KEY_PARMS]);
+
+ fprintf(f, "tunnel_key");
+
+ switch (parm->t_action) {
+ case TCA_TUNNEL_KEY_ACT_RELEASE:
+ fprintf(f, " unset");
+ break;
+ case TCA_TUNNEL_KEY_ACT_SET:
+ fprintf(f, " set");
+ tunnel_key_print_ip_addr(f, "src_ip",
+ tb[TCA_TUNNEL_KEY_ENC_IPV4_SRC]);
+ tunnel_key_print_ip_addr(f, "dst_ip",
+ tb[TCA_TUNNEL_KEY_ENC_IPV4_DST]);
+ tunnel_key_print_ip_addr(f, "src_ip",
+ tb[TCA_TUNNEL_KEY_ENC_IPV6_SRC]);
+ tunnel_key_print_ip_addr(f, "dst_ip",
+ tb[TCA_TUNNEL_KEY_ENC_IPV6_DST]);
+ tunnel_key_print_key_id(f, "key_id",
+ tb[TCA_TUNNEL_KEY_ENC_KEY_ID]);
+ break;
+ }
+ fprintf(f, " %s", action_n2a(parm->action));
+
+ fprintf(f, "\n\tindex %d ref %d bind %d", parm->index, parm->refcnt,
+ parm->bindcnt);
+
+ if (show_stats) {
+ if (tb[TCA_TUNNEL_KEY_TM]) {
+ struct tcf_t *tm = RTA_DATA(tb[TCA_TUNNEL_KEY_TM]);
+
+ print_tm(f, tm);
+ }
+ }
+
+ fprintf(f, "\n ");
+
+ return 0;
+}
+
+struct action_util tunnel_key_action_util = {
+ .id = "tunnel_key",
+ .parse_aopt = parse_tunnel_key,
+ .print_aopt = print_tunnel_key,
+};
--
2.10.2
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action
2016-11-30 7:38 ` [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action Amir Vadai
@ 2016-11-30 14:44 ` Jiri Benc
2016-12-01 9:57 ` Amir Vadai
0 siblings, 1 reply; 6+ messages in thread
From: Jiri Benc @ 2016-11-30 14:44 UTC (permalink / raw)
To: Amir Vadai
Cc: Stephen Hemminger, netdev, David S. Miller, Or Gerlitz,
Hadar Har-Zion, Jiri Pirko
On Wed, 30 Nov 2016 09:38:40 +0200, Amir Vadai wrote:
> +The
> +.I UNSET
> +mode is optional - even without using it, the metadata information will be
> +released automatically when packet processing will be finished.
> +.IR UNSET
> +function could be used in cases when traffic is forwarded between two tunnels,
> +where the metadata from the first tunnel will be used for encapsulation done by
> +the second tunnel.
This looks good.
> +It must be used for offloaded filters, such that hardware drivers can
> +realize they need to program the HW to do decapsulation.
However, this is wrong. The hardware offloading must be transparent.
The same configuration that works when processed in software must work
in hardware if the hardware has the necessary capabilities. Requiring
the user to alter the configuration to accommodate hardware
peculiarities is not acceptable.
Or maybe I'm misunderstanding what you mean here. In which case it's
not documented properly :-)
> +.IR SET
> +mode requires the source and destination ip
> +.I ADDRESS
> +and the tunnel key id
> +.I KEY_ID
> +which will be used by the ip tunnel shared device to create the tunnel header. The
> +.B tunnel_key
> +action is useful only in combination with a
> +.B mirred redirect
> +action to a shared IP tunnel device which will use the metadata (for
> +.I SET
> +) and unset the metadata created by it (for
> +.I UNSET
> +).
> +
> +.SH OPTIONS
> +.TP
> +.B unset
> +Decapsulation mode, no further arguments allowed. This function is not
> +mandatory and might be used only in some specific use cases.
This is NOT decapsulation. The packet is decapsulated at this point in
any case, whether or not set/unset or whatever is used. These actions
are only and solely about metadata associated with the packet. The
actual encapsulation and decapsulation happens at the target netdevice.
Calling this "decapsulation" is wrong. And if it's implemented as such
in your hardware offloading, then it's doubly wrong as it doesn't match
software processing and hence you must not do that and you must change
that.
> +.TP
> +.B set
> +Encapsulation mode. Requires
Likewise, this is not encapsulation. It just sets metadata.
Jiri
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action
2016-11-30 14:44 ` Jiri Benc
@ 2016-12-01 9:57 ` Amir Vadai
0 siblings, 0 replies; 6+ messages in thread
From: Amir Vadai @ 2016-12-01 9:57 UTC (permalink / raw)
To: Jiri Benc
Cc: Stephen Hemminger, netdev, David S. Miller, Or Gerlitz,
Hadar Har-Zion, Jiri Pirko
On Wed, Nov 30, 2016 at 03:44:53PM +0100, Jiri Benc wrote:
> On Wed, 30 Nov 2016 09:38:40 +0200, Amir Vadai wrote:
> > +The
> > +.I UNSET
> > +mode is optional - even without using it, the metadata information will be
> > +released automatically when packet processing will be finished.
> > +.IR UNSET
> > +function could be used in cases when traffic is forwarded between two tunnels,
> > +where the metadata from the first tunnel will be used for encapsulation done by
> > +the second tunnel.
>
> This looks good.
:)
>
> > +It must be used for offloaded filters, such that hardware drivers can
> > +realize they need to program the HW to do decapsulation.
>
> However, this is wrong. The hardware offloading must be transparent.
> The same configuration that works when processed in software must work
> in hardware if the hardware has the necessary capabilities. Requiring
> the user to alter the configuration to accommodate hardware
> peculiarities is not acceptable.
>
> Or maybe I'm misunderstanding what you mean here. In which case it's
> not documented properly :-)
You understood it correctly. We should not force the user to use the
'unset' operation for offloading only. I will remove it from the text
here.
>
> > +.IR SET
> > +mode requires the source and destination ip
> > +.I ADDRESS
> > +and the tunnel key id
> > +.I KEY_ID
> > +which will be used by the ip tunnel shared device to create the tunnel header. The
> > +.B tunnel_key
> > +action is useful only in combination with a
> > +.B mirred redirect
> > +action to a shared IP tunnel device which will use the metadata (for
> > +.I SET
> > +) and unset the metadata created by it (for
> > +.I UNSET
> > +).
> > +
> > +.SH OPTIONS
> > +.TP
> > +.B unset
> > +Decapsulation mode, no further arguments allowed. This function is not
> > +mandatory and might be used only in some specific use cases.
>
> This is NOT decapsulation. The packet is decapsulated at this point in
> any case, whether or not set/unset or whatever is used. These actions
> are only and solely about metadata associated with the packet. The
> actual encapsulation and decapsulation happens at the target netdevice.
>
> Calling this "decapsulation" is wrong. And if it's implemented as such
> in your hardware offloading, then it's doubly wrong as it doesn't match
> software processing and hence you must not do that and you must change
> that.
Got it. Bad wording by me - will fix it and make sure offloading will
realy be transparent to user.
>
> > +.TP
> > +.B set
> > +Encapsulation mode. Requires
>
> Likewise, this is not encapsulation. It just sets metadata.
ack
Thanks,
Amir
>
> Jiri
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-12-01 9:57 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-30 7:38 [PATCH iproute2 V3 0/3] tc: Support for ip tunnel metadata set/unset/classify Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 1/3] libnetlink: Introduce rta_getattr_be*() Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 2/3] tc/cls_flower: Classify packet in ip tunnels Amir Vadai
2016-11-30 7:38 ` [PATCH iproute2 V3 3/3] tc/act_tunnel: Introduce ip tunnel action Amir Vadai
2016-11-30 14:44 ` Jiri Benc
2016-12-01 9:57 ` Amir Vadai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).