* [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
This patch set generalizes the UDP tunnel segmentation functions so
that they can work with various protocol encapsulations. The primary
change is to set the inner_protocol field in the skbuff when creating
the encapsulated packet, and then in skb_udp_tunnel_segment this data
is used to determine the function for segmenting the encapsulated
packet. The inner_protocol field is overloaded to take either an
Ethertype or IP protocol.
The inner_protocol is set on transmit using skb_set_inner_ipproto or
skb_set_inner_protocol functions. VXLAN and IP tunnels (for fou GSO)
were modified to call these.
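The overloaded field described above can be modeled in userspace as a
tagged union. This is only a sketch, not the kernel code: struct fake_skb
and the fake_set_* helpers are hypothetical stand-ins for struct sk_buff
and the skb_set_inner_* helpers, and the Ethertype is kept in host byte
order here, whereas the kernel field is a __be16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same tag values as the patch; everything else is a hypothetical model. */
#define ENCAP_TYPE_ETHER   0
#define ENCAP_TYPE_IPPROTO 1

/* Userspace stand-in for the relevant part of struct sk_buff. */
struct fake_skb {
	uint8_t inner_protocol_type;     /* tells which union member is valid */
	union {
		uint16_t inner_protocol; /* Ethertype (host order in this sketch) */
		uint8_t  inner_ipproto;  /* IP protocol number */
	};
};

/* Record an Ethertype as the inner protocol and tag it as such. */
static void fake_set_inner_protocol(struct fake_skb *skb, uint16_t ethertype)
{
	assert(skb != NULL);
	skb->inner_protocol = ethertype;
	skb->inner_protocol_type = ENCAP_TYPE_ETHER;
}

/* Record an IP protocol number as the inner protocol and tag it as such. */
static void fake_set_inner_ipproto(struct fake_skb *skb, uint8_t ipproto)
{
	assert(skb != NULL);
	skb->inner_ipproto = ipproto;
	skb->inner_protocol_type = ENCAP_TYPE_IPPROTO;
}
```

A caller modeling VXLAN would pass 0x6558 (ETH_P_TEB) to
fake_set_inner_protocol, while one modeling SIT over fou would pass 41
(IPPROTO_IPV6) to fake_set_inner_ipproto. A reader of the field must check
the tag first, since the two members share storage.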
Notes:
  - GSO for GRE/UDP where GRE checksum is enabled does not work.
    Handling this will require some special case code.
  - Software GSO now supports many varieties of encapsulation with
    SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
    for device support of particular combinations (I intend to
    add ndo_gso_check for that).
  - MPLS seems to be the only previous user of inner_protocol. I don't
    believe these patches can affect that. For supporting GSO with
    MPLS over UDP, the inner_protocol should be set using the
    helper functions in this patch.
  - GSO for L2TP/UDP should also be straightforward now.
Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
done using 200 TCP_STREAMs in netperf.

GRE
  IPv4, FOU, UDP checksum enabled
    TCP_STREAM TSO enabled on tun interface
      14.04% TX CPU utilization
      13.17% RX CPU utilization
      9211 Mbps
    TCP_STREAM TSO disabled on tun interface
      27.82% TX CPU utilization
      25.41% RX CPU utilization
      9336 Mbps

  IPv4, FOU, UDP checksum disabled
    TCP_STREAM TSO enabled on tun interface
      13.14% TX CPU utilization
      23.18% RX CPU utilization
      9277 Mbps
    TCP_STREAM TSO disabled on tun interface
      30.00% TX CPU utilization
      31.28% RX CPU utilization
      9327 Mbps

IPIP
  FOU, UDP checksum enabled
    TCP_STREAM TSO enabled on tun interface
      15.28% TX CPU utilization
      13.92% RX CPU utilization
      9342 Mbps
    TCP_STREAM TSO disabled on tun interface
      27.82% TX CPU utilization
      25.41% RX CPU utilization
      9336 Mbps

  FOU, UDP checksum disabled
    TCP_STREAM TSO enabled on tun interface
      15.08% TX CPU utilization
      24.64% RX CPU utilization
      9226 Mbps
    TCP_STREAM TSO disabled on tun interface
      30.00% TX CPU utilization
      31.28% RX CPU utilization
      9327 Mbps

SIT
  FOU, UDP checksum enabled
    TCP_STREAM TSO enabled on tun interface
      14.47% TX CPU utilization
      14.58% RX CPU utilization
      9106 Mbps
    TCP_STREAM TSO disabled on tun interface
      31.82% TX CPU utilization
      30.82% RX CPU utilization
      9204 Mbps

  FOU, UDP checksum disabled
    TCP_STREAM TSO enabled on tun interface
      15.70% TX CPU utilization
      27.93% RX CPU utilization
      9097 Mbps
    TCP_STREAM TSO disabled on tun interface
      33.48% TX CPU utilization
      37.36% RX CPU utilization
      9197 Mbps

VXLAN
  TCP_STREAM TSO enabled on tun interface
    16.42% TX CPU utilization
    23.66% RX CPU utilization
    9081 Mbps
  TCP_STREAM TSO disabled on tun interface
    30.32% TX CPU utilization
    30.55% RX CPU utilization
    9185 Mbps

Baseline (no encap, TSO and LRO enabled)
  TCP_STREAM
    11.85% TX CPU utilization
    15.13% RX CPU utilization
    9452 Mbps

Tom Herbert (5):
udp: Generalize skb_udp_segment
sit: Set inner IP protocol in sit
ipip: Set inner IP protocol in ipip
gre: Set inner protocol in v4 and v6 GRE transmit
vxlan: Set inner protocol before transmit
drivers/net/vxlan.c | 4 ++++
include/linux/skbuff.h | 26 +++++++++++++++++++++++--
include/net/udp.h | 3 ++-
net/core/skbuff.c | 1 +
net/ipv4/ip_gre.c | 2 ++
net/ipv4/ipip.c | 2 ++
net/ipv4/udp_offload.c | 51 +++++++++++++++++++++++++++++++++++++++++++++-----
net/ipv6/ip6_gre.c | 8 ++++++--
net/ipv6/sit.c | 4 ++++
net/ipv6/udp_offload.c | 2 +-
10 files changed, 92 insertions(+), 11 deletions(-)
--
2.1.0.rc2.206.gedb03e5
* [PATCH net-next 1/5] udp: Generalize skb_udp_segment
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
skb_udp_tunnel_segment is the function called from udp4_ufo_fragment to
segment a UDP tunnel packet. This function currently assumes
segmentation is transparent Ethernet bridging (i.e. VXLAN
encapsulation). This patch generalizes the function to
operate on either Ethertype or IP protocol.
The inner_protocol field must be set to the protocol of the inner
header. This can now be either an Ethertype or an IP protocol
(in a union). A new flag in the skbuff indicates which type is
effective. skb_set_inner_protocol and skb_set_inner_ipproto
helper functions were added to set the inner_protocol. These
functions are called from the point where the tunnel encapsulation
is occurring.
When skb_udp_tunnel_segment is called, the function to segment the
inner packet is selected based on the inner IP or Ethertype. In the
case of an IP protocol encapsulation, the function is derived from
inet[6]_offloads. In the case of Ethertype, skb->protocol is
set to the inner_protocol and skb_mac_gso_segment is called. (GRE
currently does this, but it might be possible to look up the protocol
in offload_base and call the appropriate segmentation function
directly).
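The selection step described above can be sketched in userspace as a
switch on the type tag followed by a table lookup. This is an
illustration under stated assumptions, not the kernel implementation:
struct fake_skb, the segment_* handlers, the fake_inet*_offloads tables
and their contents are all hypothetical, and the handlers only return an
identifying number instead of segmenting anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENCAP_TYPE_ETHER   0
#define ENCAP_TYPE_IPPROTO 1

/* Hypothetical stand-in for the relevant part of struct sk_buff. */
struct fake_skb {
	uint8_t inner_protocol_type;     /* tells which union member is valid */
	union {
		uint16_t inner_protocol; /* Ethertype */
		uint8_t  inner_ipproto;  /* IP protocol number */
	};
};

typedef int (*segment_fn)(const struct fake_skb *skb);

/* Placeholder handlers; each returns a number identifying itself. */
static int segment_mac(const struct fake_skb *skb)  { (void)skb; return 1; }
static int segment_ipip(const struct fake_skb *skb) { (void)skb; return 4; }
static int segment_ipv6(const struct fake_skb *skb) { (void)skb; return 41; }

/* Models of inet_offloads[] / inet6_offloads[], indexed by IP protocol.
 * The entries registered here are assumptions made for the example. */
static const segment_fn fake_inet_offloads[256] = {
	[4]  = segment_ipip,  /* IPPROTO_IPIP */
	[41] = segment_ipv6,  /* IPPROTO_IPV6 */
};
static const segment_fn fake_inet6_offloads[256] = {
	[41] = segment_ipv6,  /* IPPROTO_IPV6 */
};

/* Pick the inner segmentation handler; NULL means "cannot segment". */
static segment_fn pick_inner_segment(const struct fake_skb *skb, int is_ipv6)
{
	const segment_fn *offloads;

	assert(skb != NULL);

	switch (skb->inner_protocol_type) {
	case ENCAP_TYPE_ETHER:
		/* Ethertype case: hand off to the MAC-level segmenter. */
		return segment_mac;
	case ENCAP_TYPE_IPPROTO:
		/* IP protocol case: index the per-family offload table. */
		offloads = is_ipv6 ? fake_inet6_offloads : fake_inet_offloads;
		return offloads[skb->inner_ipproto];
	default:
		return NULL;
	}
}
```

A NULL result corresponds to the error path in the patch, where the
packet is rejected rather than segmented by an unrelated handler. The
real function additionally holds the RCU read lock around the table
lookup, which this sketch does not model.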
Signed-off-by: Tom Herbert <therbert@google.com>
---
include/linux/skbuff.h | 26 +++++++++++++++++++++++--
include/net/udp.h | 3 ++-
net/core/skbuff.c | 1 +
net/ipv4/udp_offload.c | 51 +++++++++++++++++++++++++++++++++++++++++++++-----
net/ipv6/udp_offload.c | 2 +-
5 files changed, 74 insertions(+), 9 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f1bfa37..7973fcb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -628,10 +628,15 @@ struct sk_buff {
kmemcheck_bitfield_begin(flags3);
__u8 csum_level:2;
__u8 csum_bad:1;
- /* 13 bit hole */
+ __u8 inner_protocol_type:1;
+ /* 12 bit hole */
kmemcheck_bitfield_end(flags3);
- __be16 inner_protocol;
+ union {
+ __be16 inner_protocol;
+ __u8 inner_ipproto;
+ };
+
__u16 inner_transport_header;
__u16 inner_network_header;
__u16 inner_mac_header;
@@ -1716,6 +1721,23 @@ static inline void skb_reserve(struct sk_buff *skb, int len)
skb->tail += len;
}
+#define ENCAP_TYPE_ETHER	0
+#define ENCAP_TYPE_IPPROTO	1
+
+static inline void skb_set_inner_protocol(struct sk_buff *skb,
+					  __be16 protocol)
+{
+	skb->inner_protocol = protocol;
+	skb->inner_protocol_type = ENCAP_TYPE_ETHER;
+}
+
+static inline void skb_set_inner_ipproto(struct sk_buff *skb,
+					 __u8 ipproto)
+{
+	skb->inner_ipproto = ipproto;
+	skb->inner_protocol_type = ENCAP_TYPE_IPPROTO;
+}
+
static inline void skb_reset_inner_headers(struct sk_buff *skb)
{
skb->inner_mac_header = skb->mac_header;
diff --git a/include/net/udp.h b/include/net/udp.h
index 16f4e80..07f9b70 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -239,7 +239,8 @@ int udp_ioctl(struct sock *sk, int cmd, unsigned long arg);
int udp_disconnect(struct sock *sk, int flags);
unsigned int udp_poll(struct file *file, struct socket *sock, poll_table *wait);
struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
- netdev_features_t features);
+ netdev_features_t features,
+ bool is_ipv6);
int udp_lib_getsockopt(struct sock *sk, int level, int optname,
char __user *optval, int __user *optlen);
int udp_lib_setsockopt(struct sock *sk, int level, int optname,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 06a8feb..f26e63b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -682,6 +682,7 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
new->network_header = old->network_header;
new->mac_header = old->mac_header;
new->inner_protocol = old->inner_protocol;
+ new->inner_protocol_type = old->inner_protocol_type;
new->inner_transport_header = old->inner_transport_header;
new->inner_network_header = old->inner_network_header;
new->inner_mac_header = old->inner_mac_header;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 19ebe6a..8c35f2c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -25,8 +25,11 @@ struct udp_offload_priv {
struct udp_offload_priv __rcu *next;
};
-struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
- netdev_features_t features)
+static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
+ netdev_features_t features,
+ struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
+ netdev_features_t features),
+ __be16 new_protocol)
{
struct sk_buff *segs = ERR_PTR(-EINVAL);
u16 mac_offset = skb->mac_header;
@@ -48,7 +51,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb_inner_network_offset(skb));
skb->mac_len = skb_inner_network_offset(skb);
- skb->protocol = htons(ETH_P_TEB);
+ skb->protocol = new_protocol;
need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
if (need_csum)
@@ -56,7 +59,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
/* segment inner packet. */
enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
- segs = skb_mac_gso_segment(skb, enc_features);
+ segs = gso_inner_segment(skb, enc_features);
if (IS_ERR_OR_NULL(segs)) {
skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset,
mac_len);
@@ -101,6 +104,44 @@ out:
return segs;
}
+struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
+				       netdev_features_t features,
+				       bool is_ipv6)
+{
+	__be16 protocol = skb->protocol;
+	const struct net_offload **offloads;
+	const struct net_offload *ops;
+	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
+					     netdev_features_t features);
+
+	rcu_read_lock();
+
+	switch (skb->inner_protocol_type) {
+	case ENCAP_TYPE_ETHER:
+		protocol = skb->inner_protocol;
+		gso_inner_segment = skb_mac_gso_segment;
+		break;
+	case ENCAP_TYPE_IPPROTO:
+		offloads = is_ipv6 ? inet6_offloads : inet_offloads;
+		ops = rcu_dereference(offloads[skb->inner_ipproto]);
+		if (!ops || !ops->callbacks.gso_segment)
+			goto out_unlock;
+		gso_inner_segment = ops->callbacks.gso_segment;
+		break;
+	default:
+		goto out_unlock;
+	}
+
+	segs = __skb_udp_tunnel_segment(skb, features, gso_inner_segment,
+					protocol);
+
+out_unlock:
+	rcu_read_unlock();
+
+	return segs;
+}
+
static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
netdev_features_t features)
{
@@ -113,7 +154,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
if (skb->encapsulation &&
(skb_shinfo(skb)->gso_type &
(SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) {
- segs = skb_udp_tunnel_segment(skb, features);
+ segs = skb_udp_tunnel_segment(skb, features, false);
goto out;
}
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 212ebfc..8f96988 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -58,7 +58,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
if (skb->encapsulation && skb_shinfo(skb)->gso_type &
(SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
- segs = skb_udp_tunnel_segment(skb, features);
+ segs = skb_udp_tunnel_segment(skb, features, true);
else {
const struct ipv6hdr *ipv6h;
struct udphdr *uh;
--
2.1.0.rc2.206.gedb03e5
* [PATCH net-next 2/5] sit: Set inner IP protocol in sit
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
Call skb_set_inner_ipproto to set the inner IP protocol before
tunnel_xmit: IPPROTO_IPV6 in ipip6_tunnel_xmit and IPPROTO_IPIP in
ipip_tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.
Signed-off-by: Tom Herbert <therbert@google.com>
---
net/ipv6/sit.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index db75809..0d4e274 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -982,6 +982,8 @@ static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
goto tx_error;
}
+ skb_set_inner_ipproto(skb, IPPROTO_IPV6);
+
err = iptunnel_xmit(skb->sk, rt, skb, fl4.saddr, fl4.daddr,
protocol, tos, ttl, df,
!net_eq(tunnel->net, dev_net(dev)));
@@ -1006,6 +1008,8 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
if (IS_ERR(skb))
goto out;
+ skb_set_inner_ipproto(skb, IPPROTO_IPIP);
+
ip_tunnel_xmit(skb, dev, tiph, IPPROTO_IPIP);
return NETDEV_TX_OK;
out:
--
2.1.0.rc2.206.gedb03e5
* [PATCH net-next 3/5] ipip: Set inner IP protocol in ipip
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
Call skb_set_inner_ipproto to set inner IP protocol to IPPROTO_IPIP
before tunnel_xmit. This is needed if UDP encapsulation (fou) is
being done.
Signed-off-by: Tom Herbert <therbert@google.com>
---
net/ipv4/ipip.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index bfec31d..ea88ab3 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -224,6 +224,8 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
if (IS_ERR(skb))
goto out;
+ skb_set_inner_ipproto(skb, IPPROTO_IPIP);
+
ip_tunnel_xmit(skb, dev, tiph, tiph->protocol);
return NETDEV_TX_OK;
--
2.1.0.rc2.206.gedb03e5
* [PATCH net-next 4/5] gre: Set inner protocol in v4 and v6 GRE transmit
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
Call skb_set_inner_protocol to set the inner Ethernet protocol to the
protocol being encapsulated by GRE before tunnel_xmit. This is
needed for GSO if UDP encapsulation (fou) is being done.
Signed-off-by: Tom Herbert <therbert@google.com>
---
net/ipv4/ip_gre.c | 2 ++
net/ipv6/ip6_gre.c | 8 ++++++--
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 829aff8b..0485ef1 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -241,6 +241,8 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
/* Push GRE header. */
gre_build_header(skb, &tpi, tunnel->tun_hlen);
+ skb_set_inner_protocol(skb, tpi.proto);
+
ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
}
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 5f19dfb..9a0a1aa 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -616,6 +616,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
int err = -1;
u8 proto;
struct sk_buff *new_skb;
+ __be16 protocol;
if (dev->type == ARPHRD_ETHER)
IPCB(skb)->flags = 0;
@@ -732,8 +733,9 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
ipv6h->daddr = fl6->daddr;
((__be16 *)(ipv6h + 1))[0] = tunnel->parms.o_flags;
- ((__be16 *)(ipv6h + 1))[1] = (dev->type == ARPHRD_ETHER) ?
- htons(ETH_P_TEB) : skb->protocol;
+ protocol = (dev->type == ARPHRD_ETHER) ?
+ htons(ETH_P_TEB) : skb->protocol;
+ ((__be16 *)(ipv6h + 1))[1] = protocol;
if (tunnel->parms.o_flags&(GRE_KEY|GRE_CSUM|GRE_SEQ)) {
__be32 *ptr = (__be32 *)(((u8 *)ipv6h) + tunnel->hlen - 4);
@@ -754,6 +756,8 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
}
}
+ skb_set_inner_protocol(skb, protocol);
+
ip6tunnel_xmit(skb, dev);
if (ndst)
ip6_tnl_dst_store(tunnel, ndst);
--
2.1.0.rc2.206.gedb03e5
* [PATCH net-next 5/5] vxlan: Set inner protocol before transmit
From: Tom Herbert @ 2014-09-26 16:22 UTC
To: davem, netdev
Call skb_set_inner_protocol to set inner Ethernet protocol to
ETH_P_TEB before transmit. This is needed for GSO with UDP tunnels.
Signed-off-by: Tom Herbert <therbert@google.com>
---
drivers/net/vxlan.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 34e102e..2af795d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1610,6 +1610,8 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
vxh->vx_flags = htonl(VXLAN_FLAGS);
vxh->vx_vni = vni;
+ skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
ttl, src_port, dst_port);
return 0;
@@ -1652,6 +1654,8 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
vxh->vx_flags = htonl(VXLAN_FLAGS);
vxh->vx_vni = vni;
+ skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
ttl, df, src_port, dst_port, xnet);
}
--
2.1.0.rc2.206.gedb03e5
* Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
From: Or Gerlitz @ 2014-09-26 20:16 UTC
To: Tom Herbert; +Cc: David Miller, Linux Netdev List
On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@google.com> wrote:
[...]
> Notes:
> - GSO for GRE/UDP where GRE checksum is enabled does not work.
> Handling this will require some special case code.
> - Software GSO now supports many varieties of encapsulation with
> SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
> for device support of particular combinations (I intend to
> add ndo_gso_check for that).
Tom,
As I wrote you earlier on another thread/s, fact is that there are
upstream drivers who advertize SKB_GSO_UDP_TUNNEL and aren't capable @
this point to issue proper HW segmentation of something which isn't
VXLAN.
Just to make sure, this series isn't expected to introduce a
regression, right? we don't expect the stack to attempt and xmit a
large 64KB UDP packet which isn't vxlan through these devices.
> - MPLS seems to be the only previous user of inner_protocol. I don't
> believe these patches can affect that. For supporting GSO with
> MPLS over UDP, the inner_protocol should be set using the
> helper functions in this patch.
> - GSO for L2TP/UDP should also be straightforward now.
> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
> done using 200 TCP_STREAMs in netperf.
[...]
> VXLAN
> TCP_STREAM TSO enabled on tun interface
> 16.42% TX CPU utilization
> 23.66% RX CPU utilization
> 9081 Mbps
> TCP_STREAM TSO disabled on tun interface
> 30.32% TX CPU utilization
> 30.55% RX CPU utilization
> 9185 Mbps
so TSO disabled has better BW vs TSO enabled?
> Baseline (no encap, TSO and LRO enabled)
> TCP_STREAM
> 11.85% TX CPU utilization
> 15.13% RX CPU utilization
> 9452 Mbps
I would strongly recommend to have a far better baseline when
developing and testing these changes in the stack in the form of 40Gbs
NICs.
Or.
>
> Tom Herbert (5):
> udp: Generalize skb_udp_segment
> sit: Set inner IP protocol in sit
> ipip: Set inner IP protocol in ipip
> gre: Set inner protocol in v4 and v6 GRE transmit
> vxlan: Set inner protocol before transmit
>
> drivers/net/vxlan.c | 4 ++++
> include/linux/skbuff.h | 26 +++++++++++++++++++++++--
> include/net/udp.h | 3 ++-
> net/core/skbuff.c | 1 +
> net/ipv4/ip_gre.c | 2 ++
> net/ipv4/ipip.c | 2 ++
> net/ipv4/udp_offload.c | 51 +++++++++++++++++++++++++++++++++++++++++++++-----
> net/ipv6/ip6_gre.c | 8 ++++++--
> net/ipv6/sit.c | 4 ++++
> net/ipv6/udp_offload.c | 2 +-
> 10 files changed, 92 insertions(+), 11 deletions(-)
>
> --
> 2.1.0.rc2.206.gedb03e5
>
* Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
From: Tom Herbert @ 2014-09-26 23:04 UTC
To: Or Gerlitz; +Cc: David Miller, Linux Netdev List
On Fri, Sep 26, 2014 at 1:16 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@google.com> wrote:
> [...]
>> Notes:
>> - GSO for GRE/UDP where GRE checksum is enabled does not work.
>> Handling this will require some special case code.
>> - Software GSO now supports many varieties of encapsulation with
>> SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
>> for device support of particular combinations (I intend to
>> add ndo_gso_check for that).
>
> Tom,
>
> As I wrote you earlier on another thread/s, fact is that there are
> upstream drivers who advertize SKB_GSO_UDP_TUNNEL and aren't capable @
> this point to issue proper HW segmentation of something which isn't
> VXLAN.
>
> Just to make sure, this series isn't expected to introduce a
> regression, right? we don't expect the stack to attempt and xmit a
> large 64KB UDP packet which isn't vxlan through these devices.
>
I am planning to post ndo_gso_check shortly. These patches should not
cause a regression with currently deployed functionality (VXLAN).
>
>
>> - MPLS seems to be the only previous user of inner_protocol. I don't
>> believe these patches can affect that. For supporting GSO with
>> MPLS over UDP, the inner_protocol should be set using the
>> helper functions in this patch.
>> - GSO for L2TP/UDP should also be straightforward now.
>
>> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
>> done using 200 TCP_STREAMs in netperf.
> [...]
>> VXLAN
>> TCP_STREAM TSO enabled on tun interface
>> 16.42% TX CPU utilization
>> 23.66% RX CPU utilization
>> 9081 Mbps
>> TCP_STREAM TSO disabled on tun interface
>> 30.32% TX CPU utilization
>> 30.55% RX CPU utilization
>> 9185 Mbps
>
> so TSO disabled has better BW vs TSO enabled?
>
Yes, I've noticed that on occasion, it does seem like TSO disabled
tends to get a little more throughput. I see this with plain GRE, so I
don't think it's directly related to fou or these patches. I suppose
there may be some subtle interactions with BQL or something like that.
I'd probably want to repro this on some other devices at some point to
dig deeper.
>> Baseline (no encap, TSO and LRO enabled)
>> TCP_STREAM
>> 11.85% TX CPU utilization
>> 15.13% RX CPU utilization
>> 9452 Mbps
>
> I would strongly recommend to have a far better baseline when
> developing and testing these changes in the stack in the form of 40Gbs
> NICs.
>
The only point of putting the baseline was to show that encapsulation
with GSO/GRO/checksum-unnec-conversion is in the ballpark of
performance with native traffic which was a goal. So I'm pretty happy
with this performance right now, although it probably does mean remote
checksum offload won't show so impressive results with this test (TX
csum with data in cache isn't so expensive).
Out of curiosity, why do you think using 40Gbs is far better for a baseline?
> Or.
>
>
>>
>> Tom Herbert (5):
>> udp: Generalize skb_udp_segment
>> sit: Set inner IP protocol in sit
>> ipip: Set inner IP protocol in ipip
>> gre: Set inner protocol in v4 and v6 GRE transmit
>> vxlan: Set inner protocol before transmit
>>
>> drivers/net/vxlan.c | 4 ++++
>> include/linux/skbuff.h | 26 +++++++++++++++++++++++--
>> include/net/udp.h | 3 ++-
>> net/core/skbuff.c | 1 +
>> net/ipv4/ip_gre.c | 2 ++
>> net/ipv4/ipip.c | 2 ++
>> net/ipv4/udp_offload.c | 51 +++++++++++++++++++++++++++++++++++++++++++++-----
>> net/ipv6/ip6_gre.c | 8 ++++++--
>> net/ipv6/sit.c | 4 ++++
>> net/ipv6/udp_offload.c | 2 +-
>> 10 files changed, 92 insertions(+), 11 deletions(-)
>>
>> --
>> 2.1.0.rc2.206.gedb03e5
>>
* Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
From: Or Gerlitz @ 2014-09-27 19:26 UTC
To: Tom Herbert; +Cc: David Miller, Linux Netdev List
On Sat, Sep 27, 2014 at 2:04 AM, Tom Herbert <therbert@google.com> wrote:
> On Fri, Sep 26, 2014 at 1:16 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@google.com> wrote:
>> [...]
>>> Notes:
>>> - GSO for GRE/UDP where GRE checksum is enabled does not work.
>>> Handling this will require some special case code.
>>> - Software GSO now supports many varieties of encapsulation with
>>> SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
>>> for device support of particular combinations (I intend to
>>> add ndo_gso_check for that).
>>
>> Tom,
>>
>> As I wrote you earlier on another thread/s, fact is that there are
>> upstream drivers who advertize SKB_GSO_UDP_TUNNEL and aren't capable @
>> this point to issue proper HW segmentation of something which isn't
>> VXLAN.
>>
>> Just to make sure, this series isn't expected to introduce a
>> regression, right? we don't expect the stack to attempt and xmit a
>> large 64KB UDP packet which isn't vxlan through these devices.
> I am planning to post ndo_gso_check shortly. These patches should not
> cause a regression with currently deployed functionality (VXLAN).
Can you sum up (please) in 1-2 liner what is the trick to avoid such
regression? that is what/where is the knob that would prevent such
giant chunk to be sent down to a NIC driver which does advertize
SKB_GSO_UDP_TUNNEL?
>>> - MPLS seems to be the only previous user of inner_protocol. I don't
>>> believe these patches can affect that. For supporting GSO with
>>> MPLS over UDP, the inner_protocol should be set using the
>>> helper functions in this patch.
>>> - GSO for L2TP/UDP should also be straightforward now.
>>
>>> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
>>> done using 200 TCP_STREAMs in netperf.
>> [...]
>>> VXLAN
>>> TCP_STREAM TSO enabled on tun interface
>>> 16.42% TX CPU utilization
>>> 23.66% RX CPU utilization
>>> 9081 Mbps
>>> TCP_STREAM TSO disabled on tun interface
>>> 30.32% TX CPU utilization
>>> 30.55% RX CPU utilization
>>> 9185 Mbps
>>
>> so TSO disabled has better BW vs TSO enabled?
>>
> Yes, I've noticed that on occasion, it does seem like TSO disabled
> tends to get a little more throughput. I see this with plain GRE, so I
> don't think it's directly related to fou or these patches. I suppose
> there may be some subtle interactions with BQL or something like that.
> I'd probably want to repro this on some other devices at some point to
> dig deeper.
>
>>> Baseline (no encap, TSO and LRO enabled)
>>> TCP_STREAM
>>> 11.85% TX CPU utilization
>>> 15.13% RX CPU utilization
>>> 9452 Mbps
>>
>> I would strongly recommend to have a far better baseline when
>> developing and testing these changes in the stack in the form of 40Gbs
>> NICs.
>>
> The only point of putting the baseline was to show that encapsulation
> with GSO/GRO/checksum-unnec-conversion is in the ballpark of
> performance with native traffic which was a goal.
under (over...) 10Gbs, in the ballpark indeed.
We know nothing about what would happen with a baseline of 38Gbs (SB 40Gbs
NIC) 56Gbs (two bonded ports of 40Gbs NIC on PCIe gen3) or 100Gbs
(tomorrow's NIC HW, probably coming up next year)
> So I'm pretty happy
> with this performance right now, although it probably does mean remote
> checksum offload won't show so impressive results with this test (TX
> csum with data in cache isn't so expensive).
> Out of curiosity, why do you think using 40Gbs is far better for a baseline?
Oh, simply b/c with 40Gbs NICs, the baseline I expect for few sessions
(1,2,4 or 200 as you did) of plain TCP is four times better vs. your
current one (38Gbs vs 9.5Gbs) and this should pose a harder challenge
for the GSO/encapsulating stack to catch up with, agree?
Or.
* Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
From: Tom Herbert @ 2014-09-29 3:59 UTC
To: Or Gerlitz; +Cc: David Miller, Linux Netdev List
On Sat, Sep 27, 2014 at 12:26 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Sat, Sep 27, 2014 at 2:04 AM, Tom Herbert <therbert@google.com> wrote:
>> On Fri, Sep 26, 2014 at 1:16 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Fri, Sep 26, 2014 at 7:22 PM, Tom Herbert <therbert@google.com> wrote:
>>> [...]
>>>> Notes:
>>>> - GSO for GRE/UDP where GRE checksum is enabled does not work.
>>>> Handling this will require some special case code.
>>>> - Software GSO now supports many varieties of encapsulation with
>>>> SKB_GSO_UDP_TUNNEL{_CSUM}. We still need a mechanism to query
>>>> for device support of particular combinations (I intend to
>>>> add ndo_gso_check for that).
>>>
>>> Tom,
>>>
>>> As I wrote you earlier on another thread/s, fact is that there are
>>> upstream drivers who advertize SKB_GSO_UDP_TUNNEL and aren't capable @
>>> this point to issue proper HW segmentation of something which isn't
>>> VXLAN.
>>>
>>> Just to make sure, this series isn't expected to introduce a
>>> regression, right? we don't expect the stack to attempt and xmit a
>>> large 64KB UDP packet which isn't vxlan through these devices.
>
>> I am planning to post ndo_gso_check shortly. These patches should not
>> cause a regression with currently deployed functionality (VXLAN).
>
> Can you sum up (please) in 1-2 liner what is the trick to avoid such
> regression? that is what/where is the knob that would prevent such
> giant chunk to be sent down to a NIC driver which does advertize
> SKB_GSO_UDP_TUNNEL?
>
I posted patch for ndo_gso_check. Please let me know if you'll be able
to work with this. I'll also post the iproute changes soon so that the
FOU results can be repro'd.
>
>>>> - MPLS seems to be the only previous user of inner_protocol. I don't
>>>> believe these patches can affect that. For supporting GSO with
>>>> MPLS over UDP, the inner_protocol should be set using the
>>>> helper functions in this patch.
>>>> - GSO for L2TP/UDP should also be straightforward now.
>>>
>>>> Tested GRE, IPIP, and SIT over fou as well as VXLAN. This was
>>>> done using 200 TCP_STREAMs in netperf.
>>> [...]
>>>> VXLAN
>>>> TCP_STREAM TSO enabled on tun interface
>>>> 16.42% TX CPU utilization
>>>> 23.66% RX CPU utilization
>>>> 9081 Mbps
>>>> TCP_STREAM TSO disabled on tun interface
>>>> 30.32% TX CPU utilization
>>>> 30.55% RX CPU utilization
>>>> 9185 Mbps
>>>
>>> so TSO disabled has better BW vs TSO enabled?
>>>
>> Yes, I've noticed that on occasion, it does seem like TSO disabled
>> tends to get a little more throughput. I see this with plain GRE, so I
>> don't think it's directly related to fou or these patches. I suppose
>> there may be some subtle interactions with BQL or something like that.
>> I'd probably want to repro this on some other devices at some point to
>> dig deeper.
>>
>>>> Baseline (no encap, TSO and LRO enabled)
>>>> TCP_STREAM
>>>> 11.85% TX CPU utilization
>>>> 15.13% RX CPU utilization
>>>> 9452 Mbps
>>>
>>> I would strongly recommend having a far better baseline, in the form
>>> of 40Gbs NICs, when developing and testing these changes in the
>>> stack.
>>>
>> The only point of putting the baseline was to show that encapsulation
>> with GSO/GRO/checksum-unnec-conversion is in the ballpark of
>> performance with native traffic which was a goal.
>
> under (over...) 10Gbs, in the ballpark indeed.
>
> We know nothing about what would happen with a baseline of 38Gbs (SB 40Gbs
> NIC), 56Gbs (two bonded ports of 40Gbs NIC on PCIe gen3) or 100Gbs
> (tomorrow's NIC HW, probably coming up next year).
>
>> So I'm pretty happy
>> with this performance right now, although it probably does mean remote
>> checksum offload won't show such impressive results with this test (TX
>> csum with data in cache isn't so expensive).
>> Out of curiosity, why do you think using 40Gbs is far better for a baseline?
>
> Oh, simply because with 40Gbs NICs, the baseline I expect for a few sessions
> (1, 2, 4 or 200 as you did) of plain TCP is four times better than your
> current one (38Gbs vs 9.5Gbs), and this should pose a harder challenge
> for the GSO/encapsulating stack to catch up with, agree?
>
Sure, I agree that it would be nice to have this tested on different
devices (40G, 1G, wireless, etc.), but right now I don't see any
particularly obvious reason why performance shouldn't scale linearly.
> Or.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels
2014-09-26 16:22 [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels Tom Herbert
` (5 preceding siblings ...)
2014-09-26 20:16 ` [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels Or Gerlitz
@ 2014-09-29 20:43 ` David Miller
6 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2014-09-29 20:43 UTC (permalink / raw)
To: therbert; +Cc: netdev
From: Tom Herbert <therbert@google.com>
Date: Fri, 26 Sep 2014 09:22:29 -0700
> This patch set generalizes the UDP tunnel segmentation functions so
> that they can work with various protocol encapsulations. The primary
> change is to set the inner_protocol field in the skbuff when creating
> the encapsulated packet, and then in skb_udp_tunnel_segment this data
> is used to determine the function for segmenting the encapsulated
> packet. The inner_protocol field is overloaded to take either an
> Ethertype or IP protocol.
Tom, this series needs to be respun due to Eric Dumazet's sk_buff flags
rework in net-next.
Thanks.
^ permalink raw reply [flat|nested] 12+ messages in thread
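[Editorial sketch, not part of the thread] The cover letter quoted above
describes a single inner_protocol field that is overloaded to hold either
an Ethertype or an IP protocol number, with the segmentation path choosing
its handler from that value. The user-space C model below is only a sketch
of that idea. Every name in it (struct pkt, enum inner_kind, enum
segmenter, pkt_set_inner_protocol, pkt_set_inner_ipproto,
pkt_pick_segmenter) is hypothetical and chosen for illustration; none of it
is the kernel's actual data layout or API.

```c
#include <stdint.h>

/* Hypothetical tag recording which meaning the overloaded field has. */
enum inner_kind { INNER_KIND_ETHERTYPE, INNER_KIND_IPPROTO };

/* Hypothetical handlers a segmentation path could choose between. */
enum segmenter {
    SEG_NONE,          /* no handler known for this inner protocol */
    SEG_BY_ETHERTYPE,  /* dispatch further on the stored Ethertype */
    SEG_IPV4,          /* inner packet is bare IPv4 (e.g. ipip) */
    SEG_IPV6,          /* inner packet is bare IPv6 (e.g. sit) */
};

/* Toy stand-in for a packet buffer: one field, two interpretations. */
struct pkt {
    enum inner_kind kind;
    union {
        uint16_t ethertype; /* valid when kind == INNER_KIND_ETHERTYPE */
        uint8_t ipproto;    /* valid when kind == INNER_KIND_IPPROTO */
    } inner;
};

/* Record an Ethertype as the inner protocol (GRE- or VXLAN-style). */
static void pkt_set_inner_protocol(struct pkt *p, uint16_t ethertype)
{
    p->kind = INNER_KIND_ETHERTYPE;
    p->inner.ethertype = ethertype;
}

/* Record an IP protocol number as the inner protocol (ipip/sit-style). */
static void pkt_set_inner_ipproto(struct pkt *p, uint8_t ipproto)
{
    p->kind = INNER_KIND_IPPROTO;
    p->inner.ipproto = ipproto;
}

/* Pick a segmentation handler from whichever value was recorded. */
static enum segmenter pkt_pick_segmenter(const struct pkt *p)
{
    switch (p->kind) {
    case INNER_KIND_ETHERTYPE:
        return SEG_BY_ETHERTYPE;
    case INNER_KIND_IPPROTO:
        switch (p->inner.ipproto) {
        case 4:  /* IP-in-IP */
            return SEG_IPV4;
        case 41: /* IPv6 encapsulation */
            return SEG_IPV6;
        default:
            return SEG_NONE;
        }
    }
    return SEG_NONE;
}
```

In this model a transmit path calls one of the two setters before handing
the packet on, and the segmentation path later calls pkt_pick_segmenter()
without needing to know which tunnel type produced the packet. The explicit
tag is one way to keep the two interpretations of the field from being
confused; how the real patches track this is not shown in this part of the
thread.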
* Re: [PATCH net-next 4/5] gre: Set inner protocol in v4 and v6 GRE transmit
2014-09-26 16:22 ` [PATCH net-next 4/5] gre: Set inner protocol in v4 and v6 GRE transmit Tom Herbert
@ 2014-09-30 5:02 ` Simon Horman
0 siblings, 0 replies; 12+ messages in thread
From: Simon Horman @ 2014-09-30 5:02 UTC (permalink / raw)
To: Tom Herbert; +Cc: davem, netdev
On Fri, Sep 26, 2014 at 09:22:33AM -0700, Tom Herbert wrote:
> Call skb_set_inner_protocol to set the inner Ethernet protocol to the
> protocol being encapsulated by GRE before tunnel_xmit. This is
> needed for GSO if UDP encapsulation (fou) is being done.
>
> Signed-off-by: Tom Herbert <therbert@google.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
> ---
> net/ipv4/ip_gre.c | 2 ++
> net/ipv6/ip6_gre.c | 8 ++++++--
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 829aff8b..0485ef1 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -241,6 +241,8 @@ static void __gre_xmit(struct sk_buff *skb, struct net_device *dev,
> /* Push GRE header. */
> gre_build_header(skb, &tpi, tunnel->tun_hlen);
>
> + skb_set_inner_protocol(skb, tpi.proto);
> +
> ip_tunnel_xmit(skb, dev, tnl_params, tnl_params->protocol);
> }
>
> diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
> index 5f19dfb..9a0a1aa 100644
> --- a/net/ipv6/ip6_gre.c
> +++ b/net/ipv6/ip6_gre.c
> @@ -616,6 +616,7 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
> int err = -1;
> u8 proto;
> struct sk_buff *new_skb;
> + __be16 protocol;
>
> if (dev->type == ARPHRD_ETHER)
> IPCB(skb)->flags = 0;
> @@ -732,8 +733,9 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
> ipv6h->daddr = fl6->daddr;
>
> ((__be16 *)(ipv6h + 1))[0] = tunnel->parms.o_flags;
> - ((__be16 *)(ipv6h + 1))[1] = (dev->type == ARPHRD_ETHER) ?
> - htons(ETH_P_TEB) : skb->protocol;
> + protocol = (dev->type == ARPHRD_ETHER) ?
> + htons(ETH_P_TEB) : skb->protocol;
> + ((__be16 *)(ipv6h + 1))[1] = protocol;
>
> if (tunnel->parms.o_flags&(GRE_KEY|GRE_CSUM|GRE_SEQ)) {
> __be32 *ptr = (__be32 *)(((u8 *)ipv6h) + tunnel->hlen - 4);
> @@ -754,6 +756,8 @@ static netdev_tx_t ip6gre_xmit2(struct sk_buff *skb,
> }
> }
>
> + skb_set_inner_protocol(skb, protocol);
> +
> ip6tunnel_xmit(skb, dev);
> if (ndst)
> ip6_tnl_dst_store(tunnel, ndst);
> --
> 2.1.0.rc2.206.gedb03e5
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-09-30 5:02 UTC | newest]
Thread overview: 12+ messages
2014-09-26 16:22 [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels Tom Herbert
2014-09-26 16:22 ` [PATCH net-next 1/5] udp: Generalize skb_udp_segment Tom Herbert
2014-09-26 16:22 ` [PATCH net-next 2/5] sit: Set inner IP protocol in sit Tom Herbert
2014-09-26 16:22 ` [PATCH net-next 3/5] ipip: Set inner IP protocol in ipip Tom Herbert
2014-09-26 16:22 ` [PATCH net-next 4/5] gre: Set inner protocol in v4 and v6 GRE transmit Tom Herbert
2014-09-30 5:02 ` Simon Horman
2014-09-26 16:22 ` [PATCH net-next 5/5] vxlan: Set inner protocol before transmit Tom Herbert
2014-09-26 20:16 ` [PATCH net-next 0/5] udp: Generalize GSO for UDP tunnels Or Gerlitz
2014-09-26 23:04 ` Tom Herbert
2014-09-27 19:26 ` Or Gerlitz
2014-09-29 3:59 ` Tom Herbert
2014-09-29 20:43 ` David Miller