From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
    stable@vger.kernel.org, Wei-Chun Chao <weichunc@plumgrid.com>,
    "David S. Miller" <davem@davemloft.net>
Subject: [PATCH 3.12 35/77] ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC
Date: Mon, 13 Jan 2014 16:27:56 -0800
Message-ID: <20140114002753.502383734@linuxfoundation.org>
In-Reply-To: <20140114002752.497010554@linuxfoundation.org>

3.12-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Wei-Chun Chao <weichunc@plumgrid.com>

[ Upstream commit 7a7ffbabf99445704be01bff5d7e360da908cf8e ]

VM to VM GSO traffic is broken if it goes through a VXLAN or GRE
tunnel and the physical NIC on the host supports hardware VXLAN/GRE
GSO offload (e.g. bnx2x and next-gen mlx4).

Two issues:

(VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL set, with
SKB_GSO_TCP/UDP set depending on the inner protocol. The GSO header
integrity check in udp4_ufo_fragment fails if the inner protocol is
TCP. Also, gso_segs is calculated incorrectly from skb->len, which
includes the tunnel header. Fix: the robust check should only be
applied to the inner packet.

(VXLAN & GRE) Once the GSO header integrity check passes, NULL segs
is returned and the original skb is sent to hardware. However, the
tunnel header has already been pulled. Fix: the tunnel header needs
to be restored so that hardware can perform GSO properly on the
original packet.

Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/netdevice.h | 13 +++++++++++++
 net/ipv4/gre_offload.c    | 11 +++++++----
 net/ipv4/udp.c            |  6 +++++-
 net/ipv4/udp_offload.c    | 37 +++++++++++++++++++------------------
 4 files changed, 44 insertions(+), 23 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2954,6 +2954,19 @@ static inline void netif_set_gso_max_siz
 	dev->gso_max_size = size;
 }
 
+static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
+					int pulled_hlen, u16 mac_offset,
+					int mac_len)
+{
+	skb->protocol = protocol;
+	skb->encapsulation = 1;
+	skb_push(skb, pulled_hlen);
+	skb_reset_transport_header(skb);
+	skb->mac_header = mac_offset;
+	skb->network_header = skb->mac_header + mac_len;
+	skb->mac_len = mac_len;
+}
+
 static inline bool netif_is_bond_master(struct net_device *dev)
 {
 	return dev->flags & IFF_MASTER && dev->priv_flags & IFF_BONDING;
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -28,6 +28,7 @@ static struct sk_buff *gre_gso_segment(s
 	netdev_features_t enc_features;
 	int ghl = GRE_HEADER_SECTION;
 	struct gre_base_hdr *greh;
+	u16 mac_offset = skb->mac_header;
 	int mac_len = skb->mac_len;
 	__be16 protocol = skb->protocol;
 	int tnl_hlen;
@@ -57,13 +58,13 @@ static struct sk_buff *gre_gso_segment(s
 	} else
 		csum = false;
 
+	if (unlikely(!pskb_may_pull(skb, ghl)))
+		goto out;
+
 	/* setup inner skb. */
 	skb->protocol = greh->protocol;
 	skb->encapsulation = 0;
 
-	if (unlikely(!pskb_may_pull(skb, ghl)))
-		goto out;
-
 	__skb_pull(skb, ghl);
 	skb_reset_mac_header(skb);
 	skb_set_network_header(skb, skb_inner_network_offset(skb));
@@ -72,8 +73,10 @@ static struct sk_buff *gre_gso_segment(s
 	/* segment inner packet. */
 	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
 	segs = skb_mac_gso_segment(skb, enc_features);
-	if (!segs || IS_ERR(segs))
+	if (!segs || IS_ERR(segs)) {
+		skb_gso_error_unwind(skb, protocol, ghl, mac_offset, mac_len);
 		goto out;
+	}
 
 	skb = segs;
 	tnl_hlen = skb_tnl_header_len(skb);
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2294,6 +2294,7 @@ struct sk_buff *skb_udp_tunnel_segment(s
 				       netdev_features_t features)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
+	u16 mac_offset = skb->mac_header;
 	int mac_len = skb->mac_len;
 	int tnl_hlen = skb_inner_mac_header(skb) - skb_transport_header(skb);
 	__be16 protocol = skb->protocol;
@@ -2313,8 +2314,11 @@ struct sk_buff *skb_udp_tunnel_segment(s
 	/* segment inner packet. */
 	enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
 	segs = skb_mac_gso_segment(skb, enc_features);
-	if (!segs || IS_ERR(segs))
+	if (!segs || IS_ERR(segs)) {
+		skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset,
+				     mac_len);
 		goto out;
+	}
 
 	outer_hlen = skb_tnl_header_len(skb);
 	skb = segs;
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -41,6 +41,14 @@ static struct sk_buff *udp4_ufo_fragment
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	unsigned int mss;
+	int offset;
+	__wsum csum;
+
+	if (skb->encapsulation &&
+	    skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL) {
+		segs = skb_udp_tunnel_segment(skb, features);
+		goto out;
+	}
 
 	mss = skb_shinfo(skb)->gso_size;
 	if (unlikely(skb->len <= mss))
@@ -62,27 +70,20 @@ static struct sk_buff *udp4_ufo_fragment
 		goto out;
 	}
 
+	/* Do software UFO. Complete and fill in the UDP checksum as
+	 * HW cannot do checksum of UDP packets sent as multiple
+	 * IP fragments.
+	 */
+	offset = skb_checksum_start_offset(skb);
+	csum = skb_checksum(skb, offset, skb->len - offset, 0);
+	offset += skb->csum_offset;
+	*(__sum16 *)(skb->data + offset) = csum_fold(csum);
+	skb->ip_summed = CHECKSUM_NONE;
+
 	/* Fragment the skb. IP headers of the fragments are updated in
 	 * inet_gso_segment()
 	 */
-	if (skb->encapsulation && skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
-		segs = skb_udp_tunnel_segment(skb, features);
-	else {
-		int offset;
-		__wsum csum;
-
-		/* Do software UFO. Complete and fill in the UDP checksum as
-		 * HW cannot do checksum of UDP packets sent as multiple
-		 * IP fragments.
-		 */
-		offset = skb_checksum_start_offset(skb);
-		csum = skb_checksum(skb, offset, skb->len - offset, 0);
-		offset += skb->csum_offset;
-		*(__sum16 *)(skb->data + offset) = csum_fold(csum);
-		skb->ip_summed = CHECKSUM_NONE;
-
-		segs = skb_segment(skb, features);
-	}
+	segs = skb_segment(skb, features);
 out:
 	return segs;
 }