From: Vijay Pandurangan <vijayp@vijayp.ca>
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Vijay Pandurangan <vijayp@vijayp.ca>,
Evan Jones <ej@evanjones.ca>,
Nicolas Dichtel <nicolas.dichtel@6wind.com>,
Phil Sutter <phil@nwl.cc>,
Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] veth: don’t modify ip_summed; doing so treats packets with bad checksums as good.
Date: Fri, 18 Dec 2015 14:34:59 -0500 [thread overview]
Message-ID: <1450467299-7188-1-git-send-email-vijayp@vijayp.ca> (raw)
In-Reply-To: <CAM_iQpUOKGoNNDF2Nbut8a0q2rGY6pCXBm3NVkBMe=63K7L9bg@mail.gmail.com>
Packets that arrive from real hardware devices have ip_summed ==
CHECKSUM_UNNECESSARY if the hardware verified the checksums, or
CHECKSUM_NONE if the packet is bad or it was unable to verify it. The
current version of veth will replace CHECKSUM_NONE with
CHECKSUM_UNNECESSARY, which causes corrupt packets routed from hardware to
a veth device to be delivered to the application. This caused applications
at Twitter to receive corrupt data when network hardware was corrupting
packets.
We believe this was added as an optimization to skip computing and
verifying checksums for communication between containers. However, locally
generated packets have ip_summed == CHECKSUM_PARTIAL, so the code as
written does nothing for them. As far as we can tell, after removing this
code, these packets are transmitted from one stack to another unmodified
(tcpdump shows invalid checksums on both sides, as expected), and they are
delivered correctly to applications. We didn’t test every possible network
configuration, but we tried a few common ones such as bridging containers,
using NAT between the host and a container, and routing from hardware
devices to containers. We have effectively deployed this in production at
Twitter (by disabling RX checksum offloading on veth devices).
This code dates back to the first version of the driver, commit
<e314dbdc1c0dc6a548ecf> ("[NET]: Virtual ethernet device driver"), so I
suspect this bug occurred mostly because the driver API has evolved
significantly since then. Commit <0b7967503dc97864f283a> ("net/veth: Fix
packet checksumming") (in December 2010) fixed this for packets that get
created locally and sent to hardware devices, by not changing
CHECKSUM_PARTIAL. However, the same issue still occurs for packets coming
in from hardware devices.
Co-authored-by: Evan Jones <ej@evanjones.ca>
Signed-off-by: Evan Jones <ej@evanjones.ca>
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Cc: Phil Sutter <phil@nwl.cc>
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Vijay Pandurangan <vijayp@vijayp.ca>
---
drivers/net/veth.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0ef4a5a..ba21d07 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -117,12 +117,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
kfree_skb(skb);
goto drop;
}
- /* don't change ip_summed == CHECKSUM_PARTIAL, as that
- * will cause bad checksum on forwarded packets
- */
- if (skb->ip_summed == CHECKSUM_NONE &&
- rcv->features & NETIF_F_RXCSUM)
- skb->ip_summed = CHECKSUM_UNNECESSARY;
if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
--
2.5.0
next prev parent reply other threads:[~2015-12-18 19:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-12-18 17:54 [PATCH] veth: don't modify ip-summed; doing so treats packets with bad checksums as good Vijay Pandurangan
2015-12-18 19:00 ` Cong Wang
2015-12-18 19:34 ` Vijay Pandurangan [this message]
2015-12-19 21:41 ` [PATCH] veth: don’t modify ip_summed; " Cong Wang
2015-12-22 20:16 ` David Miller
[not found] ` <CAKUBDd83j1yL+HR24K+qdQBcN66BZ4U50Dj7H_STRrjHb85-zA@mail.gmail.com>
2015-12-23 17:57 ` Cong Wang
2016-02-11 19:19 ` Vijay Pandurangan
2015-12-18 19:42 ` [PATCH] veth: don't modify ip-summed; " Vijay Pandurangan
2015-12-19 21:01 ` Cong Wang
2015-12-19 21:34 ` Cong Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1450467299-7188-1-git-send-email-vijayp@vijayp.ca \
--to=vijayp@vijayp.ca \
--cc=ej@evanjones.ca \
--cc=linux-kernel@vger.kernel.org \
--cc=makita.toshiaki@lab.ntt.co.jp \
--cc=netdev@vger.kernel.org \
--cc=nicolas.dichtel@6wind.com \
--cc=phil@nwl.cc \
--cc=xiyou.wangcong@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).