From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: [Patch net] net: invert the check of detecting hardware RX checksum fault Date: Thu, 15 Nov 2018 20:52:23 -0800 Message-ID: References: <20181115231602.18328-1-xiyou.wangcong@gmail.com> <20181116015242.wmm6mooubjnum7qv@gondor.apana.org.au> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit Cc: Linux Kernel Network Developers , Tom Herbert , Eric Dumazet , Saeed Mahameed To: Cong Wang , Herbert Xu Return-path: Received: from mail-pl1-f173.google.com ([209.85.214.173]:46226 "EHLO mail-pl1-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727272AbeKPPDQ (ORCPT ); Fri, 16 Nov 2018 10:03:16 -0500 Received: by mail-pl1-f173.google.com with SMTP id t13so7576311ply.13 for ; Thu, 15 Nov 2018 20:52:27 -0800 (PST) In-Reply-To: Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 11/15/2018 06:23 PM, Cong Wang wrote: > On Thu, Nov 15, 2018 at 5:52 PM Herbert Xu wrote: >> >> On Thu, Nov 15, 2018 at 03:16:02PM -0800, Cong Wang wrote: >>> The following evidences indicate this check is likely wrong: >>> >>> 1. In the assignment "skb->csum_valid = !sum", sum==0 indicates a valid checksum. >>> >>> 2. __skb_checksum_complete() always returns sum, and TCP packets are dropped >>> only when it returns non-zero. So non-zero indicates a failure. >>> >>> 3. In __skb_checksum_validate_complete(), we have a nearly same check, where >>> zero is considered as success. >>> >>> 4. csum_fold() already does the one’s complement, this indicates 0 should >>> be considered as a successful validation. >>> >>> 5. We have triggered this fault for many times, but InCsumErrors field in >>> /proc/net/snmp remains 0. >>> >>> Base on the above, non-zero should be used as a checksum mismatch. >>> >>> I tested this with mlx5 driver, no warning or InCsumErrors after 1 hour. >>> >>> Fixes: fb286bb2990a ("[NET]: Detect hardware rx checksum faults correctly") >>> Cc: Herbert Xu >>> Cc: Tom Herbert >>> Cc: Eric Dumazet >>> Signed-off-by: Cong Wang >>> --- >>> net/core/datagram.c | 4 ++-- >>> net/core/dev.c | 2 +- >>> 2 files changed, 3 insertions(+), 3 deletions(-) >>> >>> diff --git a/net/core/datagram.c b/net/core/datagram.c >>> index 57f3a6fcfc1e..e542a9a212a7 100644 >>> --- a/net/core/datagram.c >>> +++ b/net/core/datagram.c >>> @@ -733,7 +733,7 @@ __sum16 __skb_checksum_complete_head(struct sk_buff *skb, int len) >>> __sum16 sum; >>> >>> sum = csum_fold(skb_checksum(skb, 0, len, skb->csum)); >>> - if (likely(!sum)) { >>> + if (unlikely(sum)) { >>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) && >>> !skb->csum_complete_sw) >>> netdev_rx_csum_fault(skb->dev); >> >> Normally if the hardware's partial checksum is valid then we just >> trust it and send the packet along. However, if the partial >> checksum is invalid we don't trust it and we will compute the >> whole checksum manually which is what ends up in sum. > > Not sure if I understand partial checksum here, but it is the > CHECKSUM_COMPLETE case which I am trying to fix, not > CHECKSUM_PARTIAL. > > Or you mean the checksum returned by skb_checksum(), that is, > checksum from skb->data to skb->data+skb->len. > > If neither, I am confused. > >> >> netdev_rx_csum_fault is meant to warn about the situation where >> a packet with a valid checksum (i.e., sum == 0) was given to us >> by the hardware with a partial checksum that was invalid. >> >> So changing it to sum here is wrong. >> > > So, in other word, a checksum *match* is the intended to detect > this HW RX checksum fault? > > What has been changed in between skb_checksum_init() and > tcp_checksum_complete() so that the logic is inverted? > > Looks like I miss something too obvious to understand the logic. :-/ > > > >> Can you give more information as to how you got the warnings with >> mlx5? It sounds like there may be a real bug there because if you >> are getting the warning then it means that a packet with an invalid >> hardware-computed partial checksum passed the manual check and >> was actually valid. This implies that either the hardware or the >> driver is broken. > > Sure, my case is nearly same with Pawel's, except I have no vlan: > https://marc.info/?l=linux-netdev&m=154086647601721&w=2 > > None of us has RXFCS, if you are curious whether Eric's fix works > for us. > > There are also a few other reports with conntrack involved: > https://marc.info/?l=linux-netdev&m=154134983130200&w=2 > https://marc.info/?l=linux-netdev&m=154070099731902&w=2 It is very possible NIC provides an incorrect CHECKSUM_COMPLETE, in the case non zero trailer bytes were added by a buggy switch (or host) Saeed can comment/confirm, but the theory is that the NIC does header analysis and computes a checksum only on the bytes of the IP frame, not including the tail bytes that were added by a switch. You could use trafgen to cook such a frame and confirm the theory. Something like : { 0x00, 0x1a, 0x11, 0xc3, 0x0d, 0x45, # MAC Destination 0x00, 0x12, 0xc0, 0x02, 0xac, 0x5a, # MAC Source const16(0x0800), /* IPv4 Version, IHL, TOS */ 0b01000101, 0, /* IPv4 Total Len */ const16(40), /* IPv4 Ident */ //drnd(2), const16(2), /* IPv4 Flags, Frag Off */ 0b01000000, 0, /* IPv4 TTL */ 64, /* Proto TCP */ 0x06, /* IPv4 Checksum (IP header from, to) */ csumip(14, 33), 7, drnd(3), # Source IP 10,246,7,152, # Dest IP /* TCP Source Port */ drnd(2), /* TCP Dest Port */ const16(80), /* TCP Sequence Number */ drnd(4), /* TCP Ackn. Number */ c32(0), /* TCP Header length + Flags */ const16((0x5 << 12) | 2) /* TCP SYN Flag */ /* Window Size */ const16(16), /* TCP Checksum (offset IP, offset TCP) */ csumtcp(14, 34), 11,22,33,44,55,66, /* PAD */ }