From mboxrd@z Thu Jan 1 00:00:00 1970 From: Eric Dumazet Subject: Re: TCP-MD5 checksum failure on x86_64 SMP Date: Wed, 05 May 2010 20:53:18 +0200 Message-ID: <1273085598.2367.233.camel@edumazet-laptop> References: <1272972722.2097.1.camel@achroite.uk.solarflarecom.com> <20100504091215.5a4a51f4@nehalam> <20100504101301.5f4dd9c2@nehalam> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Stephen Hemminger , Ben Hutchings , netdev@vger.kernel.org To: Bhaskar Dutta Return-path: Received: from mail-bw0-f225.google.com ([209.85.218.225]:48221 "EHLO mail-bw0-f225.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755053Ab0EESxY (ORCPT ); Wed, 5 May 2010 14:53:24 -0400 Received: by bwz25 with SMTP id 25so3215866bwz.28 for ; Wed, 05 May 2010 11:53:22 -0700 (PDT) In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: Le mercredi 05 mai 2010 =C3=A0 23:33 +0530, Bhaskar Dutta a =C3=A9crit = : > Hi, >=20 > TSO, GSO and SG are already turned off. > rx/tx checksumming is on, but that shouldn't matter, right? >=20 > # ethtool -k eth0 > Offload parameters for eth0: > rx-checksumming: on > tx-checksumming: on > scatter-gather: off > tcp segmentation offload: off > udp fragmentation offload: off > generic segmentation offload: off >=20 > The bad packets are very small in size, most have no data at all (<30= 0 bytes). >=20 > After adding some logs to kernel 2.6.31-12, it seems that > tcp_v4_md5_hash_skb (function that calculates the md5 hash) is > (might?) getting corrupt. >=20 > The tcp4_pseudohdr (bp =3D &hp->md5_blk.ip4) structure's saddr, daddr > and len fields get modified to different values towards the end of th= e > tcp_v4_md5_hash_skb function whenever there is a checksum error. >=20 > The tcp4_pseudohdr (bp) is within the tcp_md5sig_pool (hp), which is > filled up by tcp_get_md5sig_pool (which calls per_cpu_ptr). >=20 > Using a local copy of the tcp4_pseudohdr in the same function > tcp_v4_md5_hash_skb (copied all fields from the original > tcp4_pseudohdr within the tcp_md5sig_pool) and calculating the md5 > checksum with the local tcp4_pseudohdr seems to solve the issue > (don't see bad packets for a hours in load tests, and without the > change I can see them instantaneously in the load tests). >=20 > I am still unable to figure out how this is happening. Please let me > know if you have any pointers. I am not familiar with this code, but I suspect same per_cpu data can b= e used at both time by a sender (process context) and by a receiver (softirq context). To trigger this, you need at least two active md5 sockets. tcp_get_md5sig_pool() should probably disable bh to make sure current cpu wont be preempted by softirq processing Something like : diff --git a/include/net/tcp.h b/include/net/tcp.h index fb5c66b..e232123 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1221,12 +1221,15 @@ struct tcp_md5sig_pool *tcp_get_md5sig= _pool(void) struct tcp_md5sig_pool *ret =3D __tcp_get_md5sig_pool(cpu); if (!ret) put_cpu(); + else + local_bh_disable(); return ret; } =20 static inline void tcp_put_md5sig_pool(void) { __tcp_put_md5sig_pool(); + local_bh_enable(); put_cpu(); }