From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: TCP-MD5 checksum failure on x86_64 SMP
Date: Thu, 06 May 2010 14:06:26 +0200
Message-ID: <1273147586.2357.63.camel@edumazet-laptop>
References: <1272972722.2097.1.camel@achroite.uk.solarflarecom.com>
 <20100504091215.5a4a51f4@nehalam>
 <20100504101301.5f4dd9c2@nehalam>
 <1273085598.2367.233.camel@edumazet-laptop>
Cc: Stephen Hemminger, Ben Hutchings, netdev@vger.kernel.org
To: Bhaskar Dutta

On Thursday, 06 May 2010 at 17:25 +0530, Bhaskar Dutta wrote:

> I put in the above change and ran some load tests with around 50
> active TCP connections doing MD5.
> I could see only 1 bad packet in 30 min (earlier the problem used to
> occur instantaneously and repeatedly).
>
> I think there is another possibility of being preempted when calling
> tcp_alloc_md5sig_pool()
> this function releases the spinlock when calling __tcp_alloc_md5sig_pool().
>
> I will run some more tests after changing the tcp_alloc_md5sig_pool
> and see if the problem is completely resolved.
>

This code should be completely rewritten for linux-2.6.35; it's very
ugly and overly complex, yet it is not scalable.

It could use true percpu data, with no central lock or refcount.
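
To make the "true percpu data" idea concrete, here is a minimal user-space
sketch of the data layout only. It is not the proposed kernel rewrite:
NR_SLOTS, struct toy_scratch, scratch_get(), scratch_put(), toy_digest() and
scratch_uses() are all hypothetical names, the caller passes the CPU index
explicitly, and the checksum is a toy, not MD5. In the kernel the slots would
presumably be declared with DEFINE_PER_CPU() and reached through
get_cpu_var()/put_cpu_var(), which also disable preemption so the task cannot
migrate while it holds its slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NR_SLOTS     4   /* hypothetical stand-in for the number of CPUs */
#define SCRATCH_SIZE 64  /* hypothetical scratch buffer size */

/* Hypothetical per-CPU scratch area: one private instance per CPU. */
struct toy_scratch {
	uint8_t buf[SCRATCH_SIZE];
	unsigned long uses;  /* per-slot counter, never shared across slots */
};

/*
 * Allocated once, up front. Nothing is allocated or freed on the hot
 * path, so there is no central lock to take and no refcount to keep.
 */
static struct toy_scratch scratch[NR_SLOTS];

/* Stand-in for get_cpu_var(); the kernel version also disables preemption. */
static struct toy_scratch *scratch_get(unsigned int cpu)
{
	assert(cpu < NR_SLOTS);
	return &scratch[cpu];
}

/* Stand-in for put_cpu_var(); nothing to release in this sketch. */
static void scratch_put(struct toy_scratch *s)
{
	(void)s;
}

/* Toy checksum (NOT MD5) computed entirely inside the caller's own slot. */
uint32_t toy_digest(unsigned int cpu, const void *data, size_t len)
{
	struct toy_scratch *s = scratch_get(cpu);
	uint32_t sum = 0;
	size_t i;

	if (len > SCRATCH_SIZE)
		len = SCRATCH_SIZE;  /* the toy simply truncates long input */
	memcpy(s->buf, data, len);
	for (i = 0; i < len; i++)
		sum = sum * 31 + s->buf[i];
	s->uses++;
	scratch_put(s);
	return sum;
}

/* Report how often a given slot has been used. */
unsigned long scratch_uses(unsigned int cpu)
{
	return scratch_get(cpu)->uses;
}
```

Each caller only ever touches scratch[cpu], so two CPUs never contend on
shared state. The cost is memory proportional to the number of CPUs, paid
once at init time.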