From: Eric Dumazet
Subject: Re: Fw: [Bug 199995] New: Ramdomly sent TCP Reset from Kernel with bonding mode "brodcast"
Date: Fri, 8 Jun 2018 14:38:59 -0700
Message-ID: <3cbd2c1f-4e03-1cb1-3731-4ce440778bb8@gmail.com>
References: <20180608095954.4a0437e4@xeon-e3> <20180608210403.2moomjshtwszvsso@unicorn.suse.cz>
In-Reply-To: <20180608210403.2moomjshtwszvsso@unicorn.suse.cz>
To: Michal Kubecek, Stephen Hemminger
Cc: Eric Dumazet, netdev@vger.kernel.org

On 06/08/2018 02:04 PM, Michal Kubecek wrote:
> On Fri, Jun 08, 2018 at 09:59:54AM -0700, Stephen Hemminger wrote:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=199995
>>
>> Bug ID: 199995
>> Summary: Ramdomly sent TCP Reset from Kernel with bonding mode
>> "brodcast"
>>
>> After a dist upgrade from Ubuntu 17.10 (kernel 4.13.x) to Ubuntu 18.04
>> (kernel 4.15.0) I suffer from randomly generated TCP RST packets sent
>> (presumably) by the kernel on a bonding device that uses bonding mode
>> "broadcast" with 2 physical NICs.
>>
>> With tcpdump/Wireshark I can see that the kernel randomly sends TCP RST
>> packets after the SYN/ACK/ACK packet is received (see attached PCAP).
>> This only happens if the kernel receives the initial SYN packet on both
>> physical NICs (and therefore sees it twice) before the connection is
>> established by sending SYN/ACK.
>> It does not happen in 100% of cases, and only if the system can use two
>> or more CPU cores/threads. With only one CPU available to the system,
>> this behaviour is not reproducible.
>
> I have seen a similar report earlier from one of our customers running
> SLE12 SP2 (kernel 4.4). The problem is that if a duplicated SYN packet is
> received on both slaves, the two copies can be processed by the
> lockless listener simultaneously on different CPUs, and each can reply
> with a SYNACK carrying a different sequence number, which results in a
> reset.
>
> I tried to think of a way to prevent this race without losing the
> performance gain of the lockless listener but couldn't come up with
> anything. Eventually, I managed to persuade the customer that this setup
> (where each packet is received twice under normal circumstances) is not
> what broadcast mode was designed for (based on the description in
> Documentation/networking/bonding.txt).
>
> However, the lockless listener was introduced in 4.4, so it's not clear
> why the reporter started encountering this after an upgrade from 4.13 to
> 4.15.

Yes, I do not buy this at all.

If two identical SYNs are received by two CPUs, we should create one
SYN_RECV and send two SYNACKs.

But it is a bit hard to test this :/

I will take a look, thanks.
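
To make the two outcomes discussed above concrete, here is a toy model in
Python. It is only a sketch and not kernel code: the Listener class, the
function names, the counter used as an ISN generator, and the fixed
interleaving of the two CPUs are all assumptions made for illustration.
The racy variant has both CPUs miss the lookup before either inserts, so
two SYNACKs with different sequence numbers go out. The atomic variant
models "one SYN_RECV, two SYNACKs" with an insert-if-absent step.

```python
# Toy model only: NOT kernel code. Every name here is hypothetical.
import itertools


class Listener:
    def __init__(self):
        # 4-tuple -> ISN stored in the pending SYN_RECV request
        self.requests = {}
        # Assumption: a counter stands in for real ISN generation,
        # so two independent requests always get different ISNs.
        self._isn = itertools.count(1000, 1000)

    def new_isn(self):
        return next(self._isn)


def racy_duplicate_syn(listener, flow):
    """Both CPUs look up before either inserts; each builds its own request."""
    miss_a = flow not in listener.requests  # CPU A: no request yet
    miss_b = flow not in listener.requests  # CPU B: no request yet either
    synacks = []
    for miss in (miss_a, miss_b):
        if miss:
            # The later insert overwrites the earlier one in this model.
            listener.requests[flow] = listener.new_isn()
        synacks.append(listener.requests[flow])
    return synacks


def atomic_duplicate_syn(listener, flow):
    """Insert-if-absent as one step: the second CPU reuses the first request."""
    synacks = []
    for _cpu in ("A", "B"):
        if flow not in listener.requests:
            listener.requests[flow] = listener.new_isn()
        synacks.append(listener.requests[flow])
    return synacks


def server_reply_to_ack(listener, flow, ack_no):
    """The final ACK must acknowledge the ISN of the stored request."""
    expected = listener.requests[flow] + 1
    return "ESTABLISHED" if ack_no == expected else "RST"
```

In this model, if the client completes the handshake using the first
SYNACK it saw, the racy variant answers with a reset because the stored
request carries the other sequence number, while the atomic variant
establishes the connection. Which SYNACK the client sees first is fixed
here but would vary in practice, which is consistent with the resets
appearing only some of the time.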