From mboxrd@z Thu Jan 1 00:00:00 1970
From: Willy Tarreau
Subject: Re: TCP many-connection regression between 4.7 and 4.13 kernels.
Date: Mon, 22 Jan 2018 19:27:37 +0100
Message-ID: <20180122182737.GA18218@1wt.eu>
References: <1516644966.3478.10.camel@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Ben Greear, netdev
To: Eric Dumazet
Received: from wtarreau.pck.nerim.net ([62.212.114.60]:40043 "EHLO 1wt.eu" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751102AbeAVS1p (ORCPT ); Mon, 22 Jan 2018 13:27:45 -0500
Content-Disposition: inline
In-Reply-To: <1516644966.3478.10.camel@gmail.com>
Sender: netdev-owner@vger.kernel.org

Hi Eric,

On Mon, Jan 22, 2018 at 10:16:06AM -0800, Eric Dumazet wrote:
> On Mon, 2018-01-22 at 09:28 -0800, Ben Greear wrote:
> > My test case is to have 6 processes each create 5000 TCP IPv4 connections to each other
> > on a system with 16GB RAM and send slow-speed data. This works fine on a 4.7 kernel, but
> > will not work at all on a 4.13. The 4.13 first complains about running out of tcp memory,
> > but even after forcing those values higher, the max connections we can get is around 15k.
> >
> > Both kernels have my out-of-tree patches applied, so it is possible it is my fault
> > at this point.
> >
> > Any suggestions as to what this might be caused by, or if it is fixed in more recent kernels?
> >
> > I will start bisecting in the meantime...
> >
>
> Hi Ben
>
> Unfortunately I have no idea.
>
> Are you using loopback flows, or have I misunderstood you ?
>
> How loopback connections can be slow-speed ?

A few quick points: I have not noticed this on 4.9, which we use with
pretty satisfying performance (typically around 100k conn/s). However
during some recent tests I did around the meltdown fixes on 4.15, I
noticed a high connect() or bind() cost to find a local port when
running on the loopback, that I didn't have the time to compare to
older kernels.
However, strace clearly showed that bind() (or connect() if bind() was
not used) could take as much as 2-3 ms as source ports were filling up.

To be clear, it was just a quick observation and anything could be wrong
there, including my tests. I'm just saying this in case it matches
anything Ben has observed. I can try to get more info if that helps, but
it could be a different case.

Cheers,
Willy
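
For anyone wanting to check the connect() cost described above on their own kernel, here is a minimal sketch. It is not the test that was run for this message: the function name time_loopback_connects and its parameters are made up for illustration, and it measures wall-clock time around connect() from user space, which is only a rough stand-in for what strace reports. It opens a number of TCP connections to a listener on 127.0.0.1, keeps them all open so that source ports stay in use, and records how long each connect() took.

```python
import socket
import time


def time_loopback_connects(count, host="127.0.0.1"):
    """Open `count` loopback TCP connections and keep them open.

    Returns a list with the wall-clock duration (in seconds) of each
    connect() call, in the order the connections were made.
    Hypothetical helper for illustration only.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))          # port 0: let the kernel pick a free port
    listener.listen(128)
    addr = listener.getsockname()

    clients, accepted, latencies = [], [], []
    try:
        for _ in range(count):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            clients.append(client)
            start = time.perf_counter()
            client.connect(addr)      # kernel selects the source port here
            latencies.append(time.perf_counter() - start)
            # Accept right away so the listen backlog never fills up.
            peer, _peer_addr = listener.accept()
            accepted.append(peer)
    finally:
        # Connections are only released once all measurements are done.
        for sock in clients + accepted + [listener]:
            sock.close()
    return latencies


if __name__ == "__main__":
    samples = time_loopback_connects(200)
    first = sum(samples[:20]) / 20
    last = sum(samples[-20:]) / 20
    print("mean connect() time, first 20: %.1f us" % (first * 1e6))
    print("mean connect() time, last 20:  %.1f us" % (last * 1e6))
```

With 200 connections this will not come anywhere near exhausting the local port range, so it only shows the method. To approach the situation described above, the count would have to be raised to the order of the range configured in net.ipv4.ip_local_port_range, which also requires raising the open file limit (two descriptors are used per connection) and possibly the tcp memory limits Ben mentioned.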