From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Baron
Subject: Re: [PATCH net-next v2] tcp: reduce cpu usage under tcp memory pressure when SO_SNDBUF is set
Date: Fri, 21 Aug 2015 16:55:30 -0400
Message-ID: <55D79042.1050706@akamai.com>
References: <20150811143846.672A92039@prod-mail-relay10.akamai.com>
 <1439304576.1084.24.camel@edumazet-glaptop2.roam.corp.google.com>
 <55CA0EC2.9030306@akamai.com>
 <1439309530.1084.31.camel@edumazet-glaptop2.roam.corp.google.com>
 <55CA37F5.8090108@akamai.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: davem@davemloft.net, netdev@vger.kernel.org
To: Jason Baron, Eric Dumazet
Return-path:
Received: from a23-79-238-179.deploy.static.akamaitechnologies.com ([23.79.238.179]:57245
 "EHLO prod-mail-xrelay05.akamai.com" rhost-flags-OK-FAIL-OK-OK)
 by vger.kernel.org with ESMTP id S1752217AbbHUUzl (ORCPT );
 Fri, 21 Aug 2015 16:55:41 -0400
In-Reply-To: <55CA37F5.8090108@akamai.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On 08/11/2015 01:59 PM, Jason Baron wrote:
>
>
> On 08/11/2015 12:12 PM, Eric Dumazet wrote:
>> On Tue, 2015-08-11 at 11:03 -0400, Jason Baron wrote:
>>
>>>
>>> Yes, so the test case I'm using to test against is somewhat contrived.
>>> In that I am simply allocating around 40,000 sockets that are idle to
>>> create a 'permanent' memory pressure in the background. Then, I have
>>> just 1 flow that sets SO_SNDBUF, which results in the: poll(), write() loop.
>>>
>>> That said, we encountered this issue initially where we had 10,000+
>>> flows and whenever the system would get into memory pressure, we would
>>> see all the cpus spin at 100%.
>>>
>>> So the testcase I wrote, was just a simplistic version for testing. But
>>> I am going to try and test against the more realistic workload where
>>> this issue was initially observed.
>>>
>>
>> Note that I am still trying to understand why we need to increase socket
>> structure, for something which is inherently a problem of sharing memory
>> with an unknown (potentially big) number of sockets.
>>
>
> I was trying to mirror the wakeups when SO_SNDBUF is not set, where we
> continue to trigger on 1/3 of the buffer being available, as the
> sk->sndbuf is shrunk. And I saw this value as dynamic depending on
> number of sockets and read/write buffer usage. So that's where I was
> coming from with it.
>
> Also, at least with the .config I have the tcp_sock structure didn't
> increase in size (although struct sock did go up by 8 and not 4).
>
>> I suggested to use a flag (one bit).
>>
>> If set, then we should fallback to tcp_wmem[0] (each socket has 4096
>> bytes, so that we can avoid starvation)
>>
>>
>>
>
> Ok, I will test this approach.

Hi Eric,

I created a test here with 20,000 streams. If I set SO_SNDBUF high
enough on the server side, I can push tcp memory usage above tcp_mem[2].
In that case, with the 'one bit' approach using tcp_wmem[0] as the
wakeup threshold, I still observe the 100% cpu spinning issue, whereas
with this v2 patch cpu usage is minimal (1-2%). The reason is that we
don't guarantee tcp_wmem[0] above tcp_mem[2]. So the 'one bit' approach
definitely alleviates the spinning between tcp_mem[1] and tcp_mem[2],
but not above tcp_mem[2] in my testing.

Maybe nobody cares about this case (you are getting what you ask for by
using SO_SNDBUF), but it seems to me that it would be nice to avoid this
sort of behavior. I also like the fact that with sk_effective_sndbuf we
keep doing wakeups when 1/3 of the write buffer has emptied, which keeps
the wakeup behavior consistent with the case where SO_SNDBUF is not set.
In theory this would matter for high-latency, high-bandwidth links, but
in the testing I did, I didn't observe any throughput difference between
this v2 patch and the 'one bit' approach.

As I mentioned, with this v2 the 'struct sock' grows by 4 bytes, but
struct tcp_sock does not increase. Since this is tcp specific, we could
add sk_effective_sndbuf to struct tcp_sock only.

So the 'one bit' approach definitely seems to me to be an improvement,
but I wanted to get feedback on this testing before deciding how to
proceed.

Thanks,

-Jason