From: Rick Jones
Subject: Re: [RFC] tcp: use order-3 pages in tcp_sendmsg()
Date: Thu, 15 Nov 2012 10:33:18 -0800
Message-ID: <50A5356E.8030901@hp.com>
References: <1347868144.26523.71.camel@edumazet-glaptop>
To: "Yan, Zheng"
Cc: Eric Dumazet, netdev

On 11/14/2012 11:52 PM, Yan, Zheng wrote:
> This commit makes one of our test case on core 2 machine drop in
> performance by about 60%. The test case runs 2048 instances of
> netperf 64k stream test at the same time.

I'm impressed that 2048 concurrent netperf TCP_STREAM tests ran to completion in the first place :)

> Analysis showed using order-3 pages causes more LLC misses, most new
> LLC misses happen when the senders copy data to the socket buffer.
> If revert to use single page, the sender side only trigger a few LLC
> misses, most LLC misses happen on the receiver size. It means most
> pages allocated by the senders are cache hot. But when using order-3
> pages, 2048 * 32k = 64M, 64M is much larger than LLC size. Should
> this regression be worried? or our test case is too unpractical?

Even before the page change, I would have expected the buffers that netperf itself uses to exceed the LLC. If you were not using the test-specific -s and -S options to set an explicit socket buffer size, I believe that under Linux the default SO_SNDBUF size will (most of the time) be 86KB. Coupled with your statement that the send size was 64K, that means the send ring netperf uses will be two 64KB buffers, which comes to 256MB across 2048 concurrent netperfs.
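To make the arithmetic above concrete, here is a small Python sketch. The function names are made up for illustration. The sketch takes the ring width of two 64KB buffers from the text rather than deriving it from SO_SNDBUF. It also assumes 4KB base pages, so that one order-3 allocation is 8 pages, or 32KB, matching the quoted 2048 * 32k figure.

```python
KB = 1024
MB = 1024 * KB


def netperf_footprint(instances: int, ring_width: int, send_size: int) -> int:
    """Bytes of netperf-owned send buffers summed over all concurrent instances.

    ring_width is how many send buffers each netperf cycles through; per the
    email it is 2 with a ~86KB default SO_SNDBUF and a 64K send size.
    """
    return instances * ring_width * send_size


def order3_footprint(instances: int, page_size: int = 4 * KB) -> int:
    """One order-3 (2**3 = 8 page) chunk per sender, as in the quoted 2048 * 32k.

    Assumes 4KB base pages (hypothetical default; adjust for other page sizes).
    """
    return instances * (page_size << 3)


if __name__ == "__main__":
    print(netperf_footprint(2048, 2, 64 * KB) // MB)  # 256 (whole send ring)
    print(netperf_footprint(2048, 1, 64 * KB) // MB)  # 128 (one buffer in play)
    print(order3_footprint(2048) // MB)               # 64  (order-3 pages)
```

Either way, the netperf-side buffers alone are several times the 64MB order-3 figure, before the stack's own memory is counted.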
Even if we go with "only the one send buffer in play at a time matters," that is still 128MB of space up in netperf itself, even before one gets to the stack. Still, sharing the analysis tool output might be helpful.

By the way, the "default" size of the buffer netperf posts in recv() calls will depend on the initial value of SO_RCVBUF after the data socket is created and has had any -s or -S option values applied to it.

I cannot say that the scripts distributed with netperf are consistently good about doing this themselves, but for the "canonical" bulk stream test I would suggest something like:

netperf -t TCP_STREAM -H -l 60 -- -s 1M -S 1M -m 64K -M 64K

as that will reduce the number of variables. Those -s and -S values will probably call for tweaking sysctl settings, though, or they will be clipped by net.core.rmem_max and net.core.wmem_max.

At a minimum I would suggest using the -m and -M options. I might also tack on a "-o all" at the end, but that is a matter of preference; it will cause a great deal of output...

Eric Dumazet later says:

> Number of in flight bytes do not depend on the order of the pages, but
> sizes of TCP buffers (receiver, sender)

And unless you happened to use explicit -s and -S options, there is even more variability in how much may be in flight. If you do not add those, you can at least get netperf to report what the socket buffer sizes became by the end of the test:

netperf -t TCP_STREAM ... -- ... -o lss_size_end,rsr_size_end

for "local socket send size" and "remote socket receive size" respectively.

> If the sender is faster (because of this commit), but receiver is slow
> to drain the receive queues, then you can have a situation where the
> consumed memory on receiver is higher and the receiver might be actually
> slower.

Netperf can be told to report the number of receive calls and the bytes per receive, either by tacking on a global "-v 2" or by requesting them explicitly via omni output selection.
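The following is a minimal Python sketch, not part of netperf, that shows the sysctl clipping without running a test. It requests 1 MiB for SO_SNDBUF and SO_RCVBUF, mirroring "-s 1M -S 1M," and reads back what the kernel granted. It assumes Linux, where socket(7) documents that the kernel doubles the value passed to setsockopt() and caps it by net.core.wmem_max or net.core.rmem_max. The helper names are hypothetical.

```python
import socket
from typing import Optional, Tuple

REQUESTED = 1 << 20  # 1 MiB, mirroring "-s 1M -S 1M"


def read_sysctl(name: str) -> Optional[int]:
    """Read an integer sysctl such as net.core.wmem_max via /proc (Linux only).

    Returns None if the file is missing or unreadable (e.g. non-Linux hosts).
    """
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def granted_buffer_sizes(requested: int = REQUESTED) -> Tuple[int, int]:
    """Request `requested` bytes of SO_SNDBUF and SO_RCVBUF on a fresh TCP
    socket and return (sndbuf, rcvbuf) as reported by getsockopt()."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return (s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
                s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


if __name__ == "__main__":
    snd, rcv = granted_buffer_sizes()
    # On Linux the reported value is double the (possibly clipped) request,
    # so anything below 2 * REQUESTED suggests the *mem_max sysctl clipped it.
    print(f"SO_SNDBUF: asked {REQUESTED}, got {snd}, "
          f"wmem_max={read_sysctl('net.core.wmem_max')}")
    print(f"SO_RCVBUF: asked {REQUESTED}, got {rcv}, "
          f"rmem_max={read_sysctl('net.core.rmem_max')}")
```

This only covers the explicit-request case. For the default, autotuned case, the end-of-test lss_size_end and rsr_size_end values netperf reports are still what tell you what the connection actually used.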
Presumably, if the receiving netserver processes are not keeping up as well, that should manifest as the bytes per receive being larger in the "after" case than in the "before" case:

netperf ... -- ... -o remote_recv_size,remote_recv_calls,remote_bytes_per_recv

happy benchmarking,

rick jones