From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: Major network performance regression in 3.7
Date: Sun, 6 Jan 2013 16:51:23 +0100
Message-ID: <20130106155123.GB16031@1wt.eu>
In-Reply-To: <1357484342.6919.61.camel@edumazet-glaptop>

Hi Eric,

On Sun, Jan 06, 2013 at 06:59:02AM -0800, Eric Dumazet wrote:
> On Sun, 2013-01-06 at 10:24 +0100, Willy Tarreau wrote:
> 
> > It does not change anything to the tests above unfortunately. It did not
> > even stabilize the unstable runs.
> > 
> > I'll check if I can spot the original commit which caused the regression
> > for MTUs that are not n*4096+52.
> 
> Since you don't post your program, I wont be able to help, just by
> guessing what it does...

Oh sorry, I didn't really want to pollute the list with links and configs,
especially during the initial report with various combined issues :-(

The client is my old "inject" tool, available here :

     http://git.1wt.eu/web?p=inject.git

The server is my "httpterm" tool, available here :

     http://git.1wt.eu/web?p=httpterm.git
     Use "-O3 -DENABLE_POLL -DENABLE_EPOLL -DENABLE_SPLICE" for CFLAGS.

I'm starting httpterm this way :
    httpterm -D -L :8000 -P 256
    => it starts a server on port 8000, and sets pipe size to 256 kB. It
       uses SPLICE_F_MORE on output data but removing it did not fix the
       issue in one of the early tests.

Then I'm starting inject this way :
    inject -o 1 -u 1 -G 0:8000/?s=1g
    => 1 user, 1 object at a time, and fetch /?s=1g from the loopback.
       The server will then emit 1 GB of data using splice().

It's possible to disable splicing on the server using -dS. The client
"eats" data using recv(MSG_TRUNC) to avoid a useless copy.

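For readers unfamiliar with the recv(MSG_TRUNC) trick, here is a minimal
sketch of the idea. It is not the actual code from inject: the helper names
drain_with_msg_trunc() and loopback_pair() are made up for illustration, and
the second one is only scaffolding to get a connected TCP pair over the
loopback. It relies on the Linux-specific TCP behaviour where MSG_TRUNC makes
the kernel discard the received bytes instead of copying them to user space.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Discard everything readable on a connected TCP socket until EOF.
 * With MSG_TRUNC, Linux TCP drops the bytes rather than copying them
 * into the user buffer, so no buffer is passed here.
 * Returns the number of bytes discarded, or -1 on error. */
static long long drain_with_msg_trunc(int fd)
{
	long long total = 0;

	for (;;) {
		ssize_t n = recv(fd, NULL, 1 << 20, MSG_TRUNC);

		if (n > 0)
			total += n;
		else if (n == 0)
			return total;		/* peer closed its side */
		else if (errno != EINTR)
			return -1;
	}
}

/* Scaffolding only (hypothetical helper): build a connected TCP pair
 * over 127.0.0.1, letting the kernel pick a free port. */
static int loopback_pair(int *client, int *server)
{
	struct sockaddr_in addr = { .sin_family = AF_INET };
	socklen_t len = sizeof(addr);
	int lfd = socket(AF_INET, SOCK_STREAM, 0);

	if (lfd < 0)
		return -1;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto fail;
	*client = socket(AF_INET, SOCK_STREAM, 0);
	if (*client < 0)
		goto fail;
	if (connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(*client);
		goto fail;
	}
	*server = accept(lfd, NULL, NULL);
	close(lfd);
	if (*server < 0) {
		close(*client);
		return -1;
	}
	return 0;
fail:
	close(lfd);
	return -1;
}
```

The point of the design is simply that the client side costs almost nothing,
so the measurement is dominated by the sender and the stack.
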
> TCP has very low defaults concerning initial window, and it appears you
> set RCVBUF to even smaller values.

Yes, you're right, my bootup scripts still change the default value, though
I increase it to larger values during the tests (except the one where you
saw win 8030 due to the default rmem set to 16060). I've been using this
value in the past with older kernels because it allowed an integer number
of segments to fit into the default window, and offered optimal performance
with large numbers of concurrent connections. Since 2.6, tcp_moderate_rcvbuf
works very well and this is not needed anymore.
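
As a rough illustration of where the 8030 figure comes from, here is a small
sketch of the arithmetic. It is modeled on the kernel's tcp_win_from_space()
and assumes tcp_adv_win_scale is 1, which is consistent with the window seen
in the trace but is not stated anywhere above. The 1460-byte segment size is
also an assumption (a plain 1500-byte MTU without TCP options), and the
function names are made up for the example.

```c
/* Advertised window derived from receive buffer space, modeled on the
 * kernel's tcp_win_from_space(). adv_win_scale stands for the
 * tcp_adv_win_scale sysctl (assumed to be 1 in the case discussed). */
static int win_from_space(int space, int adv_win_scale)
{
	if (adv_win_scale <= 0)
		return space >> (-adv_win_scale);
	return space - (space >> adv_win_scale);
}

/* Number of whole segments of size mss that fit into bytes. */
static int whole_segments(int bytes, int mss)
{
	return bytes / mss;
}

/* Bytes left over once the whole segments have been accounted for. */
static int leftover_bytes(int bytes, int mss)
{
	return bytes % mss;
}
```

With these assumptions, win_from_space(16060, 1) gives 8030, and 16060 is
exactly 11 segments of 1460 bytes with nothing left over.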

Anyway, it does not affect the test here. Good kernels are OK whatever the
default value, and bad kernels are bad whatever the default value too.

Hmmm finally it's this commit again :

   2f53384 tcp: allow splice() to build full TSO packets

I'm saying "again" because we already diagnosed a similar effect several
months ago that was revealed by this patch and we fixed it with the
following one, though I remember that we weren't completely sure it
would fix everything :

   bad115c tcp: do_tcp_sendpages() must try to push data out on oom conditions

Just out of curiosity, I tried to re-apply the patch above just after the
first one but it did not change anything (after all it changed a symptom
which appeared in different conditions).

Interestingly, this commit (2f53384) significantly improved performance
on spliced data over the loopback (more than 50% in this test). In 3.7,
it seems to have no positive effect anymore. I reverted it using the
following patch and now the problem is fixed (mtu=64k works fine now) :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e457c7a..61e4517 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -935,7 +935,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
+	if (copied)
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	return copied;

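To make the effect of this revert explicit, the two conditions can be modeled
in isolation as below. This is only a sketch of the decision at the "out:"
label, not kernel code: the function names are invented, and
SKETCH_SENDPAGE_NOTLAST is a stand-in for the kernel-internal
MSG_SENDPAGE_NOTLAST flag with an illustrative value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel-internal MSG_SENDPAGE_NOTLAST flag.
 * The value is illustrative only. */
#define SKETCH_SENDPAGE_NOTLAST 0x20000

/* Condition on the "-" line of the diff: data is pushed only when
 * something was copied and this was the last page of the request. */
static bool push_before_revert(size_t copied, int flags)
{
	return copied && !(flags & SKETCH_SENDPAGE_NOTLAST);
}

/* Condition on the "+" line of the diff: data is pushed whenever
 * something was copied, regardless of whether more pages follow. */
static bool push_after_revert(size_t copied, int flags)
{
	(void)flags;
	return copied != 0;
}
```

The only case where the two differ is when data was copied and more pages are
still expected: the first form defers the push, the second one pushes at once.
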
Regards,
Willy
