From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: netdev@vger.kernel.org
Subject: Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
Date: Thu, 17 May 2012 22:41:19 +0200 [thread overview]
Message-ID: <1337287279.3403.44.camel@edumazet-glaptop> (raw)
In-Reply-To: <20120517121800.GA18052@1wt.eu>
On Thu, 2012-05-17 at 14:18 +0200, Willy Tarreau wrote:
> Hi Eric,
>
> I'm facing a regression in stable 3.2.17 and 3.0.31 which is
> exhibited by your patch 'tcp: allow splice() to build full TSO
> packets' which unfortunately I am very interested in !
>
> What I'm observing is that TCP transmits using splice() stall
> quite quickly if I'm using pipes larger than 64kB (even 65537
> is enough to reliably observe the stall).
>
> I'm seeing this on haproxy running on a small ARM machine (a
> dockstar), which exchanges data through a gig switch with my
> development PC. The NIC (mv643xx) doesn't support TSO but has
> GSO enabled. If I disable GSO, the problem remains. I can however
> make the problem disappear by disabling SG or Tx checksumming.
> BTW, using recv/send() instead of splice() also gets rid of the
> problem.
>
> I can also reduce the risk of seeing the problem by increasing
> the default TCP buffer sizes in tcp_wmem. By default I'm running
> at 16kB, but if I increase the output buffer size above the pipe
> size, the problem *seems* to disappear though I can't be certain,
> since larger buffers generally means the problem takes longer to
> appear, probably due to the fact that the buffers don't need to
> be filled. Still I'm certain that with 64k TCP buffers and 128k
> pipes I'm still seeing it.
>
> With strace, I'm seeing data fill up the pipe with the splice()
> call responsible for pushing the data to the output socket returing
> -1 EAGAIN. During this time, the client receives no data.
>
> Something bugs me, I have tested with a dummy server of mine,
> httpterm, which uses tee+splice() to push data outside, and it
> has no problem filling the gig pipe, and correctly recoverers
> from the EAGAIN :
>
> send(13, "HTTP/1.1 200\r\nConnection: close\r"..., 160, MSG_DONTWAIT|MSG_NOSIGNAL) = 160
> tee(0x3, 0x6, 0x10000, 0x2) = 42552
> splice(0x5, 0, 0xd, 0, 0xa00000, 0x2) = 14440
> tee(0x3, 0x6, 0x10000, 0x2) = 13880
> splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
> ...
> tee(0x3, 0x6, 0x10000, 0x2) = 13880
> splice(0x5, 0, 0xd, 0, 0x9fc798, 0x2) = 51100
> tee(0x3, 0x6, 0x10000, 0x2) = 50744
> splice(0x5, 0, 0xd, 0, 0x9efffc, 0x2) = 32120
> tee(0x3, 0x6, 0x10000, 0x2) = 30264
> splice(0x5, 0, 0xd, 0, 0x9e8284, 0x2) = -1 EAGAIN (Resource temporarily unavailable)
>
> etc...
>
> It's only with haproxy which uses splice() to copy data between
> two sockets that I'm getting the issue (data forwarded from fd 0xe
> to fd 0x6) :
>
> 16:03:17.797144 pipe([36, 37]) = 0
> 16:03:17.797318 fcntl64(36, 0x407 /* F_??? */, 0x20000) = 131072 ## note: fcntl(F_SETPIPE_SZ, 128k)
> 16:03:17.797473 splice(0xe, 0, 0x25, 0, 0x9f2234, 0x3) = 10220
> 16:03:17.797706 splice(0x24, 0, 0x6, 0, 0x27ec, 0x3) = 10220
> 16:03:17.802036 gettimeofday({1324652597, 802093}, NULL) = 0
> 16:03:17.802200 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 7
> 16:03:17.802363 gettimeofday({1324652597, 802419}, NULL) = 0
> 16:03:17.802530 splice(0xe, 0, 0x25, 0, 0x9efa48, 0x3) = 16060
> 16:03:17.802789 splice(0x24, 0, 0x6, 0, 0x3ebc, 0x3) = 16060
> 16:03:17.806593 gettimeofday({1324652597, 806651}, NULL) = 0
> 16:03:17.806759 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 4
> 16:03:17.806919 gettimeofday({1324652597, 806974}, NULL) = 0
> 16:03:17.807087 splice(0xe, 0, 0x25, 0, 0x9ebb8c, 0x3) = 17520
> 16:03:17.807356 splice(0x24, 0, 0x6, 0, 0x4470, 0x3) = 17520
> 16:03:17.809565 gettimeofday({1324652597, 809620}, NULL) = 0
> 16:03:17.809726 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
> 16:03:17.809883 gettimeofday({1324652597, 809937}, NULL) = 0
> 16:03:17.810047 splice(0xe, 0, 0x25, 0, 0x9e771c, 0x3) = 36500
> 16:03:17.810399 splice(0x24, 0, 0x6, 0, 0x8e94, 0x3) = 23360
> 16:03:17.810629 epoll_ctl(0x3, 0x1, 0x6, 0x85378) = 0 ## note: epoll_ctl(ADD, fd=6, dir=OUT).
> 16:03:17.810792 gettimeofday({1324652597, 810848}, NULL) = 0
> 16:03:17.810954 epoll_wait(0x3, 0x99250, 0x16, 0x3e8) = 1
> 16:03:17.811188 gettimeofday({1324652597, 811246}, NULL) = 0
> 16:03:17.811356 splice(0xe, 0, 0x25, 0, 0x9de888, 0x3) = 21900
> 16:03:17.811651 splice(0x24, 0, 0x6, 0, 0x88e0, 0x3) = -1 EAGAIN (Resource temporarily unavailable)
>
Willy you say output to fd 6 hangs, but splice() returns EAGAIN here ?
(because socket buffer is full)
> So output fd 6 hangs here and will not appear anymore until
> here where I pressed Ctrl-C to stop the test :
>
I just want to make sure its not a userland error that triggers now much
faster than with prior kernels.
You drain bytes from fd 0xe to pipe buffers, but I dont see you check
write ability on destination socket prior the splice(pipe -> socket)
next prev parent reply other threads:[~2012-05-17 20:41 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-17 12:18 Stable regression with 'tcp: allow splice() to build full TSO packets' Willy Tarreau
2012-05-17 15:01 ` Willy Tarreau
2012-05-17 15:43 ` Eric Dumazet
2012-05-17 15:56 ` Willy Tarreau
2012-05-17 16:33 ` Eric Dumazet
2012-05-17 16:40 ` Willy Tarreau
2012-05-17 16:47 ` Eric Dumazet
2012-05-17 16:49 ` Eric Dumazet
2012-05-17 17:22 ` Willy Tarreau
2012-05-17 17:34 ` [PATCH net-next] net: netdev_alloc_skb() use build_skb() Eric Dumazet
2012-05-17 17:45 ` Willy Tarreau
2012-06-04 12:39 ` Michael S. Tsirkin
2012-06-04 12:44 ` Willy Tarreau
2012-05-17 19:53 ` David Miller
2012-05-18 4:41 ` Eric Dumazet
2012-06-04 12:37 ` Michael S. Tsirkin
2012-06-04 13:06 ` Eric Dumazet
2012-06-04 13:41 ` Michael S. Tsirkin
2012-06-04 14:01 ` Eric Dumazet
2012-06-04 14:09 ` Eric Dumazet
2012-06-04 14:17 ` Michael S. Tsirkin
2012-06-04 15:01 ` Eric Dumazet
2012-06-04 17:20 ` Michael S. Tsirkin
2012-06-04 17:44 ` Eric Dumazet
2012-06-04 18:16 ` Michael S. Tsirkin
2012-06-04 19:24 ` Eric Dumazet
2012-06-04 19:48 ` Michael S. Tsirkin
2012-06-04 19:56 ` Eric Dumazet
2012-06-04 21:20 ` Michael S. Tsirkin
2012-06-05 2:50 ` Eric Dumazet
2012-06-04 18:16 ` Michael S. Tsirkin
2012-06-04 19:29 ` Eric Dumazet
2012-06-04 19:43 ` Michael S. Tsirkin
2012-06-04 19:52 ` Eric Dumazet
2012-06-04 21:54 ` Michael S. Tsirkin
2012-06-05 2:46 ` Eric Dumazet
2012-06-04 19:56 ` Michael S. Tsirkin
2012-06-04 20:05 ` Eric Dumazet
2012-05-17 18:38 ` Stable regression with 'tcp: allow splice() to build full TSO packets' Ben Hutchings
2012-05-17 19:55 ` David Miller
2012-05-17 20:04 ` Willy Tarreau
2012-05-17 20:07 ` David Miller
2012-05-17 20:41 ` Eric Dumazet [this message]
2012-05-17 21:14 ` Willy Tarreau
2012-05-17 21:40 ` Eric Dumazet
2012-05-17 21:50 ` Eric Dumazet
2012-05-17 21:57 ` Willy Tarreau
2012-05-17 22:01 ` Eric Dumazet
2012-05-17 22:10 ` Eric Dumazet
2012-05-17 22:16 ` Willy Tarreau
2012-05-17 22:22 ` Eric Dumazet
2012-05-17 22:24 ` Willy Tarreau
2012-05-17 22:25 ` David Miller
2012-05-17 22:30 ` Willy Tarreau
2012-05-17 22:35 ` David Miller
2012-05-17 22:49 ` Willy Tarreau
2012-05-17 22:27 ` Joe Perches
2012-05-17 21:54 ` Willy Tarreau
2012-05-17 21:47 ` Willy Tarreau
2012-05-17 22:14 ` Eric Dumazet
2012-05-17 22:29 ` Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1337287279.3403.44.camel@edumazet-glaptop \
--to=eric.dumazet@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=w@1wt.eu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox