From: Willy Tarreau <w@1wt.eu>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: netdev@vger.kernel.org
Subject: Re: Stable regression with 'tcp: allow splice() to build full TSO packets'
Date: Fri, 18 May 2012 00:24:31 +0200 [thread overview]
Message-ID: <20120517222431.GT14498@1wt.eu> (raw)
In-Reply-To: <1337293351.3403.67.camel@edumazet-glaptop>
On Fri, May 18, 2012 at 12:22:31AM +0200, Eric Dumazet wrote:
> On Fri, 2012-05-18 at 00:16 +0200, Willy Tarreau wrote:
> > On Fri, May 18, 2012 at 12:01:59AM +0200, Eric Dumazet wrote:
> > > On Thu, 2012-05-17 at 23:50 +0200, Eric Dumazet wrote:
> > > > On Thu, 2012-05-17 at 23:40 +0200, Eric Dumazet wrote:
> > > >
> > > > > I dont understand why we should tcp_push() if we sent 0 bytes in this
> > > > > splice() call.
> > > > >
> > > > > The push() should have be done already by prior splice() call, dont you
> > > > > think ?
> > > > >
> > > > > out:
> > > > > if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > > > > tcp_push(sk, flags, mss_now, tp->nonagle);
> > > > >
> > > >
> > > > I think I now understand
> > > >
> > > > One splice() syscall actually calls do_tcp_sendpages() several times
> > > > (N).
> > > >
> > > > Problem is N-1 calls are done with MSG_SENDPAGE_NOTLAST set
> > > >
> > > > And last one with MSG_SENDPAGE_NOTLAST unset
> > > >
> > > > So maybe we should replace this test by :
> > > >
> > > > if (!(flags & MSG_SENDPAGE_NOTLAST))
> > > > tcp_push(...);
> > > >
> > > >
> > >
> > > Its more tricky than that.
> > >
> > > If we return 0 from do_tcp_sendpages(), __splice_from_pipe() wont call
> > > us again, but we need to flush(). (This bug is indeed very old)
> > >
> > > So maybe use :
> > >
> > > if ((copied && !(flags & MSG_SENDPAGE_NOTLAST)) ||
> > > !copied)
> > > tcp_push(...);
> >
> > That's what I wanted to do at first but was still worried about the
> > situations where we fail upon first call due to lack of memory. And
> > indeed that did not fix the issue either :-(
> >
> > At least now I've checked that we fail here in do_tcp_sendpages() :
> >
> > if (!sk_stream_memory_free(sk)) {
> > printk(KERN_DEBUG "%s:%d copied=%d\n", __FUNCTION__, __LINE__, copied);
> > goto wait_for_sndbuf;
> > }
> >
> > Well, finally I just put the same test as yours above for the call to
> > tcp_push() upon out of memory and it fixed it too. I have simplified
> > the expression, since ((A && !B) || !A) == !(A & B).
> >
> > With the updated patch here it works for me. I don't know if it's
> > better.
> >
> > Willy
> >
> > -----
> >
> > From 1f26263bd4f8827bbca4cef77a99fac128cf8ccc Mon Sep 17 00:00:00 2001
> > From: Willy Tarreau <w@1wt.eu>
> > Date: Thu, 17 May 2012 22:43:20 +0200
> > Subject: [PATCH] tcp: do_tcp_sendpages() must try to push data out on oom conditions
> >
> > Since recent changes on TCP splicing (starting with commits 2f533844
> > and 35f9c09f), I started seeing massive stalls when forwarding traffic
> > between two sockets using splice() when pipe buffers were larger than
> > socket buffers.
> >
> > Latest changes (net: netdev_alloc_skb() use build_skb()) made the
> > problem even more apparent.
> >
> > The reason seems to be that if do_tcp_sendpages() fails on out of memory
> > condition without being able to send at least one byte, tcp_push() is not
> > called and the buffers cannot be flushed.
> >
> > After applying the attached patch, I cannot reproduce the stalls at all
> > and the data rate it perfectly stable and steady under any condition
> > which previously caused the problem to be permanent.
> >
> > The issue seems to have been there since before the kernel migrated to
> > git, which makes me think that the stalls I occasionally experienced
> > with tux during stress-tests years ago were probably related to the
> > same issue.
> >
> > This issue was first encountered on 3.0.31 and 3.2.17, so please backport
> > to -stable.
> >
> > Signed-off-by: Willy Tarreau <w@1wt.eu>
> > Cc: <stable@vger.kernel.org>
> > ---
> > net/ipv4/tcp.c | 4 ++--
> > 1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 63ddaee..cfe47c1 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -917,7 +917,7 @@ new_segment:
> > wait_for_sndbuf:
> > set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
> > wait_for_memory:
> > - if (copied)
> > + if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
> > tcp_push(sk, flags & ~MSG_MORE, mss_now, TCP_NAGLE_PUSH);
> >
> > if ((err = sk_stream_wait_memory(sk, &timeo)) != 0)
> > @@ -927,7 +927,7 @@ wait_for_memory:
> > }
> >
> > out:
> > - if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
> > + if (!(copied && (flags & MSG_SENDPAGE_NOTLAST)))
> > tcp_push(sk, flags, mss_now, tp->nonagle);
> > return copied;
> >
>
>
> Hmm... I believe I prefer your prior patch ( the one I Acked)
I too, less changes are better :-)
Thanks,
Willy
next prev parent reply other threads:[~2012-05-17 22:24 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-17 12:18 Stable regression with 'tcp: allow splice() to build full TSO packets' Willy Tarreau
2012-05-17 15:01 ` Willy Tarreau
2012-05-17 15:43 ` Eric Dumazet
2012-05-17 15:56 ` Willy Tarreau
2012-05-17 16:33 ` Eric Dumazet
2012-05-17 16:40 ` Willy Tarreau
2012-05-17 16:47 ` Eric Dumazet
2012-05-17 16:49 ` Eric Dumazet
2012-05-17 17:22 ` Willy Tarreau
2012-05-17 17:34 ` [PATCH net-next] net: netdev_alloc_skb() use build_skb() Eric Dumazet
2012-05-17 17:45 ` Willy Tarreau
2012-06-04 12:39 ` Michael S. Tsirkin
2012-06-04 12:44 ` Willy Tarreau
2012-05-17 19:53 ` David Miller
2012-05-18 4:41 ` Eric Dumazet
2012-06-04 12:37 ` Michael S. Tsirkin
2012-06-04 13:06 ` Eric Dumazet
2012-06-04 13:41 ` Michael S. Tsirkin
2012-06-04 14:01 ` Eric Dumazet
2012-06-04 14:09 ` Eric Dumazet
2012-06-04 14:17 ` Michael S. Tsirkin
2012-06-04 15:01 ` Eric Dumazet
2012-06-04 17:20 ` Michael S. Tsirkin
2012-06-04 17:44 ` Eric Dumazet
2012-06-04 18:16 ` Michael S. Tsirkin
2012-06-04 19:24 ` Eric Dumazet
2012-06-04 19:48 ` Michael S. Tsirkin
2012-06-04 19:56 ` Eric Dumazet
2012-06-04 21:20 ` Michael S. Tsirkin
2012-06-05 2:50 ` Eric Dumazet
2012-06-04 18:16 ` Michael S. Tsirkin
2012-06-04 19:29 ` Eric Dumazet
2012-06-04 19:43 ` Michael S. Tsirkin
2012-06-04 19:52 ` Eric Dumazet
2012-06-04 21:54 ` Michael S. Tsirkin
2012-06-05 2:46 ` Eric Dumazet
2012-06-04 19:56 ` Michael S. Tsirkin
2012-06-04 20:05 ` Eric Dumazet
2012-05-17 18:38 ` Stable regression with 'tcp: allow splice() to build full TSO packets' Ben Hutchings
2012-05-17 19:55 ` David Miller
2012-05-17 20:04 ` Willy Tarreau
2012-05-17 20:07 ` David Miller
2012-05-17 20:41 ` Eric Dumazet
2012-05-17 21:14 ` Willy Tarreau
2012-05-17 21:40 ` Eric Dumazet
2012-05-17 21:50 ` Eric Dumazet
2012-05-17 21:57 ` Willy Tarreau
2012-05-17 22:01 ` Eric Dumazet
2012-05-17 22:10 ` Eric Dumazet
2012-05-17 22:16 ` Willy Tarreau
2012-05-17 22:22 ` Eric Dumazet
2012-05-17 22:24 ` Willy Tarreau [this message]
2012-05-17 22:25 ` David Miller
2012-05-17 22:30 ` Willy Tarreau
2012-05-17 22:35 ` David Miller
2012-05-17 22:49 ` Willy Tarreau
2012-05-17 22:27 ` Joe Perches
2012-05-17 21:54 ` Willy Tarreau
2012-05-17 21:47 ` Willy Tarreau
2012-05-17 22:14 ` Eric Dumazet
2012-05-17 22:29 ` Willy Tarreau
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120517222431.GT14498@1wt.eu \
--to=w@1wt.eu \
--cc=eric.dumazet@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).