From: Ingo Molnar <mingo@elte.hu>
To: David Miller <davem@davemloft.net>
Cc: kuznet@ms2.inr.ac.ru, vgusev@openvz.org, mcmanus@ducksong.com,
xemul@openvz.org, netdev@vger.kernel.org,
ilpo.jarvinen@helsinki.fi, linux-kernel@vger.kernel.org
Subject: Re: [TCP]: TCP_DEFER_ACCEPT causes leak sockets
Date: Fri, 13 Jun 2008 08:30:37 +0200
Message-ID: <20080613063037.GA16943@elte.hu>
In-Reply-To: <20080612.163212.148965080.davem@davemloft.net>

* David Miller <davem@davemloft.net> wrote:

> From: David Miller <davem@davemloft.net>
> Date: Wed, 11 Jun 2008 16:52:55 -0700 (PDT)
>
> > More and more, the arguments are mounting to completely revert the
> > established code path changes, and frankly that is likely what I am
> > going to do by the end of today.
>
> Here is the revert patch I intend to send to Linus:
>
> tcp: Revert 'process defer accept as established' changes.
>
> This reverts two changesets, ec3c0982a2dd1e671bad8e9d26c28dcba0039d87
> ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and
> the follow-on bug fix 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38
> ("tcp: Fix slab corruption with ipv6 and tcp6fuzz").
>
> This change causes several problems, first reported by Ingo Molnar
> as a distcc-over-loopback regression where connections were getting
> stuck.
>
> Ilpo Järvinen first spotted the locking problems. The new function
> added by this code, tcp_defer_accept_check(), only has the
> child socket locked, yet it is modifying state of the parent
> listening socket.
>
> Fixing that is non-trivial at best, because we can't simply just grab
> the parent listening socket lock at this point, because it would
> create an ABBA deadlock. The normal ordering is parent listening
> socket --> child socket, but this code path would require the
> reverse lock ordering.
>
> Next is a problem noticed by Vitaliy Gusev, who noted:
>
> ----------------------------------------
> >--- a/net/ipv4/tcp_timer.c
> >+++ b/net/ipv4/tcp_timer.c
> >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data)
> > 		goto death;
> > 	}
> >
> >+	if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) {
> >+		tcp_send_active_reset(sk, GFP_ATOMIC);
> >+		goto death;
>
> Here socket sk is not attached to listening socket's request queue. tcp_done()
> will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should
> release this sk) as socket is not DEAD. Therefore socket sk will be lost for
> freeing.
> ----------------------------------------
>
> Finally, Alexey Kuznetsov argues that there might not even be any
> real value or advantage to these new semantics even if we fix all
> of the bugs:
>
> ----------------------------------------
> Hiding from accept() sockets with only out-of-order data is the
> only thing which is impossible with the old approach. Is this really
> so valuable? My opinion: no, this is nothing but a new loophole
> to consume memory without control.
> ----------------------------------------
>
> So revert this thing for now.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>

The 3 reverts have been extensively tested in -tip via:

  # tip/out-of-tree: 9e5b6ca: tcp: revert DEFER_ACCEPT modifications

and the distcc problems are fixed. (The locking fix alone did not fix it
conclusively in my testing, possibly due to the follow-on observations
outlined in your description.)

Tested-by: Ingo Molnar <mingo@elte.hu>

	Ingo
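
For readers following the lock-ordering argument in the quoted revert
message, here is a minimal userspace sketch of the ABBA hazard. It is only
an illustration under stated assumptions: struct demo_sock, listener_path()
and child_path() are hypothetical names, and POSIX mutexes stand in for the
kernel's socket locks. It is not the kernel code that was reverted. The
normal path takes parent then child; a path that already holds the child
cannot block on the parent without risking deadlock, so this sketch uses a
trylock and backs off instead.

```c
#include <pthread.h>

/* Hypothetical stand-in for a socket; not the kernel's struct sock. */
struct demo_sock {
	pthread_mutex_t lock;
	int pending;		/* state protected by this socket's lock */
};

/* Normal ordering: parent listening socket first, then child socket. */
static void listener_path(struct demo_sock *parent, struct demo_sock *child)
{
	pthread_mutex_lock(&parent->lock);
	pthread_mutex_lock(&child->lock);
	parent->pending++;	/* parent state changed under the parent lock */
	pthread_mutex_unlock(&child->lock);
	pthread_mutex_unlock(&parent->lock);
}

/*
 * Child-side path: the child lock is held first. Blocking on the parent
 * here would be the reverse (child -> parent) order and could deadlock
 * against listener_path() running in another thread. A trylock avoids
 * blocking: returns 0 on success, -1 if the parent lock was busy.
 */
static int child_path(struct demo_sock *parent, struct demo_sock *child)
{
	int ret = -1;

	pthread_mutex_lock(&child->lock);
	if (pthread_mutex_trylock(&parent->lock) == 0) {
		parent->pending--;
		pthread_mutex_unlock(&parent->lock);
		ret = 0;
	}
	pthread_mutex_unlock(&child->lock);
	return ret;
}
```

A caller that gets -1 from child_path() has to retry or defer the work,
which hints at why the revert message calls a proper fix non-trivial.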