From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
To: Patrick McManus <mcmanus@ducksong.com>,
Arjan van de Ven <arjan@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>, David Miller <davem@davemloft.net>,
peterz@infradead.org, LKML <linux-kernel@vger.kernel.org>,
Netdev <netdev@vger.kernel.org>,
rjw@sisk.pl, Andrew Morton <akpm@linux-foundation.org>,
johnpol@2ka.mipt.ru
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
Date: Sat, 7 Jun 2008 00:12:55 +0300 (EEST)
Message-ID: <Pine.LNX.4.64.0806062314420.9424@wrl-59.cs.helsinki.fi>
In-Reply-To: <1212782937.23706.46.camel@tng>
...added Arjan.
On Fri, 6 Jun 2008, Patrick McManus wrote:
> This is all a bit confusing, but here are the conclusions I have drawn.
Your observations here match what I've understood :-).
> There definitely is a problem with the locking of the DA commit
> ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 . That code was part of 26-rc1
> but it never appeared in 25. It exists in pretty much the same form in
> rc5 (there was 1 patch to it over that time to fix a different problem).
>
> We're certain this code has a problem with the accept queue both because
> of code inspection and the fact that Ingo can back it out (as the
> significant part of the 3-patch revert) and the problem goes away in his
> testing.
Problems were at least these:
- Accept queue addition was racy and could leave dangling items
- Dangling items caused an inconsistent sk_ack_backlog
- The check that the socket was still in LISTEN state was racy; the state
  could change after the check was made (shouldn't happen with distcc though)
I didn't read ->sk_data_ready that carefully; it could have some
additional problems not listed here (but they should all be fixed
by the added locking anyway).
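To make the race above concrete, here is a user-space toy model in C. It is a hypothetical sketch, not the kernel code and not the actual lockpatch. The names (struct listener, listener_enqueue(), listener_close()) are made up, and a pthread mutex stands in for the listening socket's lock. What it shows is that the LISTEN-state check, the accept-queue insertion and the backlog counter update form one critical section, and the close path takes the same lock. As a result, ack_backlog always equals the queue length, and nothing is added to a listener that is already closed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical toy model of a listening socket's accept queue.
 * Names are illustrative only; this is not the kernel's code. */
enum lstate { ST_LISTEN, ST_CLOSED };

struct child {
	struct child *next;
};

struct listener {
	pthread_mutex_t lock;   /* stands in for the listening socket's lock */
	enum lstate state;
	struct child *head, *tail;
	int ack_backlog;        /* invariant: equals the number of queued children */
};

void listener_init(struct listener *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->state = ST_LISTEN;
	l->head = l->tail = NULL;
	l->ack_backlog = 0;
}

/* Deferred-accept path: the state check, the enqueue and the counter
 * update all happen under one lock, so a concurrent close cannot run
 * between the check and the insertion. */
int listener_enqueue(struct listener *l, struct child *c)
{
	int ret = -1;

	pthread_mutex_lock(&l->lock);
	if (l->state == ST_LISTEN) {
		c->next = NULL;
		if (l->tail)
			l->tail->next = c;
		else
			l->head = c;
		l->tail = c;
		l->ack_backlog++;
		ret = 0;
	}
	pthread_mutex_unlock(&l->lock);
	return ret;     /* -1: listener already closed, caller must drop c */
}

/* Close path: takes the same lock, marks the listener closed and unlinks
 * every queued child (the caller owns their memory). Returns how many
 * children were unlinked. */
int listener_close(struct listener *l)
{
	int drained = 0;

	pthread_mutex_lock(&l->lock);
	l->state = ST_CLOSED;
	while (l->head) {
		l->head = l->head->next;
		drained++;
	}
	l->tail = NULL;
	l->ack_backlog = 0;
	pthread_mutex_unlock(&l->lock);
	return drained;
}
```

Without the lock, a close that runs between the state check and the insertion in listener_enqueue() could leave a child linked into a queue that nobody will drain, with ack_backlog still counting it. In this model, that is the dangling-item and inconsistent-backlog pattern described above.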
AFAICT, the rest of that ec3c change is safe wrt. locking: just holding sk
is enough for the rest, and those bits mostly shouldn't be executed with a
distcc setup anyway.
> I have run tests that can reproduce the hung socket with distcc over
> localhost using 26-rc5. I can also apparently cure it using the locking
> fix patch Ilpo sent (c9454f0..d21d2b9) on top of that. (My test of
> rc5+lockpatch is at 4.5+ hrs and counting without failures; it fails 6
> times an hour with vanilla rc5.)
>
> Based on all of that, the right thing to do seems to be to apply the
> lockpatch (c9454f0..d21d2b9) to Linus's tree and not revert anything -
> just fix the code and I'll send Ilpo and Ingo cookies at Christmas time
> for being great guys. Alternatively, Ingo could run the distcc servers
> and clients on -tip with the lockpatch (nothing reverted) for more
> testing.
Anyway, we would still have the option to revert both the DA change and the
locking fix later if the problem remains clearly more likely than with
stable-2.6.25.
> The only lingering problem is Ingo's report yesterday
> http://marc.info/?l=linux-netdev&m=121267587715976&w=2
> of a distcc hang. In this one it was not over localhost and the distcc
> server had the ec3c DA changes totally reverted. (The server is really
> the only stack that matters in this case - the client is not impacted by
> the DA changes).
It definitely didn't fit the picture that well if we were talking about just
a single bug here.
...I wish Ingo had provided the receiver state back then. :-)
> This has to be a different issue, because the ec3c code
> we're talking about here wasn't on the server at all. As Ilpo mentions,
> Hakon is believed to have a different problem and maybe you've tripped
> over that too?
...Håkon's case is definitely a different thing. The symptoms are quite
different too: there's no deadlock at all, but the TCP flow eventually
dies, and I don't yet know on what timescale that happens. The only common
denominator was actually the receiver process appearing to be missing,
though it provably was still there.
Besides, how long did Ingo wait in this case before concluding that the
TCP connection was stuck again?
> If we're sure of that conclusion we should just take Ilpo's DA patch as
> that will narrow the field for finding Hakon's issue. It's just with all
> of these data points I'm not sure if I'm reaching the right conclusion.
Let's widen the scope to two or three bugs then, one down already...
In case you missed it, btw, Arjan also reported a problem quite early, but
in his case the workload was the claws MUA + IMAP, so I doubt that
DEFER_ACCEPT is involved, but who knows without strace. Here:
http://marc.info/?l=linux-kernel&m=121182171000434&w=2
Arjan, can you please check if your workload uses setsockopt
TCP_DEFER_ACCEPT for the LISTENing socket? ...If not, then your case
is different from Ingo's.
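For the check requested above, a minimal C helper that reads TCP_DEFER_ACCEPT back from a listening socket might look like the following. The function name defer_accept_secs() is hypothetical. getsockopt() with IPPROTO_TCP and TCP_DEFER_ACCEPT is the standard Linux interface. A non-zero result means deferred accept is enabled on that socket.

```c
#define _DEFAULT_SOURCE         /* exposes TCP_DEFER_ACCEPT in glibc headers */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: returns the TCP_DEFER_ACCEPT value (seconds)
 * configured on fd, 0 if deferred accept is off, or -1 if getsockopt()
 * fails (e.g. fd is not a TCP socket). */
int defer_accept_secs(int fd)
{
	int secs = 0;
	socklen_t len = sizeof(secs);

	if (getsockopt(fd, IPPROTO_TCP, TCP_DEFER_ACCEPT, &secs, &len) < 0)
		return -1;
	return secs;
}
```

This only works from inside the process that owns the socket. For an existing daemon, it is easier to run it under strace (e.g. strace -f -e trace=setsockopt <program>) and look for a TCP_DEFER_ACCEPT call on the listening fd. On current kernels, the value read back may not be exactly the value that was set, because the kernel stores it internally as a retransmission count.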
--
i.