netdev.vger.kernel.org archive mirror
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
To: Ingo Molnar <mingo@elte.hu>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
Date: Thu, 29 May 2008 14:14:42 +0300 (EEST)
Message-ID: <Pine.LNX.4.64.0805291332401.16829@wrl-59.cs.helsinki.fi>
In-Reply-To: <20080529084524.GA24892@elte.hu>

On Thu, 29 May 2008, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> 
> > in overnight -tip test runs based on recent -git i got two
> > stuck TCP connections:
> > 
> > Active Internet connections (w/o servers)
> > Proto Recv-Q Send-Q Local Address               Foreign Address             State      
> > tcp        0 174592 10.0.1.14:58015             10.0.1.14:3632              ESTABLISHED 
> > tcp    72134      0 10.0.1.14:3632              10.0.1.14:58015             ESTABLISHED 
> 
> update: in the past 5 days of -tip testing i've gathered about 10 
> randconfig kernel configs that all produced such failures.

...Yesterday I tried some accept (& read some) & close/exit type
stress testing, but I couldn't get it to show up (though I'll try for a
longer time later on, and also try fault-style exiting).

> Since the bug itself is very elusive (it takes up to 50 boot + 
> kernel-rebuild-via-distcc iterations to trigger), bisection was still
> not an option - but with 10 configs statistical analysis of the configs 
> is now possible.
> 
> I made a histogram of all kernel options present in those configs, and 
> one networking related kernel option stood out:
> 
>       5 CONFIG_TCP_CONG_ADVANCED=y
>       6 CONFIG_INET_TCP_DIAG=y
>       6 CONFIG_TCP_MD5SIG=y
>       9 CONFIG_TCP_CONG_CUBIC=y
> 
> that code is called in the bootlogs:
> 
> > [   13.279410] calling  cubictcp_register+0x0/0x80
> > [   13.279412] TCP cubic registered
> 
> the likelihood of CONFIG_TCP_CONG_CUBIC=y being enabled in my randconfig
> runs is 75%. The likelihood of CONFIG_TCP_CONG_CUBIC=y being enabled in
> 10 configs in a row is 0.75^10, or 5.6%. So statistical analysis can say
> with 95% confidence that the presence of this option correlates with
> the hung sockets.

Do I understand you correctly... it doesn't explain the tenth case out
of ten, only nine of them?

> i have started testing this theory now, via the patch below, which turns 
> off TCP_CONG_CUBIC. It will take about 50 bootups on the affected 
> testsystems to confirm. (it will take a couple of hours today as not all 
> testsystems show these hung socket symptoms)
> 
> distributions enable TCP_CONG_CUBIC by default:
> 
>   $ grep CUBIC /boot/config-2.6.24.7-92.fc8
>   CONFIG_TCP_CONG_CUBIC=y
>   CONFIG_DEFAULT_CUBIC=y
> 
> which would explain why Arjan and Peter triggered similar hangs as well.

The main problem with this explanation is that congestion control modules
are only in use while TCP is in ESTABLISHED and transmitting normally;
they have nothing to do with how we enter or leave ESTABLISHED.

But if the process that owned the connection really went away already,
I think we should end up in tcp_close(), which moves the state out of
ESTABLISHED (and also sends a RST if there's still data to be received;
the other end would pick that up and would no longer stay in ESTABLISHED
either). ...A failure to send the reset would show up in
LINUX_MIB_TCPABORTFAILED. Because both ends remain in ESTABLISHED, that
more or less excludes the possibility that some bug accidentally allowed
the Recv-Q end to return to ESTABLISHED as well.

To me there are mainly two weird things:
1) Why we see orphaning with data in the first place (I think distcc would
want to read everything, unless some worker crashed early... though some
timeout in distcc could explain it as well; I don't know well enough how
distcc does everything)...
2) Why the connection is still in ESTABLISHED when it was orphaned and
has some data left to receive... With unread data it should be in CLOSE,
and in FIN_WAIT1 otherwise.

-- 
 i.

Thread overview: 96+ messages
2008-05-26 11:56 [bug] stuck localhost TCP connections, v2.6.26-rc3+ Ingo Molnar
2008-05-26 13:28 ` Ilpo Järvinen
2008-05-26 13:59   ` Ingo Molnar
2008-05-26 14:12     ` Ingo Molnar
2008-05-26 14:17       ` Ingo Molnar
2008-05-26 14:29         ` Ingo Molnar
2008-05-26 14:43         ` Ilpo Järvinen
2008-05-26 14:58       ` Ilpo Järvinen
2008-05-26 16:23         ` Ingo Molnar
2008-05-26 16:32           ` Ilpo Järvinen
2008-05-26 16:54             ` Ingo Molnar
2008-05-26 17:08               ` Ilpo Järvinen
2008-05-26 18:12                 ` Ingo Molnar
2008-05-26 20:41                   ` Ingo Molnar
2008-05-26 21:20                     ` Ilpo Järvinen
2008-05-30 16:23   ` Ray Lee
2008-05-26 16:24 ` Arjan van de Ven
2008-05-28  9:27 ` Peter Zijlstra
2008-05-31 14:25   ` Håkon Løvdal
2008-05-31 16:09     ` Ilpo Järvinen
2008-05-31 17:22       ` Ilpo Järvinen
2008-05-31 17:58       ` Håkon Løvdal
2008-05-31 18:37         ` Ilpo Järvinen
2008-05-31 20:25           ` Håkon Løvdal
2008-05-31 21:39             ` Ilpo Järvinen
2008-05-31 21:45               ` Håkon Løvdal
2008-06-04  0:10               ` Håkon Løvdal
2008-06-04 11:14                 ` Ilpo Järvinen
2008-06-04 14:00                   ` Håkon Løvdal
2008-06-04 15:09                     ` Ilpo Järvinen
2008-06-06  9:32                       ` Håkon Løvdal
2008-06-09 19:24                         ` Ilpo Järvinen
2008-06-10 23:26                           ` Håkon Løvdal
2008-06-11 13:39                             ` Ilpo Järvinen
2008-06-19  0:30                               ` Håkon Løvdal
2008-05-29  8:45 ` Ingo Molnar
2008-05-29 11:14   ` Ilpo Järvinen [this message]
2008-05-29 11:22     ` Ingo Molnar
2008-05-29 13:05       ` Evgeniy Polyakov
2008-05-29 13:43         ` Ingo Molnar
2008-05-29 13:08       ` Ingo Molnar
2008-05-29 13:48         ` Ilpo Järvinen
2008-05-30 11:09         ` Ingo Molnar
2008-05-30 21:12           ` Ilpo Järvinen
2008-05-30 18:18       ` Ingo Molnar
2008-05-31  6:09         ` Ingo Molnar
2008-05-31 11:46           ` Ilpo Järvinen
2008-05-31 12:18             ` Ilpo Järvinen
2008-05-31 12:54               ` Ingo Molnar
2008-05-31 12:58                 ` Ilpo Järvinen
2008-05-31 16:35                   ` Ingo Molnar
2008-05-31 22:46                     ` Patrick McManus
2008-06-01  5:51                       ` Ilpo Järvinen
2008-06-01  6:04                       ` Eric Dumazet
2008-06-02  9:23                         ` Ingo Molnar
2008-06-03  9:40                     ` [fixed] [patch] " Ingo Molnar
2008-06-03 14:41                       ` Patrick McManus
2008-06-03 21:46                       ` Ilpo Järvinen
2008-06-03 22:01                         ` Ilpo Järvinen
2008-06-03 22:03                           ` David Miller
2008-06-03 22:10                             ` Ilpo Järvinen
2008-06-03 23:22                             ` Ilpo Järvinen
2008-06-03 23:54                               ` Joe Perches
2008-06-04  6:25                                 ` Ilpo Järvinen
2008-06-04  2:54                               ` Patrick McManus
2008-06-04  6:42                                 ` Ilpo Järvinen
2008-06-05 14:22                               ` Ingo Molnar
2008-06-05 18:00                                 ` Ilpo Järvinen
2008-06-05 21:13                                   ` Ilpo Järvinen
2008-06-05 23:29                                     ` Patrick McManus
2008-06-06 10:03                                       ` Ilpo Järvinen
2008-06-06 17:11                                         ` Patrick McManus
2008-06-06 17:33                                           ` Ingo Molnar
2008-06-06 18:19                                             ` Ilpo Järvinen
2008-06-06 18:39                                               ` Ingo Molnar
2008-06-06 19:49                                                 ` Ilpo Järvinen
2008-06-06 20:08                                                 ` Patrick McManus
2008-06-06 21:12                                                   ` Ilpo Järvinen
2008-06-06 21:23                                                     ` Arjan van de Ven
2008-06-06 21:28                                                       ` Ilpo Järvinen
2008-06-10 22:49                                                   ` David Miller
2008-06-06 18:25                                           ` Ilpo Järvinen
2008-06-10 22:32                               ` David Miller
2008-06-11 13:10                                 ` Patrick McManus
2008-06-11 15:13                                 ` Ilpo Järvinen
2008-06-04  7:23                         ` Ingo Molnar
2008-06-04 18:24                           ` David Miller
2008-06-04 20:56                             ` Ilpo Järvinen
2008-06-04 21:55                               ` David Miller
  -- strict thread matches above, loose matches on Subject: below --
2008-05-30 16:31 Ray Lee
2008-05-30 21:11 ` Ilpo Järvinen
2008-05-31  6:03   ` Evgeniy Polyakov
2008-05-31 10:05     ` Ilpo Järvinen
2008-06-02  6:19       ` Herbert Xu
2008-06-02 11:53         ` Ilpo Järvinen
2008-06-02 14:08           ` Herbert Xu
