netdev.vger.kernel.org archive mirror
From: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
To: Ingo Molnar <mingo@elte.hu>
Cc: Patrick McManus <mcmanus@ducksong.com>,
	David Miller <davem@davemloft.net>,
	peterz@infradead.org, LKML <linux-kernel@vger.kernel.org>,
	Netdev <netdev@vger.kernel.org>,
	rjw@sisk.pl, Andrew Morton <akpm@linux-foundation.org>,
	johnpol@2ka.mipt.ru
Subject: Re: [fixed] [patch] Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+
Date: Fri, 6 Jun 2008 21:19:09 +0300 (EEST)
Message-ID: <Pine.LNX.4.64.0806062037240.9424@wrl-59.cs.helsinki.fi>
In-Reply-To: <20080606173339.GA30894@elte.hu>

...I kind of fail to follow, in this mail in general, which patches have
been tested, and where and when... But I understand that it's just due to
the number of tests & hosts & kernels & what-else you use and know by heart
(and which we don't know all that well :-)). But I'll still try to sort it
out below...

On Fri, 6 Jun 2008, Ingo Molnar wrote:

> 
> * Patrick McManus <mcmanus@ducksong.com> wrote:
> 
> > When I apply the locking patch you (Ilpo) wrote, I cannot reproduce 
> > the error at all in the first 90 minutes of testing. I'll let the test 
> > run and update the list.
> > 
> > I'm holding out hope that Ingo's report did not have the locking patch 
> > on the distcc server end - because it certainly makes a difference for 
> > me.
> 
> Hm, the distcc server had the full 3-patch-revert from Ilpo, was that 
> supposed to fix the problem too, indirectly?

Yes, the problematic portion that runs outside of the locking shouldn't be
there without those DA (DEFER_ACCEPT) changes.

> The box is running that 3-patch revert right now as well:
> 
>  phoenix:~> uptime
>   19:20:28 up  9:58,  2 users,  load average: 7.75, 13.88, 30.95
>  phoenix:~> uname -a
>  Linux phoenix 2.6.26-rc4 #2352 SMP Fri Jun 6 09:18:07 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> ... and i never saw a single hang today in the 10 hours of uptime this 
> box has. (and it built a good 500 kernels today) Nor any hang yesterday, 
> and that was a good 500 kernels too.
> 
> You can see it that the box built more than two thousand kernels in the 
> past few days alone, so it's a rather busy little bee. The other 
> testboxes built even more kernels - a quad box built and booted 2500 
> kernels:
> 
>  #define UTS_VERSION "#2524 SMP PREEMPT Fri Jun 6 19:22:21 CEST 2008"
> 
> and i never saw a hang on that box either. 
> 
> a third box has:
> 
>  titan:~> uname -a
>  Linux titan 2.6.26-rc5-00002-g737697d-dirty #2557 SMP PREEMPT Fri Jun 6 
>  19:24:00 CEST 2008 x86_64 x86_64 x86_64 GNU/Linux
> 
> (this is the one that showed the hang for the first time)
> 
> The total count of kernel bootups i did this week for -tip QA was 
> somewhere between five and ten thousand random build+bootups - and the 
> only time i got a hang was when i removed the 3-patch-revert 
> intentionally, on one of the boxes.

...and you added the locking fix there instead? Or was this a removal?

> Maybe that 3-patch-revert just makes this locking bug a bit less likely 
> to trigger, by accident?

No, part of the DEFER_ACCEPT stuff was postponed in the 2.6.25..2.6.26-rc1
timeframe (ec3c0982a2dd1e671bad8e9d26c28dcba0039d87), so that one portion
of it ended up being added outside of the socket lock of the listening
socket, while still touching its data structures. Without
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 the deferred-accept-related
things happen earlier, i.e., while we are still under the lock of the
listening socket. So that particular locking bug was _introduced_ by that
ec3c change, not merely made more likely or so.
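
To make the shape of that bug concrete, here is a toy sketch. It is not
kernel code: Listener, queue_child and the two handshake_* functions are
made-up names for illustration only, and the lock check is just a rough
stand-in for "the listening socket's lock must be held".

```python
import threading


class Listener:
    """Toy stand-in for a listening socket: a lock plus an accept queue."""

    def __init__(self):
        self.lock = threading.Lock()
        self.accept_queue = []

    def queue_child(self, child):
        # Hypothetical invariant check: the accept queue may only be touched
        # while the listener lock is held. Lock.locked() only tells us that
        # *someone* holds the lock, which is enough for this single-threaded
        # illustration.
        if not self.lock.locked():
            raise RuntimeError("accept queue touched without listener lock")
        self.accept_queue.append(child)


def handshake_under_lock(listener, child):
    # Shape before the change: the deferred-accept work happens early,
    # while the listener lock is still held.
    with listener.lock:
        listener.queue_child(child)


def handshake_postponed(listener, child):
    # Shape after the change: the work is postponed to a point where the
    # listener lock has already been dropped, yet it still touches the
    # listener's data structures.
    with listener.lock:
        pass  # earlier handshake processing would happen here
    listener.queue_child(child)  # outside the lock: the buggy access
```

Calling handshake_under_lock() queues the child normally, while
handshake_postponed() trips the check. In a real kernel there is no such
check by default, so the unlocked access just corrupts state now and then,
which would fit a hang that only shows up under heavy load.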

...Of course software is known to have bugs, so we might always be 
(un?)lucky and hit another one and confuse... :-)

> Our tip test-setup is specialized to find 
> arch/x86 and scheduler bugs, not primarily to find networking bugs. (but 
> at this test volume, and given that it makes use of distcc, it will 
> trigger them too.)

It actually seems to work quite well for these kinds of networking-related
bugs too, which hardly depend on the network at all :-).

> i have a rather accurate timeline of when the hang first occurred, do we 
> know the timeline of the introduction of the locking bug by any chance? 
> Which commit introduced it? (Ilpo's commit log does not say it)

Ah, sorry, I forgot to add that one there; it was sent quite late in the
night and I just couldn't get to sleep until sending the fix... :-) It was
one of the reverted ones that did it:
ec3c0982a2dd1e671bad8e9d26c28dcba0039d87.

> Your test results are compelling nevertheless so i'll do a retest in any 
> case, with all boxes either running an older kernel or a kernel with the 
> locking fix.

If you want an older kernel, you would have to go basically to 2.6.25 or 
so.

To summarize: both the 3changes+1fix revert (you refer to it only as the
3-patch revert) _and_ the locking fix I made should fix the problem
(obviously they exclude each other). ...And the end which is significant is
the one which has the LISTENing sockets (please keep this in mind if you
still get the hang and provide some info).

-- 
 i.
