From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756444AbYFAGFj (ORCPT ); Sun, 1 Jun 2008 02:05:39 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751703AbYFAGFb (ORCPT ); Sun, 1 Jun 2008 02:05:31 -0400 Received: from smtp21.orange.fr ([80.12.242.49]:61477 "EHLO smtp21.orange.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751276AbYFAGFa convert rfc822-to-8bit (ORCPT ); Sun, 1 Jun 2008 02:05:30 -0400 X-ME-UUID: 20080601060527277.439D81C0008F@mwinf2122.orange.fr Message-ID: <48423C0A.1090104@cosmosbay.com> Date: Sun, 01 Jun 2008 08:04:58 +0200 From: Eric Dumazet User-Agent: Thunderbird 1.5.0.14 (Windows/20071210) MIME-Version: 1.0 To: Patrick McManus Cc: Ingo Molnar , =?ISO-8859-1?Q?Ilpo_J=E4rvinen?= , Peter Zijlstra , LKML , Netdev , "David S. Miller" , "Rafael J. Wysocki" , Andrew Morton , Evgeniy Polyakov Subject: Re: [bug] stuck localhost TCP connections, v2.6.26-rc3+ References: <20080526115628.GA31316@elte.hu> <20080529084524.GA24892@elte.hu> <20080529112257.GA18130@elte.hu> <20080530181839.GA31915@elte.hu> <20080531060947.GA26441@elte.hu> <20080531125428.GA22111@elte.hu> <20080531163501.GB22607@elte.hu> <1212273974.28319.107.camel@tng> In-Reply-To: <1212273974.28319.107.camel@tng> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8BIT Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Patrick McManus a écrit : > On Sat, 2008-05-31 at 18:35 +0200, Ingo Molnar wrote: > >> * Ilpo Järvinen wrote: >> >> > > >>> ...setsockopt(listenfd, SOL_TCP, TCP_DEFER_ACCEPT, &val, sizeof(val)) >>> seems to be the magic trick that is interestion here. >>> >> seems to be used: >> >> 22003 write(3, "distccd[22003] (dcc_listen_by_ad"..., 62) = 62 >> 22003 listen(4, 10) = 0 >> 22003 setsockopt(4, SOL_TCP, TCP_DEFER_ACCEPT, [1], 4) = 0 >> >> i'll queue up your reverts for testing in -tip. >> > > > So the code you will revert came from my fingers. The circumstances here > make me nervous; while I'm at a loss to explain what might be going on > in particular, let me offer an apology in advance should the revert help > resolve the issue. > > Here's what makes me nervous: > > * not a lot of code uses DEFER_ACCEPT.. frankly it was pretty broken > before 26 - but not broken this way .. the correlation of your bug using > it is significant. > > * in 26, a server TCP socket (with DA) goes to ESTABLISHED when the 3rd > part of the handshake is received (as normal without DA), but the socket > isn't put on the accept queue until a real data packet arrives. (That's > the point of DA). In <= 25 this socket would have syn-recv until the > data packet arrived. > > - I did run tests where the server died in between the handshake being > completed and first data packet arriving - the client should see RST and > the server socket should disappear. But maybe something was missed? > > Do I understand this correctly, the server process is gone but the > socket is still in the table? And the client process is still there > waiting for the server to do something - having sent a bunch of data? > > Do we know if any data bytes (not handshake bytes) have been consumed by > the server side? If they were, that would seem to vindicate DA. > > Also pointing away from DA is that you started seeing this with rc3 - > that code was included in rc1.Is that a firm observation, or maybe there > weren't enough datapoints to conclude that rc1 and rc2 were clean? > > The most interesting patch is ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 > if anyone wants to eyeball it. > > > > I believe Ingo problems come on long lived sockets (were many bytes were exchanged between the peers), so I dont think DEFER_ACCEPT is the cullprit. I suggest to enable CONFIG_TIMER_STATS and to check timers, because /proc/net/tcp can display apparently large timer values when the timer is elapsed (jiffies > icsk->icsk_timeout) and jiffies_to_clock_t(timer_expires - jiffies) is then overflowing doing a multiply and a divide. On a 64bits server running linux-2.6.24-rc2, I can see *strange* timers values too in /proc/net/tcp, but not stuck TCP sessions. On 64 bits, these strange values have 1AD7F grep 1AD7F /proc/net/tcp | obfuscate_IP_and_ports 2017: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000001 0 0 0 2 ffff81067e7520c0 2019: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000003 0 0 0 2 ffff8106c580bcc0 2029: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000002 0 0 0 2 ffff81067313fe40 2032: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000003 0 0 0 2 ffff8106c716c340 2039: local_peer remote_peer 03 00000000:00000000 01:1AD7F29ABBA 00000002 0 0 0 2 ffff8107d45b3f40 2041: local_peer remote_peer 03 00000000:00000000 01:1AD7F29AB37 00000000 0 0 0 2 ffff810718e221c0 6610: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA 00000000 0 0 136594789 1 ffff8107183fb940 94 10 16 2 -1 9925: local_peer remote_peer 01 00000000:00000000 00:1AD7F29ABCA 00000000 0 0 144451161 1 ffff8107051a9840 351 10 0 2 -1 On TCP_SYN_RECV (03) sockets, timer can apparently be elapsed by many ticks, while on TCP_ESTABLISHED (01) one, I get jiffies_to_clock(-1) -> 1AD7F29ABCA value because the way get_tcp4_sock() is coded (jiffies can change while running this function). Note the 00: that means that no timer in my case. Running again the command one second later gives completely different results (other sockets are displayed) Maybe on 2.6.26-rc3+ we miss some timer correctness or we expose a latent NET bug. void sk_reset_timer(struct sock *sk, struct timer_list* timer, unsigned long expires) { if (!mod_timer(timer, expires)) sock_hold(sk); } Note that arming a timer also increase socket refcount and could explain why Ingo have sockets apparently not owned by a process but still referenced (by a timer or many ones (I see refcnt=5) on following snapshots) > sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt > uid timeout inode > 21: 111111AC:0016 480111AC:E4E9 01 00000B50:00000000 01:7D1F8746 00000000 > 0 0 398713 5 f71a8580 205 40 1 36 -1 > 23: 111111AC:0016 480111AC:D359 01 000010F8:00000000 01:7D19A035 00000000 > 0 0 396426 5 f71a8a80 202 42 1 144 -1 > 25: 111111AC:0016 480111AC:8565 01 00000B50:00000000 01:7CEBA7D1 00000000 > 0 0 349113 5 eeeaf580 204 40 1 26 -1 Just my initial thoughts, sorry I currently cannot spend much time to diagnose the problem.