From: Muhammad Usama Anjum
Date: Tue, 24 May 2022 13:18:55 +0500
Subject: [RFC] EADDRINUSE from bind() on application restart after killing
To: Eric Dumazet, "David S. Miller", Hideaki YOSHIFUJI, David Ahern, Jakub Kicinski, Paolo Abeni, "open list:NETWORKING [TCP]"
Cc: usama.anjum@collabora.com, Gabriel Krisman Bertazi, LKML, open list
Message-ID: <5099dc39-c6d9-115a-855b-6aa98d17eb4b@collabora.com>
X-Mailing-List: netdev@vger.kernel.org

Hello,

We have a set of processes that talk to each other over a local TCP socket. If the processes are killed (with SIGKILL) and restarted immediately, bind() fails with EADDRINUSE. The error appears only when the application is restarted without first waiting 60 seconds or more. There seems to be a timeout of about 60 seconds during which the previous TCP connection stays around, waiting to be closed completely. If we try to bind again within that window, we get the error.

We can avoid the error by setting the SO_REUSEADDR option on the socket as a hack. But this hack cannot be added to the application process, as we don't own it.

I've looked at the TCP connection states after killing the processes in different ways. The connection ends up in one of two states, each with a timeout:

(1) FIN_WAIT_2, whose timeout is set through `tcp_fin_timeout` in procfs (60 seconds by default). This timeout can be changed.
(2) TIME_WAIT, whose timeout cannot be changed. This timeout seems to come from RFC 1337.

Changing the TIME_WAIT timeout in general doesn't seem feasible either, as the RFC mentions several hazards. But we are talking about a local TCP connection, where those hazards may not apply directly. Is it possible to change the TIME_WAIT timeout for local connections only, without running into those hazards?

We have tested a hack in which, for local connections, the TIME_WAIT timeout is taken from a value in procfs. This solves our problem, and the application works without any modification.

What would be the best possible solution here? Any thoughts would be very helpful.
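
The SO_REUSEADDR workaround described above can be sketched as follows. This is only a minimal Python illustration, not the application's code: the helper name make_listener, the loopback address and the backlog of 8 are assumptions made for the example.

```python
import socket


def make_listener(port: int, reuse_addr: bool) -> socket.socket:
    """Create a loopback TCP listener, optionally with SO_REUSEADDR set.

    Hypothetical helper for illustration only.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse_addr:
            # Must be set before bind(). This is the option that, per the
            # message above, avoids EADDRINUSE on an immediate restart.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))  # loopback address is an assumption
        s.listen(8)                  # backlog value is arbitrary
    except OSError:
        # Do not leak the descriptor if bind() or listen() fails.
        s.close()
        raise
    return s
```

Calling make_listener(port, True) corresponds to the restart that succeeds with the hack; make_listener(port, False) corresponds to the unmodified application, whose bind() is the call that fails while the old connection is still timing out.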

Regards,

--
Muhammad Usama Anjum