Re: [RFC] EADDRINUSE from bind() on application restart after killing

Netdev List

From: Paul Gofman <pgofman@codeweavers.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Muhammad Usama Anjum <usama.anjum@collabora.com>,
	"open list:NETWORKING [TCP]" <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
	David Ahern <dsahern@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [RFC] EADDRINUSE from bind() on application restart after killing
Date: Fri, 14 Oct 2022 11:39:29 -0500
Message-ID: <342a762d-22f5-b979-411f-aab0474feda2@codeweavers.com>
In-Reply-To: <CANn89iKD=ceuLnhK-zpk3QerpS-FUb_wb_HevkpvsVqGJ_T4NQ@mail.gmail.com>

Sorry if I was unclear. To reformulate my question: is blocking the
listening port (not the accept one) in this way an IETF requirement? I am
asking because I could not find where in the IETF documents such a
requirement comes from. Sorry if I am missing the obvious.
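
To make the scenario concrete, here is a minimal Python sketch (not code
from the application in question) that reproduces it on loopback: the
server side closes the accepted connection first, the listener is closed,
and a fresh socket then tries to bind() the same port. The helper name
restart_after_time_wait is hypothetical, and setting SO_REUSEADDR on both
the old and the new socket is an assumption of the sketch; on Linux the
option appears to help only when the lingering socket also had it set,
which is what happens when the same binary is restarted.

```python
import errno
import socket


def restart_after_time_wait(reuse_addr: bool) -> int:
    """Return 0 if re-binding the listen port succeeds, else the errno."""

    def make_socket() -> socket.socket:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        if reuse_addr:
            # Applied to every "run", as a restarted binary would do.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        return sock

    # First run: listen on an ephemeral loopback port.
    listener = make_socket()
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = socket.create_connection(("127.0.0.1", port), timeout=5)
    accepted, _ = listener.accept()

    # The server side closes first (active close), so the accepted socket,
    # whose local port equals the listen port, lingers after the close.
    accepted.close()
    client.recv(1)  # returns b"" once the server's FIN has arrived
    client.close()
    listener.close()

    # Second run: try to bind the same port again straight away.
    restarted = make_socket()
    try:
        restarted.bind(("127.0.0.1", port))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        restarted.close()


if __name__ == "__main__":
    for flag in (False, True):
        result = restart_after_time_wait(flag)
        name = errno.errorcode.get(result, "OK") if result else "OK"
        print(f"SO_REUSEADDR={flag}: bind() -> {name}")
```

On a typical Linux host with default sysctls, the call without
SO_REUSEADDR is expected to fail with EADDRINUSE and the call with it to
succeed; other stacks or sysctl settings may behave differently.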

On 10/14/22 11:34, Eric Dumazet wrote:
>>       My question is whether the behaviour of blocking the listen socket's
>> port while the accepted port (which, as I understand, does not have any
>> direct relation to the listen port anymore from the TCP standpoint) is still
>> in TIME_WAIT or another wait state is stipulated by TCP requirements which
>> I am missing? Or, if not, maybe that can be changed?
>>
> Please raise these questions at IETF, this is where major TCP changes
> need to be approved.
>
> There are multiple ways to avoid TIME_WAIT, if you really need to.
>
>
>> Thanks,
>>       Paul.
>>
>>
>> On 10/14/22 11:20, Eric Dumazet wrote:
>>> On Fri, Oct 14, 2022 at 8:52 AM Paul Gofman <pgofman@codeweavers.com> wrote:
>>>> Hello Eric,
>>>>
>>>> our problem is actually not with the accept socket / port for which
>>>> those timeouts apply, we don't care for that temporary port number. The
>>>> problem is that the listen port (to which apps bind explicitly) is also
>>>> busy until the accept socket waits through all the necessary timeouts
>>>> and is fully closed. From my reading of TCP specs I don't understand why
>>>> it should be this way. The TCP hazards stipulating those timeouts seem
>>>> to apply to accept (connection) socket / port only. Shouldn't listen
>>>> socket's port (the only one we care about) be available for bind
>>>> immediately after the app stops listening on it (either due to closing
>>>> the listen socket or process force kill), or maybe have some other
>>>> timeouts not related to connected accept socket / port hazards? Or am I
>>>> missing something why it should be the way it is done now?
>>>>
>>> To quote your initial message :
>>>
>>> <quote>
>>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
>>> socket in a hack. But this hack cannot be added to the application
>>> process as we don't own it.
>>> </quote>
>>>
>>> Essentially you are complaining of the linux kernel being unable to
>>> run a buggy application.
>>>
>>> We are not going to change the linux kernel because you can not
>>> fix/recompile an application.
>>>
>>> Note that you could use LD_PRELOAD, or maybe eBPF, to automatically
>>> turn on SO_REUSEADDR before bind().
>>>
>>>
>>>> Thanks,
>>>>        Paul.
>>>>
>>>>
>>>> On 9/30/22 10:16, Eric Dumazet wrote:
>>>>> On Fri, Sep 30, 2022 at 6:24 AM Muhammad Usama Anjum
>>>>> <usama.anjum@collabora.com> wrote:
>>>>>> Hi Eric,
>>>>>>
>>>>>> RFC 1337 describes the TIME-WAIT Assassination Hazards in TCP. Because
>>>>>> of this hazard we have 60 seconds timeout in TIME_WAIT state if
>>>>>> connection isn't closed properly. From RFC 1337:
>>>>>>> The TIME-WAIT delay allows all old duplicate segments time
>>>>>> enough to die in the Internet before the connection is reopened.
>>>>>>
>>>>>> As on localhost there is virtually no delay. I think the TIME-WAIT delay
>>>>>> must be zero for localhost connections. I'm no expert here. On localhost
>>>>>> there is no delay. So why should we wait for 60 seconds to mitigate a
>>>>>> hazard which isn't there?
>>>>> Because we do not specialize TCP stack for loopback.
>>>>>
>>>>> It is easy to force delays even for loopback (tc qdisc add dev lo root
>>>>> netem ...)
>>>>>
>>>>> You can avoid TCP complexity (cpu costs) over loopback using AF_UNIX instead.
>>>>>
>>>>> TIME_WAIT sockets are optional.
>>>>> If you do not like them, simply set /proc/sys/net/ipv4/tcp_max_tw_buckets to 0 ?
>>>>>
>>>>>> Zapping the sockets in TIME_WAIT and FIN_WAIT_2 does remove them. But
>>>>>> the zap has to come from a privileged (CAP_NET_ADMIN) process. We are
>>>>>> having a hard time finding a privileged process to do this.
>>>>> Really, we are not going to add kludges in TCP stacks because of this reason.
>>>>>
>>>>>> Thanks,
>>>>>> Usama
>>>>>>
>>>>>>
>>>>>> On 5/24/22 1:18 PM, Muhammad Usama Anjum wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> We have a set of processes which talk with each other through a local
>>>>>>> TCP socket. If the process(es) are killed (through SIGKILL) and
>>>>>>> restarted at once, the bind() fails with EADDRINUSE error. This error
>>>>>>> only appears if application is restarted at once without waiting for 60
>>>>>>> seconds or more. It seems that there is some timeout of 60 seconds for
>>>>>>> which the previous TCP connection remains alive waiting to get closed
>>>>>>> completely. If we try to bind again within that period, we get the error.
>>>>>>>
>>>>>>> We are able to avoid this error by adding SO_REUSEADDR attribute to the
>>>>>>> socket in a hack. But this hack cannot be added to the application
>>>>>>> process as we don't own it.
>>>>>>>
>>>>>>> I've looked at the TCP connection states after killing processes in
>>>>>>> different ways. The TCP connection ends up in 2 different states with
>>>>>>> timeouts:
>>>>>>>
>>>>>>> (1) Timeout associated with FIN_WAIT_2 state which is set through
>>>>>>> `tcp_fin_timeout` in procfs (60 seconds by default)
>>>>>>>
>>>>>>> (2) Timeout associated with TIME_WAIT state which cannot be changed. It
>>>>>>> seems like this timeout has come from RFC 1337.
>>>>>>>
>>>>>>> The timeout in (1) can be changed. Timeout in (2) cannot be changed. It
>>>>>>> also doesn't seem feasible to change the timeout of TIME_WAIT state as
>>>>>>> the RFC mentions several hazards. But we are talking about a local TCP
>>>>>>> connection where maybe those hazards aren't applicable directly? Is it
>>>>>>> possible to change timeout for TIME_WAIT state for only local
>>>>>>> connections without any hazards?
>>>>>>>
>>>>>>> We have tested a hack where we replace timeout of TIME_WAIT state from a
>>>>>>> value in procfs for local connections. This solves our problem and
>>>>>>> application starts to work without any modifications to it.
>>>>>>>
>>>>>>> The question is: what would be the best possible solution here? Any
>>>>>>> thoughts will be very helpful.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>> --
>>>>>> Muhammad Usama Anjum
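
The AF_UNIX suggestion quoted above, used instead of loopback TCP, can be
sketched as follows. This is an illustration only: unix_restart_cycle and
the demo.sock path are hypothetical names, and the sketch assumes the peers
can be changed to use a filesystem socket path, which may not hold for an
application that cannot be modified. AF_UNIX stream sockets have no
TIME_WAIT state, but a stale socket file left behind by a killed process
has to be unlinked before the path can be bound again.

```python
import os
import socket
import tempfile


def unix_restart_cycle() -> bytes:
    """Run two back-to-back server 'runs' on one AF_UNIX path."""
    reply = b""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket path
        for _ in range(2):
            # A previous run (or a SIGKILLed process) leaves the socket
            # file behind; bind() on an existing path fails, so remove it.
            if os.path.exists(path):
                os.unlink(path)

            server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            server.bind(path)
            server.listen(1)

            client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            client.connect(path)
            conn, _ = server.accept()

            conn.sendall(b"ok")
            conn.close()
            reply = client.recv(2)

            client.close()
            server.close()  # the socket file stays until it is unlinked
    return reply


if __name__ == "__main__":
    print(unix_restart_cycle())
```

The second iteration binds the same path immediately after the first one
closed, with no waiting period in between.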

Thread overview: 15+ messages
2022-05-24  8:18 [RFC] EADDRINUSE from bind() on application restart after killing Muhammad Usama Anjum
2022-05-24 22:13 ` Eric Dumazet
2022-05-30 13:15   ` Muhammad Usama Anjum
2022-05-30 15:28     ` Eric Dumazet
2022-06-27 10:20       ` Muhammad Usama Anjum
2022-06-27 11:47         ` Eric Dumazet
2022-09-30 13:24 ` Muhammad Usama Anjum
2022-09-30 15:16   ` Eric Dumazet
2022-10-14 15:52     ` Paul Gofman
2022-10-14 16:20       ` Eric Dumazet
2022-10-14 16:31         ` Paul Gofman
2022-10-14 16:34           ` Eric Dumazet
2022-10-14 16:39             ` Paul Gofman [this message]
2022-10-14 16:45               ` Eric Dumazet
2022-10-14 17:20                 ` Paul Gofman
