All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>, netdev@vger.kernel.org
Subject: Re: TCP_DEFER_ACCEPT is missing counter update
Date: Tue, 13 Oct 2009 10:29:03 +0200	[thread overview]
Message-ID: <4AD43A4F.1090800@gmail.com> (raw)
In-Reply-To: <20091013073410.GA3792@1wt.eu>

Willy Tarreau a écrit :
> On Tue, Oct 13, 2009 at 09:23:59AM +0200, Eric Dumazet wrote:
>> Willy Tarreau a écrit :
>>> Hello,
>>>
>>> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
>>> client does not talk, the connection is never accepted and
>>> remains in SYN_RECV state until the retransmits expire, where
>>> it finally is deleted. This is bad when some firewall such as
>>> netfilter sits between the client and the server because the
>>> firewall sees the connection in ESTABLISHED state while the
>>> server will finally silently drop it without sending an RST.
>>>
>>> This behaviour contradicts the man page which says it should
>>> wait only for some time :
>>>
>>>        TCP_DEFER_ACCEPT (since Linux 2.4)
>>>           Allows a listener to be awakened only when data arrives
>>>           on the socket.  Takes an integer value  (seconds), this
>>>           can  bound  the  maximum  number  of attempts TCP will
>>>           make to complete the connection. This option should not
>>>           be used in code intended to be portable.
>>>
>>> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
>>> computed :
>>>
>>>         case TCP_DEFER_ACCEPT:
>>>                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
>>>                 if (val > 0) {
>>>                         /* Translate value in seconds to number of
>>>                          * retransmits */
>>>                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
>>>                                val > ((TCP_TIMEOUT_INIT / HZ) <<
>>>                                        icsk->icsk_accept_queue.rskq_defer_accept))
>>>                                 icsk->icsk_accept_queue.rskq_defer_accept++;
>>>                         icsk->icsk_accept_queue.rskq_defer_accept++;
>>>                 }
>>>                 break;
>>>
>>> ==> rskq_defer_accept is used as a counter of retransmits.
>>>
>>> But in tcp_minisocks.c, this counter is only checked. And in
>>> fact, I have found no location which updates it. So I think
>>> that what was intended was to decrease it in tcp_minisocks
>>> whenever it is checked, which the trivial patch below does :
>>>
>>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>>> index f8d67cc..1b443b0 100644
>>> --- a/net/ipv4/tcp_minisocks.c
>>> +++ b/net/ipv4/tcp_minisocks.c
>>> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>>>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>>>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
>>> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>>>  		inet_rsk(req)->acked = 1;
>>>  		return NULL;
>>>  	}
>>>  
>>
>> I dont understand why you want to decrement rskq_defer_accept here.
> 
> Because the "timeout" as set by setsockopt() is converted into number
> of retransmits.
> 
>> We receive a pure ACK (wihout DATA).
>> We should receive exactly one such ACK.
> 
> No, we will receive other ones because the socket remains in SYN_RECV
> and since the local system ignores this ACK, it will send a SYN-ACK
> again, triggering a new ACK from the client.
> 
> Although the concept looks strange at first, I think its implementation
> is in fact very smart because it manages to defer acceptation with an
> approximate timeout without using another timer.
> 
> The most common requirement should only be to wait for an HTTP request
> to come in, and setting the timeout to anything non-zero is enough to
> just drop the first empty ACK and immediately accept on the data
> segment, so this method fits this purpose perfectly.
> 

Indeed, thanks for this detailed explanation, I missed the SYN-ACK timer
and retransmits.

I played some years ago with TCP_DEFER_ACCEPT and got some unexpected results
on transmitted packets (server was consuming more bandwidth), and I know understand
it was very broken until today !

Thanks again Willy

  parent reply	other threads:[~2009-10-13  8:30 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-13  5:07 TCP_DEFER_ACCEPT is missing counter update Willy Tarreau
2009-10-13  7:11 ` David Miller
2009-10-13  7:19   ` Willy Tarreau
2009-10-13  7:27     ` David Miller
2009-10-13 21:27     ` Julian Anastasov
2009-10-14  4:52       ` Willy Tarreau
2009-10-14  7:27         ` Julian Anastasov
2009-10-14 20:17           ` Willy Tarreau
2009-10-14 21:12             ` Olaf van der Spek
2009-10-14 22:43             ` David Miller
2009-10-15  6:08               ` Willy Tarreau
2009-10-15  8:47                 ` Julian Anastasov
2009-10-15 12:41                   ` Willy Tarreau
2009-10-15 22:44                     ` Julian Anastasov
2009-10-16  3:51                       ` Eric Dumazet
2009-10-16  5:00                         ` Eric Dumazet
2009-10-16  5:29                           ` Willy Tarreau
2009-10-16  6:05                             ` Eric Dumazet
2009-10-16  6:18                               ` Willy Tarreau
2009-10-16  7:08                                 ` Eric Dumazet
2009-10-16  7:19                                   ` Willy Tarreau
2009-10-16  5:03                       ` Willy Tarreau
2009-10-16  8:49                         ` Julian Anastasov
2009-10-16 10:40                           ` Eric Dumazet
2009-10-16 19:27                             ` Willy Tarreau
2009-10-17 11:48                             ` Julian Anastasov
2009-10-17 12:07                               ` Eric Dumazet
2009-10-17 14:20                                 ` Julian Anastasov
2009-10-19 20:01                                   ` Eric Dumazet
2009-10-19 20:11                                     ` Willy Tarreau
2009-10-19 20:17                                       ` Eric Dumazet
2009-10-20  2:23                                     ` David Miller
2009-10-15  7:59               ` Julian Anastasov
2009-10-16 10:08           ` Ilpo Järvinen
2009-10-13  7:23 ` Eric Dumazet
2009-10-13  7:34   ` Willy Tarreau
2009-10-13  8:08     ` Olaf van der Spek
2009-10-13  8:29     ` Eric Dumazet [this message]
2009-10-13  8:35       ` David Miller
2009-10-13  7:35   ` David Miller
2009-10-13  8:12     ` Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4AD43A4F.1090800@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=w@1wt.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.