TCP_DEFER_ACCEPT is missing counter update

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* TCP_DEFER_ACCEPT is missing counter update
@ 2009-10-13  5:07 Willy Tarreau
  2009-10-13  7:11 ` David Miller
  2009-10-13  7:23 ` Eric Dumazet
  0 siblings, 2 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-13  5:07 UTC (permalink / raw)
  To: netdev

Hello,

I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.

This behaviour contradicts the man page which says it should
wait only for some time :

       TCP_DEFER_ACCEPT (since Linux 2.4)
          Allows a listener to be awakened only when data arrives
          on the socket.  Takes an integer value  (seconds), this
          can  bound  the  maximum  number  of attempts TCP will
          make to complete the connection. This option should not
          be used in code intended to be portable.

Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed :

        case TCP_DEFER_ACCEPT:
                icsk->icsk_accept_queue.rskq_defer_accept = 0;
                if (val > 0) {
                        /* Translate value in seconds to number of
                         * retransmits */
                        while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                               val > ((TCP_TIMEOUT_INIT / HZ) <<
                                       icsk->icsk_accept_queue.rskq_defer_accept))
                                icsk->icsk_accept_queue.rskq_defer_accept++;
                        icsk->icsk_accept_queue.rskq_defer_accept++;
                }
                break;

==> rskq_defer_accept is used as a counter of retransmits.

But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does :

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f8d67cc..1b443b0 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
 	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
 		inet_rsk(req)->acked = 1;
 		return NULL;
 	}

Is there anything I am missing ?

Regards,
Willy

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  5:07 TCP_DEFER_ACCEPT is missing counter update Willy Tarreau
@ 2009-10-13  7:11 ` David Miller
  2009-10-13  7:19   ` Willy Tarreau
  2009-10-13  7:23 ` Eric Dumazet
  1 sibling, 1 reply; 41+ messages in thread
From: David Miller @ 2009-10-13  7:11 UTC (permalink / raw)
  To: w; +Cc: netdev

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 07:07:06 +0200

> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does :
> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..1b443b0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>  		inet_rsk(req)->acked = 1;
>  		return NULL;
>  	}
>  
> Is there anything I am missing ?

Your logic looks sound and I can't come to any other conclusion
after going over this code, even going back to 2.4.x

Feel free to make a formal patch submission, thanks Willy.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:11 ` David Miller
@ 2009-10-13  7:19   ` Willy Tarreau
  2009-10-13  7:27     ` David Miller
  2009-10-13 21:27     ` Julian Anastasov
  0 siblings, 2 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-13  7:19 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Oct 13, 2009 at 12:11:06AM -0700, David Miller wrote:
> Your logic looks sound and I can't come to any other conclusion
> after going over this code, even going back to 2.4.x
> 
> Feel free to make a formal patch submission, thanks Willy.

Ok, here it comes (I used my explanation as the commit message).

Thanks David,
Willy

>From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 07:26:54 +0200
Subject: tcp: fix tcp_defer_accept to consider the timeout

I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.

This behaviour contradicts the man page which says it should
wait only for some time :

       TCP_DEFER_ACCEPT (since Linux 2.4)
          Allows a listener to be awakened only when data arrives
          on the socket.  Takes an integer value  (seconds), this
          can  bound  the  maximum  number  of attempts TCP will
          make to complete the connection. This option should not
          be used in code intended to be portable.

Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed :

        case TCP_DEFER_ACCEPT:
                icsk->icsk_accept_queue.rskq_defer_accept = 0;
                if (val > 0) {
                        /* Translate value in seconds to number of
                         * retransmits */
                        while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                               val > ((TCP_TIMEOUT_INIT / HZ) <<
                                       icsk->icsk_accept_queue.rskq_defer_accept))
                                icsk->icsk_accept_queue.rskq_defer_accept++;
                        icsk->icsk_accept_queue.rskq_defer_accept++;
                }
                break;

==> rskq_defer_accept is used as a counter of retransmits.

But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does.

Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_minisocks.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index f8d67cc..2f676f3 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -644,6 +644,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
 	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
 	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
+		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
 		inet_rsk(req)->acked = 1;
 		return NULL;
 	}
-- 
1.5.3.3


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:19   ` Willy Tarreau
@ 2009-10-13  7:27     ` David Miller
  2009-10-13 21:27     ` Julian Anastasov
  1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2009-10-13  7:27 UTC (permalink / raw)
  To: w; +Cc: netdev

From: Willy Tarreau <w@1wt.eu>
Date: Tue, 13 Oct 2009 09:19:55 +0200

> On Tue, Oct 13, 2009 at 12:11:06AM -0700, David Miller wrote:
>> Your logic looks sound and I can't come to any other conclusion
>> after going over this code, even going back to 2.4.x
>> 
>> Feel free to make a formal patch submission, thanks Willy.
> 
> Ok, here it comes (I used my explanation as the commit message).

Applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:19   ` Willy Tarreau
  2009-10-13  7:27     ` David Miller
@ 2009-10-13 21:27     ` Julian Anastasov
  2009-10-14  4:52       ` Willy Tarreau
  1 sibling, 1 reply; 41+ messages in thread
From: Julian Anastasov @ 2009-10-13 21:27 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev


	Hello,

On Tue, 13 Oct 2009, Willy Tarreau wrote:

> >From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
> From: Willy Tarreau <w@1wt.eu>
> Date: Tue, 13 Oct 2009 07:26:54 +0200
> Subject: tcp: fix tcp_defer_accept to consider the timeout
> 
> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> client does not talk, the connection is never accepted and
> remains in SYN_RECV state until the retransmits expire, where
> it finally is deleted. This is bad when some firewall such as

	I think, this is by design, there is big comment in
tcp_check_req().

> netfilter sits between the client and the server because the
> firewall sees the connection in ESTABLISHED state while the
> server will finally silently drop it without sending an RST.

	Client can stay ESTABLISHED for long time but
RST will be sent when client sends DATA or FIN.

> This behaviour contradicts the man page which says it should
> wait only for some time :
> 
>        TCP_DEFER_ACCEPT (since Linux 2.4)
>           Allows a listener to be awakened only when data arrives
>           on the socket.  Takes an integer value  (seconds), this
>           can  bound  the  maximum  number  of attempts TCP will
>           make to complete the connection. This option should not
>           be used in code intended to be portable.

	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.

> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> computed :

	rskq_defer_accept is threshold, not counter

>         case TCP_DEFER_ACCEPT:
>                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
>                 if (val > 0) {
>                         /* Translate value in seconds to number of
>                          * retransmits */
>                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
>                                val > ((TCP_TIMEOUT_INIT / HZ) <<
>                                        icsk->icsk_accept_queue.rskq_defer_accept))
>                                 icsk->icsk_accept_queue.rskq_defer_accept++;
>                         icsk->icsk_accept_queue.rskq_defer_accept++;
>                 }
>                 break;
> 
> ==> rskq_defer_accept is used as a counter of retransmits.

	as limit for retransmits, not as counter

> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does.

	You can check net/ipv4/inet_connection_sock.c,
inet_csk_reqsk_queue_prune() where TCP_DEFER_ACCEPT can extend
the retransmission threshold for acked sockets above the
applied 'thresh'. So, there are 2 options:

a) TCP_DEFER_ACCEPT is used as flag (eg. 1) or the period is below
the TCP_SYNCNT period. In this case TCP_DEFER_ACCEPT does not
extend the period for DATA (DATA must come before TCP_SYNCNT).
Application is notified only when DATA comes.

or

b) TCP_DEFER_ACCEPT is set with seconds above the TCP_SYNCNT
retrans limit and the first ACK extends the period up to
TCP_DEFER_ACCEPT seconds (converted as retrans). By this
way we provide more time for DATA after the empty ACKs.
ACK again can come before TCP_SYNCNT but DATA after ACK
can come even after TCP_SYNCNT but before TCP_DEFER_ACCEPT
timeout. Again, application is notified only when DATA comes.

> Signed-off-by: Willy Tarreau <w@1wt.eu>
> ---
>  net/ipv4/tcp_minisocks.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..2f676f3 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -644,6 +644,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>  	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {

	This is wrong considering inet_csk_reqsk_queue_prune()
and patch should not be applied except if I'm missing something:

> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>  		inet_rsk(req)->acked = 1;
>  		return NULL;
>  	}

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13 21:27     ` Julian Anastasov
@ 2009-10-14  4:52       ` Willy Tarreau
  2009-10-14  7:27         ` Julian Anastasov
  0 siblings, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-14  4:52 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, Eric Dumazet

Hello Julian,

On Wed, Oct 14, 2009 at 12:27:41AM +0300, Julian Anastasov wrote:
> 
> 	Hello,
> 
> On Tue, 13 Oct 2009, Willy Tarreau wrote:
> 
> > >From da80c99a503bab1256706ed8d967e2ab3f71afe0 Mon Sep 17 00:00:00 2001
> > From: Willy Tarreau <w@1wt.eu>
> > Date: Tue, 13 Oct 2009 07:26:54 +0200
> > Subject: tcp: fix tcp_defer_accept to consider the timeout
> > 
> > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > client does not talk, the connection is never accepted and
> > remains in SYN_RECV state until the retransmits expire, where
> > it finally is deleted. This is bad when some firewall such as
> 
> 	I think, this is by design, there is big comment in
> tcp_check_req().

I'm not sure. That would considerably reduce the usefulness of
the feature. The comment I see there is just a one line explaining
why we drop the ACK. It does not indicate any strategy on what to
do when the counter expires.

> > netfilter sits between the client and the server because the
> > firewall sees the connection in ESTABLISHED state while the
> > server will finally silently drop it without sending an RST.
> 
> 	Client can stay ESTABLISHED for long time but
> RST will be sent when client sends DATA or FIN.

Yes you're right. In fact, this only weakens firewalls in case of
pure scans, but attacks on SYN cookies do that too, as well as
TTL-based attacks.

> > This behaviour contradicts the man page which says it should
> > wait only for some time :
> > 
> >        TCP_DEFER_ACCEPT (since Linux 2.4)
> >           Allows a listener to be awakened only when data arrives
> >           on the socket.  Takes an integer value  (seconds), this
> >           can  bound  the  maximum  number  of attempts TCP will
> >           make to complete the connection. This option should not
> >           be used in code intended to be portable.
> 
> 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.

That's what I observe too, but the connection is silently dropped
afterwards and I'm clearly not sure this was the intended behaviour.

> > Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> > computed :
> 
> 	rskq_defer_accept is threshold, not counter
> 
> >         case TCP_DEFER_ACCEPT:
> >                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
> >                 if (val > 0) {
> >                         /* Translate value in seconds to number of
> >                          * retransmits */
> >                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
> >                                val > ((TCP_TIMEOUT_INIT / HZ) <<
> >                                        icsk->icsk_accept_queue.rskq_defer_accept))
> >                                 icsk->icsk_accept_queue.rskq_defer_accept++;
> >                         icsk->icsk_accept_queue.rskq_defer_accept++;
> >                 }
> >                 break;
> > 
> > ==> rskq_defer_accept is used as a counter of retransmits.
> 
> 	as limit for retransmits, not as counter

yes if you want, that's what I mean.

> > But in tcp_minisocks.c, this counter is only checked. And in
> > fact, I have found no location which updates it. So I think
> > that what was intended was to decrease it in tcp_minisocks
> > whenever it is checked, which the trivial patch below does.
> 
> 	You can check net/ipv4/inet_connection_sock.c,
> inet_csk_reqsk_queue_prune() where TCP_DEFER_ACCEPT can extend
> the retransmission threshold for acked sockets above the
> applied 'thresh'.

So clearly this is in order to improve chances that the application
will receive the connection, no ?

> So, there are 2 options:
> 
> a) TCP_DEFER_ACCEPT is used as flag (eg. 1) or the period is below
> the TCP_SYNCNT period. In this case TCP_DEFER_ACCEPT does not
> extend the period for DATA (DATA must come before TCP_SYNCNT).
> Application is notified only when DATA comes.
> 
> or
> 
> b) TCP_DEFER_ACCEPT is set with seconds above the TCP_SYNCNT
> retrans limit and the first ACK extends the period up to
> TCP_DEFER_ACCEPT seconds (converted as retrans). By this
> way we provide more time for DATA after the empty ACKs.
> ACK again can come before TCP_SYNCNT but DATA after ACK
> can come even after TCP_SYNCNT but before TCP_DEFER_ACCEPT
> timeout. Again, application is notified only when DATA comes.

Yes this is what happens right now, but reading the man again
does not imply to me that the connection will not be accepted
once we reach the retransmit limit.

Maybe we have different usages and different interpretations of
the man can satisfy either, but I don't see what this would be
useful to in case we silently drop instead of finally accepting.

Regards,
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14  4:52       ` Willy Tarreau
@ 2009-10-14  7:27         ` Julian Anastasov
  2009-10-14 20:17           ` Willy Tarreau
  2009-10-16 10:08           ` Ilpo Järvinen
  0 siblings, 2 replies; 41+ messages in thread
From: Julian Anastasov @ 2009-10-14  7:27 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev, Eric Dumazet


	Hello,

On Wed, 14 Oct 2009, Willy Tarreau wrote:

> > > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > > client does not talk, the connection is never accepted and
> > > remains in SYN_RECV state until the retransmits expire, where
> > > it finally is deleted. This is bad when some firewall such as
> > 
> > 	I think, this is by design, there is big comment in
> > tcp_check_req().
> 
> I'm not sure. That would considerably reduce the usefulness of
> the feature. The comment I see there is just a one line explaining
> why we drop the ACK. It does not indicate any strategy on what to
> do when the counter expires.

	There were attempts to switch the server socket to
established state, no SYN-ACK retransmissions after client ACK
and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
but due to some locking problems it was reverted:

http://marc.info/?t=121318919000001&r=1&w=2

	Current handling is as before: drop client ACKs before DATA,
stay in SYN_RECV state, send RST on outdated ACKs.

> > > netfilter sits between the client and the server because the
> > > firewall sees the connection in ESTABLISHED state while the
> > > server will finally silently drop it without sending an RST.
> > 
> > 	Client can stay ESTABLISHED for long time but
> > RST will be sent when client sends DATA or FIN.
> 
> Yes you're right. In fact, this only weakens firewalls in case of
> pure scans, but attacks on SYN cookies do that too, as well as
> TTL-based attacks.

	Hm, yes, may be firewalls should be better here.
They should know which server ports send welcome and which
do no not.

> > 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> > and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.
> 
> That's what I observe too, but the connection is silently dropped
> afterwards and I'm clearly not sure this was the intended behaviour.

	Yes, bad only for firewalls. Note that if
TCP_SYNCNT/TCP_DEFER_ACCEPT is shorter the client still gets
RST on next ACK. So, the problem is when client is silent
after first ACK.

> > > But in tcp_minisocks.c, this counter is only checked. And in
> > > fact, I have found no location which updates it. So I think
> > > that what was intended was to decrease it in tcp_minisocks
> > > whenever it is checked, which the trivial patch below does.
> > 
> > 	You can check net/ipv4/inet_connection_sock.c,
> > inet_csk_reqsk_queue_prune() where TCP_DEFER_ACCEPT can extend
> > the retransmission threshold for acked sockets above the
> > applied 'thresh'.
> 
> So clearly this is in order to improve chances that the application
> will receive the connection, no ?

	Yes, it is not logical to configure
TCP_DEFER_ACCEPT < TCP_SYNCNT with the idea TCP_DEFER_ACCEPT
to reduce the SYN_RECV period. TCP_DEFER_ACCEPT does not
reduce the SYN_RECV period (before ACK), may be this is lacking
in man page. The idea to optionally give more time for DATA
to come after ACK is more logical. If TCP_DEFER_ACCEPT
switches state to ESTABLISHED may be then the TCP_DEFER_ACCEPT
period should start after ACK, eg. "wait for DATA 10 seconds after
ACK" sounds logical, it will need new name... Note that
TCP_DEFER_ACCEPT is used also as flag for clients but you can do
the same with TCP_QUICKACK=0 (to delay first ACK and to send
DATA with first ACK).

> Yes this is what happens right now, but reading the man again
> does not imply to me that the connection will not be accepted
> once we reach the retransmit limit.
> 
> Maybe we have different usages and different interpretations of
> the man can satisfy either, but I don't see what this would be
> useful to in case we silently drop instead of finally accepting.

	May be we want the same but currently TCP_DEFER_ACCEPT
works as ACK dropper which is easier for implementation.
The semantic 'TCP_DEFER_ACCEPT extends the period after ACK'
is good, you can tune it together with TCP_SYNCNT, to
extend or not to extend the period. What happens on
TCP_DEFER_ACCEPT expiration after ACK - we all prefer to
see FIN, so we have to wait someone to come with new
implementation.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14  7:27         ` Julian Anastasov
@ 2009-10-14 20:17           ` Willy Tarreau
  2009-10-14 21:12             ` Olaf van der Spek
  2009-10-14 22:43             ` David Miller
  2009-10-16 10:08           ` Ilpo Järvinen
  1 sibling, 2 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-14 20:17 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, Eric Dumazet

Hello Julian,

On Wed, Oct 14, 2009 at 10:27:50AM +0300, Julian Anastasov wrote:
> 	There were attempts to switch the server socket to
> established state, no SYN-ACK retransmissions after client ACK
> and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
> but due to some locking problems it was reverted:
> 
> http://marc.info/?t=121318919000001&r=1&w=2

interesting lecture, thanks for the link.

> 	Current handling is as before: drop client ACKs before DATA,
> stay in SYN_RECV state, send RST on outdated ACKs.

OK, that's what I observed too.

(...)
> > > 	This works properly in 2.6.31.3, I set TCP_SYNCNT=1
> > > and TCP_DEFER_ACCEPT then only 2 SYN-ACKs are sent.
> > 
> > That's what I observe too, but the connection is silently dropped
> > afterwards and I'm clearly not sure this was the intended behaviour.
> 
> 	Yes, bad only for firewalls. Note that if
> TCP_SYNCNT/TCP_DEFER_ACCEPT is shorter the client still gets
> RST on next ACK. So, the problem is when client is silent
> after first ACK.

Not only, we have the problem that the application layer has no way
to be informed that someone connected and failed to send anything.

I have added the defer_accept feature in haproxy (http load balancer),
just as an optimization to avoid a EAGAIN upon first recv() attempt
to read the HTTP request after an accept. But users rely on stats and
logs a lot to check if their site works. If clients really connect and
fail to send a request (most often because of PPPoE outgoing MTU issues),
they want to see that in the stats in order to monitor their quality of
service. Also, having the ability to return a "408 request timeout" to
the users is a lot cleaner and easier to debug than not responding
anything at all, especially when there are reverse-proxies between
both ends.

So this behaviour has visible effects to the end user, that are not
expected at all when reading the doc.

> > So clearly this is in order to improve chances that the application
> > will receive the connection, no ?
> 
> 	Yes, it is not logical to configure
> TCP_DEFER_ACCEPT < TCP_SYNCNT with the idea TCP_DEFER_ACCEPT
> to reduce the SYN_RECV period. TCP_DEFER_ACCEPT does not
> reduce the SYN_RECV period (before ACK), may be this is lacking
> in man page. The idea to optionally give more time for DATA
> to come after ACK is more logical. If TCP_DEFER_ACCEPT
> switches state to ESTABLISHED may be then the TCP_DEFER_ACCEPT
> period should start after ACK, eg. "wait for DATA 10 seconds after
> ACK" sounds logical, it will need new name...

I don't even need to wait for 10 seconds after first ACK, being
able to drop only the first ACK is already excellent IMHO.

> Note that
> TCP_DEFER_ACCEPT is used also as flag for clients but you can do
> the same with TCP_QUICKACK=0 (to delay first ACK and to send
> DATA with first ACK).

Yes and I already use that, this gives substantial performance
boosts on small HTTP requests ; unfortunately I have no control
over the end-user's browser !

> > Yes this is what happens right now, but reading the man again
> > does not imply to me that the connection will not be accepted
> > once we reach the retransmit limit.
> > 
> > Maybe we have different usages and different interpretations of
> > the man can satisfy either, but I don't see what this would be
> > useful to in case we silently drop instead of finally accepting.
> 
> 	May be we want the same but currently TCP_DEFER_ACCEPT
> works as ACK dropper which is easier for implementation.

That's what I like too. No timer, very light requirements, just drop
the first empty ACK I'm not interested in, and that's all. That's
really what I understood from the man page.

> The semantic 'TCP_DEFER_ACCEPT extends the period after ACK'
> is good, you can tune it together with TCP_SYNCNT, to
> extend or not to extend the period. What happens on
> TCP_DEFER_ACCEPT expiration after ACK - we all prefer to
> see FIN, so we have to wait someone to come with new
> implementation.

Well, too much complicated for very little gain IMHO.

Regards,
Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14 20:17           ` Willy Tarreau
@ 2009-10-14 21:12             ` Olaf van der Spek
  2009-10-14 22:43             ` David Miller
  1 sibling, 0 replies; 41+ messages in thread
From: Olaf van der Spek @ 2009-10-14 21:12 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

On Wed, Oct 14, 2009 at 10:17 PM, Willy Tarreau <w@1wt.eu> wrote:
> Yes and I already use that, this gives substantial performance
> boosts on small HTTP requests ; unfortunately I have no control
> over the end-user's browser !

Most accept feature requests, don't they?

Olaf

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14 20:17           ` Willy Tarreau
  2009-10-14 21:12             ` Olaf van der Spek
@ 2009-10-14 22:43             ` David Miller
  2009-10-15  6:08               ` Willy Tarreau
  2009-10-15  7:59               ` Julian Anastasov
  1 sibling, 2 replies; 41+ messages in thread
From: David Miller @ 2009-10-14 22:43 UTC (permalink / raw)
  To: w; +Cc: ja, netdev, eric.dumazet

From: Willy Tarreau <w@1wt.eu>
Date: Wed, 14 Oct 2009 22:17:06 +0200

> Hello Julian,
> 
> On Wed, Oct 14, 2009 at 10:27:50AM +0300, Julian Anastasov wrote:
>> The semantic 'TCP_DEFER_ACCEPT extends the period after ACK'
>> is good, you can tune it together with TCP_SYNCNT, to
>> extend or not to extend the period. What happens on
>> TCP_DEFER_ACCEPT expiration after ACK - we all prefer to
>> see FIN, so we have to wait someone to come with new
>> implementation.
> 
> Well, too much complicated for very little gain IMHO.

For now I'm pushing Willy's change into Linus's tree.

After more discussion we can revert if necessary.

I won't submit this to -stable until the discussion is fully resolved.

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14 22:43             ` David Miller
@ 2009-10-15  6:08               ` Willy Tarreau
  2009-10-15  8:47                 ` Julian Anastasov
  2009-10-15  7:59               ` Julian Anastasov
  1 sibling, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-15  6:08 UTC (permalink / raw)
  To: David Miller; +Cc: ja, netdev, eric.dumazet

On Wed, Oct 14, 2009 at 03:43:49PM -0700, David Miller wrote:
> For now I'm pushing Willy's change into Linus's tree.
> 
> After more discussion we can revert if necessary.
> 
> I won't submit this to -stable until the discussion is fully resolved.

Makes sense, thanks David.

BTW, I found a use case I didn't think about where current behaviour
causes trouble :

   https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/134274
   http://lkml.indiana.edu/hypermail/linux/kernel/0711.0/0461.html

In summary, when front proxies establish pools of connections to
an apache server making use of TCP_DEFER_ACCEPT, the connection
never establishes on the apache server but silently expires in
SYN_RECV state. The front proxy sees lots of SYN/ACKs and sends
many ACKs trying to complete this connection and finally believes
it got it since the server eventually becomes silent. However,
when trying to send data over such a socket, the server immediately
returns an RST.

Such a problem would not happen if we would only drop the first
X packets (X >= 1 is already fine), because the front proxy would
establish the connection, send a second ACK in response to the
second SYN/ACK and the connection would then really be established
and would not have to expire early in SYN_RECV state.

If we really want to behave as it does today, well, let's not fix
it, but obviously, I fail to see what real world use it has, except
causing random and hard to debug issues :-/

Reading the articles below clearly make it think it was designed
to help with HTTP connections by skipping the first expected and
useless ACK packet before waking up the task :

   http://httpd.apache.org/docs/1.3/misc/perf-bsd44.html
   http://articles.techrepublic.com.com/5100-10878_11-1050771.html

and people still get caught :

   http://lkml.indiana.edu/hypermail/linux/kernel/0711.0/0416.html

Maybe it was a bit over-engineered, in the end causing it to fail
to satisfy the primary goal ?

Regards,
Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-15  6:08               ` Willy Tarreau
@ 2009-10-15  8:47                 ` Julian Anastasov
  2009-10-15 12:41                   ` Willy Tarreau
  0 siblings, 1 reply; 41+ messages in thread
From: Julian Anastasov @ 2009-10-15  8:47 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev, eric.dumazet


	Hello,

On Thu, 15 Oct 2009, Willy Tarreau wrote:

> BTW, I found a use case I didn't think about where current behaviour
> causes trouble :
> 
>    https://bugs.launchpad.net/ubuntu/+source/apache2/+bug/134274
>    http://lkml.indiana.edu/hypermail/linux/kernel/0711.0/0461.html
> 
> In summary, when front proxies establish pools of connections to
> an apache server making use of TCP_DEFER_ACCEPT, the connection
> never establishes on the apache server but silently expires in
> SYN_RECV state. The front proxy sees lots of SYN/ACKs and sends
> many ACKs trying to complete this connection and finally believes
> it got it since the server eventually becomes silent. However,
> when trying to send data over such a socket, the server immediately
> returns an RST.

	Such proxies using open connections for insane long
time should be prepared to retry idempotent methods such as
GET and to send POST methods to fresh connections. They can
close idle connections, say, after 5 seconds. Even if
server runs with TCP_DEFER_ACCEPT=OFF there is possibility
server to send FIN while request is flying (servers are
configured with some period to wait for first request).

> Such a problem would not happen if we would only drop the first
> X packets (X >= 1 is already fine), because the front proxy would
> establish the connection, send a second ACK in response to the
> second SYN/ACK and the connection would then really be established
> and would not have to expire early in SYN_RECV state.
> 
> If we really want to behave as it does today, well, let's not fix
> it, but obviously, I fail to see what real world use it has, except
> causing random and hard to debug issues :-/

	The reason is that in SYN_RECV state the server
saves resources. Socket and FD are created on DATA and possibly
for the short time while response is sent. If server is lucky
such resources will live miliseconds. Short responses can
be sent together with FIN. OTOH, servers running
with TCP_DEFER_ACCEPT=OFF can live with some wakeups (epoll
is fast enough) but the problem is that they have sockets
for longer time (the difference between first ACK and first DATA).

> Reading the articles below clearly make it think it was designed
> to help with HTTP connections by skipping the first expected and
> useless ACK packet before waking up the task :
> 
>    http://httpd.apache.org/docs/1.3/misc/perf-bsd44.html
>    http://articles.techrepublic.com.com/5100-10878_11-1050771.html
> 
> and people still get caught :
> 
>    http://lkml.indiana.edu/hypermail/linux/kernel/0711.0/0416.html

	They wait for minutes because they do not configure
TCP_SYNCNT. TCP_DEFER_ACCEPT works as expected if configured
properly.

> Maybe it was a bit over-engineered, in the end causing it to fail
> to satisfy the primary goal ?

	If one changes TCP_DEFER_ACCEPT to create socket it
will save wakeups but not resources. I'm wondering if the
behavior should be changed at all. For me the options are two:

a) you want to save resources: use TCP_DEFER_ACCEPT. To help
proxies use large values for TCP_SYNCNT and TCP_DEFER_ACCEPT.

b) you can live with wakeups and many sockets: do not use
TCP_DEFER_ACCEPT. Suitable for servers using short timeouts
for first request.

> Regards,
> Willy

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-15  8:47                 ` Julian Anastasov
@ 2009-10-15 12:41                   ` Willy Tarreau
  2009-10-15 22:44                     ` Julian Anastasov
  0 siblings, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-15 12:41 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, eric.dumazet

Hello Julian,

On Thu, Oct 15, 2009 at 11:47:51AM +0300, Julian Anastasov wrote:
(...)
> 	If one changes TCP_DEFER_ACCEPT to create socket it
> will save wakeups but not resources. I'm wondering if the
> behavior should be changed at all. For me the options are two:
> 
> a) you want to save resources: use TCP_DEFER_ACCEPT. To help
> proxies use large values for TCP_SYNCNT and TCP_DEFER_ACCEPT.
> 
> b) you can live with wakeups and many sockets: do not use
> TCP_DEFER_ACCEPT. Suitable for servers using short timeouts
> for first request.

and c) you want to avoid wakeups as much as possible and you'd like
to drop just one empty ACK packet, so that as soon as you accept a
an HTTP connection, you can read the request without polling at all.

Right now I'm able to process a complete HTTP request without
registering the any FD in epoll *at all* for most requests if the first
two ACKs are close enough and the server responds quickly. This saves a
substantial amount of CPU cycles. Epoll is fast, but calling epoll_ctl()
100000 times a second still has a measurable cost. Doing an accept() on
an empty connection implies this cost. Waiting for data always saves this
cost, but causes the undesirable side effects that have been reported.
Waiting for data just a few milliseconds is enough to save this cost
99.99% of the time, just as skipping the first empty packet.

Since you're saying that updating the value is wrong when it's used as
a flag, would a patch to implement a specific option for this usage be
accepted ?  Either by passing a negative value to TCP_DEFER_ACCEPT, or
by using another flag ?

Thanks,
Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-15 12:41                   ` Willy Tarreau
@ 2009-10-15 22:44                     ` Julian Anastasov
  2009-10-16  3:51                       ` Eric Dumazet
  2009-10-16  5:03                       ` Willy Tarreau
  0 siblings, 2 replies; 41+ messages in thread
From: Julian Anastasov @ 2009-10-15 22:44 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev, eric.dumazet


	Hello,

On Thu, 15 Oct 2009, Willy Tarreau wrote:

> Hello Julian,
> 
> On Thu, Oct 15, 2009 at 11:47:51AM +0300, Julian Anastasov wrote:
> (...)
> > 	If one changes TCP_DEFER_ACCEPT to create socket it
> > will save wakeups but not resources. I'm wondering if the
> > behavior should be changed at all. For me the options are two:
> > 
> > a) you want to save resources: use TCP_DEFER_ACCEPT. To help
> > proxies use large values for TCP_SYNCNT and TCP_DEFER_ACCEPT.
> > 
> > b) you can live with wakeups and many sockets: do not use
> > TCP_DEFER_ACCEPT. Suitable for servers using short timeouts
> > for first request.
> 
> and c) you want to avoid wakeups as much as possible and you'd like
> to drop just one empty ACK packet, so that as soon as you accept a
> an HTTP connection, you can read the request without polling at all.
> 
> Right now I'm able to process a complete HTTP request without
> registering the any FD in epoll *at all* for most requests if the first
> two ACKs are close enough and the server responds quickly. This saves a
> substantial amount of CPU cycles. Epoll is fast, but calling epoll_ctl()
> 100000 times a second still has a measurable cost. Doing an accept() on
> an empty connection implies this cost. Waiting for data always saves this
> cost, but causes the undesirable side effects that have been reported.
> Waiting for data just a few milliseconds is enough to save this cost
> 99.99% of the time, just as skipping the first empty packet.
> 
> Since you're saying that updating the value is wrong when it's used as
> a flag, would a patch to implement a specific option for this usage be
> accepted ?  Either by passing a negative value to TCP_DEFER_ACCEPT, or
> by using another flag ?

	Sorry for the long mail ...

	I was not clear enough in previous email. Your goal
is to decrease period per client while the actually decreased
threshold is on the listener's socket. 256 conns will be enough
to completely disable TCP_DEFER_ACCEPT on the listener (u8). I'm not
sure that you tested what happens after Nth client (where
N matches your TCP_DEFER_ACCEPT value as retransmissions), do you
still see accept deferring for next clients? Now if your patch
is applied the deferring will be disabled soon after server start.

	The open requests are just 64 bytes for me:
cat /proc/slabinfo | grep request_sock_TCP
and there is no rskq_defer_accept flag in struct tcp_request_sock,
struct inet_request_sock or struct request_sock. It is
present only for normal sockets. These open connection
requests have only 'retrans' and 'acked' flag. Please,
check again what your patch does and test it with some simple
client that sleeps after connect().

	As for new flags, may be we should not change
TCP_DEFER_ACCEPT values because current applications can
depend on it. There is some free space in
struct request_sock_queue just after u8 rskq_defer_accept.
May be new flags/modes can go there to define another
behavior but it means also changes in applications to support
it.

	Because using TCP_DEFER_ACCEPT as flag is not documented
one solution can be the following change, may be it matches your
idea but implemented correctly:

        /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
-       if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+       if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
            TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
                inet_rsk(req)->acked = 1;
                return NULL;
        }

	with the meaning "When TCP_DEFER_ACCEPT retransmission
period is shorter than SYN-ACK retrans period (eg. TCP_SYNCNT
or sysctl_tcp_synack_retries) move the open request as
established on client's packet after the TCP_DEFER_ACCEPT
period has expired". Then if you set TCP_SYNCNT=5 and
TCP_DEFER_ACCEPT=1retrans first ACK will set inet_rsk(req)->acked=1
because req->retrans is 0 and rskq_defer_accept is 1,
later timer will send one SYN-ACK (which marks the end
of TCP_DEFER_ACCEPT period), then client will send DATA or it
will be forced by our SYN-ACK retransmission to send 2nd
ACK packet for which we will create established socket.

	Such change will affect all servers that use
TCP_DEFER_ACCEPT retransmissions less than TCP_SYNCNT. They
will start to see wakeups without data after the TCP_DEFER_ACCEPT
period.

	To summarize:

SECOND	CLIENT			SERVER
---------------------------------------------------------
0	SYN			SYN-ACK
	if DATA => ESTABLISH
	if ACK => acked=1
3				SYN-ACK (set retrans=1)
	if ACK and TCP_DEFER_ACCEPT=1retrans => ESTABLISH
	if DATA => ESTABLISH
	if ACK => acked=1
9				if TCP_SYNCNT=1 => expire
				else SYN-ACK (set retrans=2)
	if ACK and TCP_DEFER_ACCEPT=2retrans => ESTABLISH
	if DATA => ESTABLISH
	if ACK => acked=1
...

PRO:

- if TCP_DEFER_ACCEPT<TCP_SYNCNT and client properly resends
ACK on every SYN-ACK retransmission then we always will
switch to established state on TCP_DEFER_ACCEPT expiration.
Such conns will never expire in SYN_RECV state. They
will be terminated by client's FIN or will be accepted
by server application and terminated properly. Of course,
there is some chance if client delays its ACKs or if SYN-ACK
is lost the open request to expire in SYN_RECV state.

CON:

- if client refuses to send DATA we still need these SYN-ACKs
to trigger ACK retransmissions from client because the only
way to switch to established state is when packet is received,
I don't know how TCP_DEFER_ACCEPT expiration can directly change
the open request to established state.

	May be it is possible to send first SYN-ACK and
if one ACK is received to send more SYN-ACKs after
TCP_DEFER_ACCEPT period expires. Then client still has chance
to send ACK or DATA that will switch open request to established
socket. So, our timer will be silent when acked=1 while
TCP_DEFER_ACCEPT period is active, for example:

SYN
	SYN-ACK
ACK
	...
	acked=1 => no SYN-ACKs retrans (assume they are sent and lost)
	...
	TCP_DEFER_ACCEPT expires => send 2nd SYN-ACK
	... If no client's ACK then resend SYN-ACK while retrans<TCP_SYNCNT
	...
ACK or DATA => ESTABLISHED

	This will need little change in inet_csk_reqsk_queue_prune()
but it saves SYN-ACK traffic during deferring period in the
common case when client sends ACK. If such compromise is
acceptable I can prepare and test some patch.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-15 22:44                     ` Julian Anastasov
@ 2009-10-16  3:51                       ` Eric Dumazet
  2009-10-16  5:00                         ` Eric Dumazet
  2009-10-16  5:03                       ` Willy Tarreau
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-16  3:51 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, netdev

Julian Anastasov a écrit :
> 	Sorry for the long mail ...
> 
> 	I was not clear enough in previous email. Your goal
> is to decrease period per client while the actually decreased
> threshold is on the listener's socket. 256 conns will be enough
> to completely disable TCP_DEFER_ACCEPT on the listener (u8). I'm not
> sure that you tested what happens after Nth client (where
> N matches your TCP_DEFER_ACCEPT value as retransmissions), do you
> still see accept deferring for next clients? Now if your patch
> is applied the deferring will be disabled soon after server start.


So, it appears defer_accept value is not an inherited attribute,
but shared by all embryons. Therefore we should not touch it.

No need to type so much text Julian.

Of course it should be done, or add a new connection field to count number
of pure ACKS received on each SYN_RECV embryon.

Do you volunter for this patch ?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  3:51                       ` Eric Dumazet
@ 2009-10-16  5:00                         ` Eric Dumazet
  2009-10-16  5:29                           ` Willy Tarreau
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-16  5:00 UTC (permalink / raw)
  Cc: Julian Anastasov, Willy Tarreau, David Miller, netdev

Eric Dumazet a écrit :
> 
> 
> So, it appears defer_accept value is not an inherited attribute,
> but shared by all embryons. Therefore we should not touch it.
> 
> Of course it should be done, or add a new connection field to count number
> of pure ACKS received on each SYN_RECV embryon.
> 

Could be something like this ? (on top of net-next-2.6 of course)

7 bits is more than enough, we could take 5 bits IMHO.


Thanks

[PATCH net-next-2.6] tcp: TCP_DEFER_ACCEPT fix

Julian Anastasov pointed out defer_accept was an attribute of listening socket,
shared by all embryons. We therefore need a new per connection attribute.

We can split current u8 used by cookie_ts into 7 bits used to store
number of pure ACKS received by the embryon, and 1 bit to store cookie_ts.

Note, I use an unsigned int field so that kmemcheck doesnt shoot,
so patch also touches cookie_v4_init_sequence()/cookie_v6_init_sequence()
implementations.

Reported-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/request_sock.h |    9 ++++++---
 include/net/tcp.h          |    6 ++++--
 net/ipv4/syncookies.c      |    7 ++++---
 net/ipv4/tcp_ipv4.c        |    2 +-
 net/ipv4/tcp_minisocks.c   |    4 ++--
 net/ipv6/syncookies.c      |    6 +++---
 net/ipv6/tcp_ipv6.c        |    2 +-
 7 files changed, 21 insertions(+), 15 deletions(-)

diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index c719084..3464d90 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -45,9 +45,12 @@ struct request_sock_ops {
  */
 struct request_sock {
 	struct request_sock		*dl_next; /* Must be first member! */
-	u16				mss;
-	u8				retrans;
-	u8				cookie_ts; /* syncookie: encode tcpopts in timestamp */
+	kmemcheck_bitfield_begin(flags);
+	unsigned int			mss : 16,
+					retrans : 8,
+					req_acks : 7,
+					cookie_ts : 1; /* syncookie: encode tcpopts in timestamp */
+	kmemcheck_bitfield_end(flags);
 	/* The following two fields can be easily recomputed I think -AK */
 	u32				window_clamp; /* window clamp at creation time */
 	u32				rcv_wnd;	  /* rcv_wnd offered first time */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..a2c0439 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -453,7 +453,7 @@ extern __u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS];
 extern struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb, 
 				    struct ip_options *opt);
 extern __u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, 
-				     __u16 *mss);
+				     struct request_sock *req);
 
 extern __u32 cookie_init_timestamp(struct request_sock *req);
 extern void cookie_check_timestamp(struct tcp_options_received *tcp_opt);
@@ -461,7 +461,7 @@ extern void cookie_check_timestamp(struct tcp_options_received *tcp_opt);
 /* From net/ipv6/syncookies.c */
 extern struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb);
 extern __u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb,
-				     __u16 *mss);
+				     struct request_sock *req);
 
 /* tcp_output.c */
 
@@ -1000,7 +1000,9 @@ static inline void tcp_openreq_init(struct request_sock *req,
 	struct inet_request_sock *ireq = inet_rsk(req);
 
 	req->rcv_wnd = 0;		/* So that tcp_send_synack() knows! */
+	kmemcheck_annotate_bitfield(req, flags);
 	req->cookie_ts = 0;
+	req->req_acks = 0;
 	tcp_rsk(req)->rcv_isn = TCP_SKB_CB(skb)->seq;
 	req->mss = rx_opt->mss_clamp;
 	req->ts_recent = rx_opt->saw_tstamp ? rx_opt->rcv_tsval : 0;
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 5ec678a..8af9261 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -160,19 +160,20 @@ static __u16 const msstab[] = {
  * Generate a syncookie.  mssp points to the mss, which is returned
  * rounded down to the value encoded in the cookie.
  */
-__u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
+__u32 cookie_v4_init_sequence(struct sock *sk, struct sk_buff *skb,
+			      struct request_sock *req)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	const struct tcphdr *th = tcp_hdr(skb);
 	int mssind;
-	const __u16 mss = *mssp;
+	const __u16 mss = req->mss;
 
 	tcp_synq_overflow(sk);
 
 	/* XXX sort msstab[] by probability?  Binary search? */
 	for (mssind = 0; mss > msstab[mssind + 1]; mssind++)
 		;
-	*mssp = msstab[mssind] + 1;
+	req->mss = msstab[mssind] + 1;
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 9971870..3a2dfc5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1286,7 +1286,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		syn_flood_warning(skb);
 		req->cookie_ts = tmp_opt.tstamp_ok;
 #endif
-		isn = cookie_v4_init_sequence(sk, skb, &req->mss);
+		isn = cookie_v4_init_sequence(sk, skb, req);
 	} else if (!isn) {
 		struct inet_peer *peer = NULL;
 
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index e320afe..92778e6 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -642,9 +642,9 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		return NULL;
 
 	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
-	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept > req->req_acks &&
 	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
-		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
+		req->req_acks++;
 		inet_rsk(req)->acked = 1;
 		return NULL;
 	}
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index cbe55e5..60d024d 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -125,18 +125,18 @@ static __u32 check_tcp_syn_cookie(__u32 cookie, struct in6_addr *saddr,
 		& COOKIEMASK;
 }
 
-__u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb, __u16 *mssp)
+__u32 cookie_v6_init_sequence(struct sock *sk, struct sk_buff *skb, struct request_sock *req)
 {
 	struct ipv6hdr *iph = ipv6_hdr(skb);
 	const struct tcphdr *th = tcp_hdr(skb);
 	int mssind;
-	const __u16 mss = *mssp;
+	const __u16 mss = req->mss;
 
 	tcp_synq_overflow(sk);
 
 	for (mssind = 0; mss > msstab[mssind + 1]; mssind++)
 		;
-	*mssp = msstab[mssind] + 1;
+	req->mss = msstab[mssind] + 1;
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESSENT);
 
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4517630..2717bcb 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1219,7 +1219,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 		TCP_ECN_create_request(req, tcp_hdr(skb));
 
 	if (want_cookie) {
-		isn = cookie_v6_init_sequence(sk, skb, &req->mss);
+		isn = cookie_v6_init_sequence(sk, skb, req);
 		req->cookie_ts = tmp_opt.tstamp_ok;
 	} else if (!isn) {
 		if (ipv6_opt_accepted(sk, skb) ||


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  5:00                         ` Eric Dumazet
@ 2009-10-16  5:29                           ` Willy Tarreau
  2009-10-16  6:05                             ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-16  5:29 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev

On Fri, Oct 16, 2009 at 07:00:49AM +0200, Eric Dumazet wrote:
> Eric Dumazet a écrit :
> > 
> > 
> > So, it appears defer_accept value is not an inherited attribute,
> > but shared by all embryons. Therefore we should not touch it.
> > 
> > Of course it should be done, or add a new connection field to count number
> > of pure ACKS received on each SYN_RECV embryon.
> > 
> 
> Could be something like this ? (on top of net-next-2.6 of course)
> 
> 7 bits is more than enough, we could take 5 bits IMHO.

Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?

Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  5:29                           ` Willy Tarreau
@ 2009-10-16  6:05                             ` Eric Dumazet
  2009-10-16  6:18                               ` Willy Tarreau
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-16  6:05 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Julian Anastasov, David Miller, netdev

Willy Tarreau a écrit :
> On Fri, Oct 16, 2009 at 07:00:49AM +0200, Eric Dumazet wrote:
>> Eric Dumazet a écrit :
>>>
>>> So, it appears defer_accept value is not an inherited attribute,
>>> but shared by all embryons. Therefore we should not touch it.
>>>
>>> Of course it should be done, or add a new connection field to count number
>>> of pure ACKS received on each SYN_RECV embryon.
>>>
>> Could be something like this ? (on top of net-next-2.6 of course)
>>
>> 7 bits is more than enough, we could take 5 bits IMHO.
> 
> Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
> 

In this case, we lose TCP_DEFER_ACCEPT advantage in case one SYN-ACK was dropped
by the network : We wakeup the listening server when first ACK comes from client,
instead of really wait the request.

I think being able to count pure-acks would be slighly better, and cost nothing.


retrans is the number of SYN-RECV (re)sent, while req_acks would count number of
pure ACK received.

Those numbers, in an ideal world should be related, but could differ in real world ?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  6:05                             ` Eric Dumazet
@ 2009-10-16  6:18                               ` Willy Tarreau
  2009-10-16  7:08                                 ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-16  6:18 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev

On Fri, Oct 16, 2009 at 08:05:19AM +0200, Eric Dumazet wrote:
> > Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
> > 
> 
> In this case, we lose TCP_DEFER_ACCEPT advantage in case one SYN-ACK was dropped
> by the network : We wakeup the listening server when first ACK comes from client,
> instead of really wait the request.
> 
> I think being able to count pure-acks would be slighly better, and cost nothing.
> 
> 
> retrans is the number of SYN-RECV (re)sent, while req_acks would count number of
> pure ACK received.
> 
> Those numbers, in an ideal world should be related, but could differ in real world ?

Yes it could differ if a pure ACK is lost between the client and the server,
but in my opinion what is important is not to precisely account the number
of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
most common situations we avoid to wake up too early.

Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed in number
of seconds by the application, which are in turn converted to a number of
retransmits based on our own timer, which means that our SYN-ACK counter
is what most closely matches the application's expected delay, even if an
ACK from the client gets lost in between or if a client's stack retransmits
pure ACKs very fast for any implementation-specific reason.

Regards,
Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  6:18                               ` Willy Tarreau
@ 2009-10-16  7:08                                 ` Eric Dumazet
  2009-10-16  7:19                                   ` Willy Tarreau
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-16  7:08 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Julian Anastasov, David Miller, netdev

Willy Tarreau a écrit :
> On Fri, Oct 16, 2009 at 08:05:19AM +0200, Eric Dumazet wrote:
>>> Couldn't we just rely on the retrans vs rskq_defer_accept comparison ?
>>>
>> In this case, we lose TCP_DEFER_ACCEPT advantage in case one SYN-ACK was dropped
>> by the network : We wakeup the listening server when first ACK comes from client,
>> instead of really wait the request.
>>
>> I think being able to count pure-acks would be slighly better, and cost nothing.
>>
>>
>> retrans is the number of SYN-RECV (re)sent, while req_acks would count number of
>> pure ACK received.
>>
>> Those numbers, in an ideal world should be related, but could differ in real world ?
> 
> Yes it could differ if a pure ACK is lost between the client and the server,
> but in my opinion what is important is not to precisely account the number
> of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
> most common situations we avoid to wake up too early.
> 

We basically same thing, but you misundertood me. I was concerning about
one lost (server -> client SYN-ACK), not a lost (client -> server ACK) which is fine
(even without playing with TCP_DEFER_ACCEPT at all)

In this case, if we do the retrans test, we'll accept the first (client -> server ACK)
and wakeup the application, while most probably we'll receive the client request
 few milli second later.

> Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed in number
> of seconds by the application, which are in turn converted to a number of
> retransmits based on our own timer, which means that our SYN-ACK counter
> is what most closely matches the application's expected delay, even if an
> ACK from the client gets lost in between or if a client's stack retransmits
> pure ACKs very fast for any implementation-specific reason.
> 

Well, this is why converting application delay (sockopt() argument) in second units
to a number of SYN-ACK counter is subobptimal and error prone.

This might be changed to be mapped to what documentation states : a number of seconds,
or even better a number of milli seconds (new TCP_DEFER_ACCEPT_MS setsockopt cmd),
because a high performance server wont play with > 1 sec values anyway.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  7:08                                 ` Eric Dumazet
@ 2009-10-16  7:19                                   ` Willy Tarreau
  0 siblings, 0 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-16  7:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev

On Fri, Oct 16, 2009 at 09:08:59AM +0200, Eric Dumazet wrote:
(...)
> > Yes it could differ if a pure ACK is lost between the client and the server,
> > but in my opinion what is important is not to precisely account the number
> > of ACKs to ensure we wake up exactly after XXX ACKs received, but that in
> > most common situations we avoid to wake up too early.
> > 
> 
> We basically same thing, but you misundertood me. I was concerning about
> one lost (server -> client SYN-ACK), not a lost (client -> server ACK) which is fine
> (even without playing with TCP_DEFER_ACCEPT at all)
> 
> In this case, if we do the retrans test, we'll accept the first (client -> server ACK)
> and wakeup the application, while most probably we'll receive the client request
>  few milli second later.

OK I get your point. We can detect that though, as Julian explained it, with
the ->acked field. It indicates we got an ACK, which proves the SYN-ACK was
received. At first glance, I think that Julian's algorithm explained at the
end of his mail exactly covers all cases without using any additional field,
though this is not an issue anyway.

> > Also, keep in mind that the TCP_DEFER_ACCEPT parameter is passed in number
> > of seconds by the application, which are in turn converted to a number of
> > retransmits based on our own timer, which means that our SYN-ACK counter
> > is what most closely matches the application's expected delay, even if an
> > ACK from the client gets lost in between or if a client's stack retransmits
> > pure ACKs very fast for any implementation-specific reason.
> > 
> 
> Well, this is why converting application delay (sockopt() argument) in second units
> to a number of SYN-ACK counter is subobptimal and error prone.

I agree, but it allows the application to be unware of retransmit timers.

> This might be changed to be mapped to what documentation states : a number of seconds,
> or even better a number of milli seconds (new TCP_DEFER_ACCEPT_MS setsockopt cmd),
> because a high performance server wont play with > 1 sec values anyway.

It would be nice but it would require a new timer. Current implementation
does not need any and is efficient enough for most common cases. In fact it
would have been better to simply be able to specify that we want to skip one
empty ACK (or X empty ACKs). But let's make use of what we currently have,
with your (or Julian's) changes, it should cover almost all usages without
changing semantics for applications.

Regards,
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-15 22:44                     ` Julian Anastasov
  2009-10-16  3:51                       ` Eric Dumazet
@ 2009-10-16  5:03                       ` Willy Tarreau
  2009-10-16  8:49                         ` Julian Anastasov
  1 sibling, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-16  5:03 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: David Miller, netdev, eric.dumazet

Hello Julian,

On Fri, Oct 16, 2009 at 01:44:34AM +0300, Julian Anastasov wrote:
(...)
> 	I was not clear enough in previous email. Your goal
> is to decrease period per client while the actually decreased
> threshold is on the listener's socket. 256 conns will be enough
> to completely disable TCP_DEFER_ACCEPT on the listener (u8).

Damn! Now that you're explaining this, it seems obvious and makes
so much sense ! Of course you're right. We should not touch the
listening socket, only the new socket being created ! Thanks for
this clarification. David, Julian is right, please drop my patch.

(...)
> 	As for new flags, may be we should not change
> TCP_DEFER_ACCEPT values because current applications can
> depend on it.

I have no problem with that, eventhough I'm really doubting
that many applications depend on its current behaviour.

> There is some free space in
> struct request_sock_queue just after u8 rskq_defer_accept.
> May be new flags/modes can go there to define another
> behavior but it means also changes in applications to support
> it.

One idea I had was to make it signed, because there is currently
no use for negative values. But let's see your proposal below.

>         /* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
> -       if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
> +       if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>             TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
>                 inet_rsk(req)->acked = 1;
>                 return NULL;
>         }

Yes, of course that looks a lot better !

(...)
> 	Such change will affect all servers that use
> TCP_DEFER_ACCEPT retransmissions less than TCP_SYNCNT. They
> will start to see wakeups without data after the TCP_DEFER_ACCEPT
> period.

Indeed. But anyway a server must always be prepared to see wakeup
without data, because there are some situations where it will still
happen. For instance, if hardware RX cksum is not possible, a data
packet will cause a wakeup and during the recv(), the copy_and_cksum
will detect a cksum error and will not send anything to userland,
so the application will get an empty read.

> 	To summarize:
> 
> SECOND	CLIENT			SERVER
> ---------------------------------------------------------
> 0	SYN			SYN-ACK
> 	if DATA => ESTABLISH
> 	if ACK => acked=1
> 3				SYN-ACK (set retrans=1)
> 	if ACK and TCP_DEFER_ACCEPT=1retrans => ESTABLISH
> 	if DATA => ESTABLISH
> 	if ACK => acked=1
> 9				if TCP_SYNCNT=1 => expire
> 				else SYN-ACK (set retrans=2)
> 	if ACK and TCP_DEFER_ACCEPT=2retrans => ESTABLISH
> 	if DATA => ESTABLISH
> 	if ACK => acked=1
> ...
> 
> PRO:
> 
> - if TCP_DEFER_ACCEPT<TCP_SYNCNT and client properly resends
> ACK on every SYN-ACK retransmission then we always will
> switch to established state on TCP_DEFER_ACCEPT expiration.

That's what I wanted to achieve :-)

> Such conns will never expire in SYN_RECV state. They
> will be terminated by client's FIN or will be accepted
> by server application and terminated properly. Of course,
> there is some chance if client delays its ACKs or if SYN-ACK
> is lost the open request to expire in SYN_RECV state.

Yes, but that must be covered by both ends stacks. Network
losses are normal and such events already happen in minor
quantities everyday.

> CON:
> 
> - if client refuses to send DATA we still need these SYN-ACKs
> to trigger ACK retransmissions from client because the only
> way to switch to established state is when packet is received,
> I don't know how TCP_DEFER_ACCEPT expiration can directly change
> the open request to established state.

Well anyway this is already better than the current situation where
an apparently established connection silently dies. With this
proposal, applications have a way to get a normal behaviour.

> 	May be it is possible to send first SYN-ACK and
> if one ACK is received to send more SYN-ACKs after
> TCP_DEFER_ACCEPT period expires. Then client still has chance
> to send ACK or DATA that will switch open request to established
> socket. So, our timer will be silent when acked=1 while
> TCP_DEFER_ACCEPT period is active, for example:
> 
> SYN
> 	SYN-ACK
> ACK
> 	...
> 	acked=1 => no SYN-ACKs retrans (assume they are sent and lost)
>
> 	...
> 	TCP_DEFER_ACCEPT expires => send 2nd SYN-ACK
> 	... If no client's ACK then resend SYN-ACK while retrans<TCP_SYNCNT
> 	...
> ACK or DATA => ESTABLISHED

On first reading I found it a bit dangerous because the client will assume
an established connection when not seeing any new SYN-ACKs. And if it has
no data to send, it will never retransmit its ACK. But after some thinking,
it still matches the purpose of TCP_DEFER_ACCEPT, without the extra
SYN-ACK / ACK dance that we currently observe. I was also worried about
middle firewalls, but they will have no problem because they'd see an
established connection (SYN,SYN-ACK,ACK), so their expiration timer will
be large enough to cover the lack of SYN-ACK during TCP_DEFER_ACCEPT.

> 	This will need little change in inet_csk_reqsk_queue_prune()
> but it saves SYN-ACK traffic during deferring period in the
> common case when client sends ACK. If such compromise is
> acceptable I can prepare and test some patch.

I would personally like this a lot ! This will satisfy people who
expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
as can be interpreted from the man page, will reduce the number of
useless SYN-ACKs that annoy other people while still making no
visible change for anyone who would rely on the current behaviour.

Regards,
Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  5:03                       ` Willy Tarreau
@ 2009-10-16  8:49                         ` Julian Anastasov
  2009-10-16 10:40                           ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Julian Anastasov @ 2009-10-16  8:49 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: David Miller, netdev, eric.dumazet


	Hello,

On Fri, 16 Oct 2009, Willy Tarreau wrote:

> > 	This will need little change in inet_csk_reqsk_queue_prune()
> > but it saves SYN-ACK traffic during deferring period in the
> > common case when client sends ACK. If such compromise is
> > acceptable I can prepare and test some patch.
> 
> I would personally like this a lot ! This will satisfy people who
> expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
> as can be interpreted from the man page, will reduce the number of
> useless SYN-ACKs that annoy other people while still making no
> visible change for anyone who would rely on the current behaviour.

	OK, I don't have much time now, this is what I'm
going to test later today and later can provide proper comments:

Signed-off-by: Julian Anastasov <ja@ssi.bg>

Patch 1: Accept connections after deferring period:

diff -urp v2.6.31/linux/net/ipv4/tcp_minisocks.c linux/net/ipv4/tcp_minisocks.c
--- v2.6.31/linux/net/ipv4/tcp_minisocks.c	2009-09-11 10:27:17.000000000 +0300
+++ linux/net/ipv4/tcp_minisocks.c	2009-10-16 10:29:19.000000000 +0300
@@ -641,8 +641,8 @@ struct sock *tcp_check_req(struct sock *
 	if (!(flg & TCP_FLAG_ACK))
 		return NULL;
 
-	/* If TCP_DEFER_ACCEPT is set, drop bare ACK. */
-	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
+	/* While TCP_DEFER_ACCEPT is active, drop bare ACK. */
+	if (req->retrans < inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
 	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
 		inet_rsk(req)->acked = 1;
 		return NULL;

Patch 2: Do not resend SYN-ACK while waiting for data after ACK.
Also, do not drop acked request if sending of SYN-ACK fails.

diff -urp v2.6.31/linux/net/ipv4/inet_connection_sock.c linux/net/ipv4/inet_connection_sock.c
--- v2.6.31/linux/net/ipv4/inet_connection_sock.c	2009-06-13 10:53:58.000000000 +0300
+++ linux/net/ipv4/inet_connection_sock.c	2009-10-16 11:35:52.000000000 +0300
@@ -446,6 +446,28 @@ extern int sysctl_tcp_synack_retries;
 
 EXPORT_SYMBOL_GPL(inet_csk_reqsk_queue_hash_add);
 
+/* Decide when to expire the request and when to resend SYN-ACK */
+static inline void syn_ack_recalc(struct request_sock *req, const int thresh,
+				  const int max_retries,
+				  const u8 rskq_defer_accept,
+				  int *expire, int *resend)
+{
+	if (!rskq_defer_accept) {
+		*expire = req->retrans >= thresh;
+		*resend = 1;
+		return;
+	}
+	*expire = req->retrans >= thresh &&
+		  (!inet_rsk(req)->acked || req->retrans >= max_retries);
+	/*
+	 * Do not resend while waiting for data after ACK,
+	 * start to resend on end of deferring period to give
+	 * last chance for data or ACK to create established socket.
+	 */
+	*resend = !inet_rsk(req)->acked ||
+		  req->retrans >= rskq_defer_accept - 1;
+}
+
 void inet_csk_reqsk_queue_prune(struct sock *parent,
 				const unsigned long interval,
 				const unsigned long timeout,
@@ -501,9 +523,15 @@ void inet_csk_reqsk_queue_prune(struct s
 		reqp=&lopt->syn_table[i];
 		while ((req = *reqp) != NULL) {
 			if (time_after_eq(now, req->expires)) {
-				if ((req->retrans < thresh ||
-				     (inet_rsk(req)->acked && req->retrans < max_retries))
-				    && !req->rsk_ops->rtx_syn_ack(parent, req)) {
+				int expire = 0, resend = 0;
+
+				syn_ack_recalc(req, thresh, max_retries,
+					       queue->rskq_defer_accept,
+					       &expire, &resend);
+				if (!expire &&
+				    (!resend ||
+				     !req->rsk_ops->rtx_syn_ack(parent, req) ||
+				     inet_rsk(req)->acked)) {
 					unsigned long timeo;
 
 					if (req->retrans++ == 0)

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16  8:49                         ` Julian Anastasov
@ 2009-10-16 10:40                           ` Eric Dumazet
  2009-10-16 19:27                             ` Willy Tarreau
  2009-10-17 11:48                             ` Julian Anastasov
  0 siblings, 2 replies; 41+ messages in thread
From: Eric Dumazet @ 2009-10-16 10:40 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, netdev

Julian Anastasov a écrit :
> 	Hello,
> 
> On Fri, 16 Oct 2009, Willy Tarreau wrote:
> 
>>> 	This will need little change in inet_csk_reqsk_queue_prune()
>>> but it saves SYN-ACK traffic during deferring period in the
>>> common case when client sends ACK. If such compromise is
>>> acceptable I can prepare and test some patch.
>> I would personally like this a lot ! This will satisfy people who
>> expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
>> as can be interpreted from the man page, will reduce the number of
>> useless SYN-ACKs that annoy other people while still making no
>> visible change for anyone who would rely on the current behaviour.
> 
> 	OK, I don't have much time now, this is what I'm
> going to test later today and later can provide proper comments:
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>

I tested both patches and they perform very well, thank you !

For the minimum 1 sec value, tcpdump looks like :
12:32:03.850456 IP 127.0.0.1.20000 > 127.0.0.1.2222: S 1879889239:1879889239(0) win 32792 <mss 16396,nop,nop,timestamp 952803 0,nop,wscale 6>
12:32:03.850463 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 952803 952803,nop,wscale 6>
12:32:03.850469 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 952803 952803>

12:32:06.849989 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 955803 952803,nop,wscale 6>
12:32:06.849996 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 955803 955803>

So listening application gets the accept() 3 seconds after initial SYN


# ss -emoian | grep SYN-RECV
SYN-RECV   0      0                 127.0.0.1:2222             127.0.0.1:20000  timer:(on,24sec,4) ino:0 sk:f6f0ec80

I wonder if tcp_diag should be extented a bit to reflect fact that the ACK was received from client
(ie forward the inet_rsk(req)->acked information to idiag_rqueue)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index cb73fde..c172bd4 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -589,7 +589,7 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 	r->id.idiag_src[0] = ireq->loc_addr;
 	r->id.idiag_dst[0] = ireq->rmt_addr;
 	r->idiag_expires = jiffies_to_msecs(tmo);
- 	r->idiag_rqueue = 0;
+	r->idiag_rqueue = ireq->acked;
 	r->idiag_wqueue = 0;
 	r->idiag_uid = sock_i_uid(sk);
 	r->idiag_inode = 0;



-> 
# ss -emoian | grep SYN-RECV
SYN-RECV   1      0                 127.0.0.1:2222             127.0.0.1:20000  timer:(on,24sec,4) ino:0 sk:f6f0ec80


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16 10:40                           ` Eric Dumazet
@ 2009-10-16 19:27                             ` Willy Tarreau
  2009-10-17 11:48                             ` Julian Anastasov
  1 sibling, 0 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-16 19:27 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev

Hi,

On Fri, Oct 16, 2009 at 12:40:11PM +0200, Eric Dumazet wrote:
> Julian Anastasov a écrit :
> > 	Hello,
> > 
> > On Fri, 16 Oct 2009, Willy Tarreau wrote:
> > 
> >>> 	This will need little change in inet_csk_reqsk_queue_prune()
> >>> but it saves SYN-ACK traffic during deferring period in the
> >>> common case when client sends ACK. If such compromise is
> >>> acceptable I can prepare and test some patch.
> >> I would personally like this a lot ! This will satisfy people who
> >> expect it to establish at the end of the "TCP_DEFER_ACCEPT delay"
> >> as can be interpreted from the man page, will reduce the number of
> >> useless SYN-ACKs that annoy other people while still making no
> >> visible change for anyone who would rely on the current behaviour.
> > 
> > 	OK, I don't have much time now, this is what I'm
> > going to test later today and later can provide proper comments:
> > 
> > Signed-off-by: Julian Anastasov <ja@ssi.bg>
> 
> I tested both patches and they perform very well, thank you !
> 
> For the minimum 1 sec value, tcpdump looks like :
> 12:32:03.850456 IP 127.0.0.1.20000 > 127.0.0.1.2222: S 1879889239:1879889239(0) win 32792 <mss 16396,nop,nop,timestamp 952803 0,nop,wscale 6>
> 12:32:03.850463 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 952803 952803,nop,wscale 6>
> 12:32:03.850469 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 952803 952803>
> 
> 12:32:06.849989 IP 127.0.0.1.2222 > 127.0.0.1.20000: S 1890330616:1890330616(0) ack 1879889240 win 32768 <mss 16396,nop,nop,timestamp 955803 952803,nop,wscale 6>
> 12:32:06.849996 IP 127.0.0.1.20000 > 127.0.0.1.2222: . ack 1 win 513 <nop,nop,timestamp 955803 955803>
> 
> So listening application gets the accept() 3 seconds after initial SYN

Excellent! Nice work guys.

> # ss -emoian | grep SYN-RECV
> SYN-RECV   0      0                 127.0.0.1:2222             127.0.0.1:20000  timer:(on,24sec,4) ino:0 sk:f6f0ec80
> 
> I wonder if tcp_diag should be extented a bit to reflect fact that the ACK was received from client
> (ie forward the inet_rsk(req)->acked information to idiag_rqueue)

I personally have no opinion on this point.

Thanks!
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-16 10:40                           ` Eric Dumazet
  2009-10-16 19:27                             ` Willy Tarreau
@ 2009-10-17 11:48                             ` Julian Anastasov
  2009-10-17 12:07                               ` Eric Dumazet
  1 sibling, 1 reply; 41+ messages in thread
From: Julian Anastasov @ 2009-10-17 11:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev


	Hello,

On Fri, 16 Oct 2009, Eric Dumazet wrote:

> I wonder if tcp_diag should be extented a bit to reflect fact that the ACK was received from client
> (ie forward the inet_rsk(req)->acked information to idiag_rqueue)

	It is a good idea.

> diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
> index cb73fde..c172bd4 100644
> --- a/net/ipv4/inet_diag.c
> +++ b/net/ipv4/inet_diag.c
> @@ -589,7 +589,7 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
>  	r->id.idiag_src[0] = ireq->loc_addr;
>  	r->id.idiag_dst[0] = ireq->rmt_addr;
>  	r->idiag_expires = jiffies_to_msecs(tmo);
> - 	r->idiag_rqueue = 0;
> +	r->idiag_rqueue = ireq->acked;
>  	r->idiag_wqueue = 0;
>  	r->idiag_uid = sock_i_uid(sk);
>  	r->idiag_inode = 0;

	I tested both patches. It seems the current algorithm to
convert seconds to retransmissions does not match well the TCP
SYN-ACK timer and sometimes can convert the seconds to
retransmissions which are 1 above the expected. For example,
you set 9 seconds (expecting 2 retrans) but you get 3 retrans,
visible with TCP_SYNCNT=1.

	Also, it is limited to period of 32 retransmissions.

	The following patch changes the TCP_DEFER_ACCEPT
period calculation to match TCP SYN-ACK retransmissions and to
help those folks who select the seconds with TCP SYN-ACK
timing in mind. It also allows the retransmission threshold
to be up to 255.

Signed-off-by: Julian Anastasov <ja@ssi.bg>

diff -urp v2.6.31/linux/net/ipv4/tcp.c linux/net/ipv4/tcp.c
--- v2.6.31/linux/net/ipv4/tcp.c	2009-09-11 10:27:17.000000000 +0300
+++ linux/net/ipv4/tcp.c	2009-10-17 12:34:38.000000000 +0300
@@ -2165,13 +2165,20 @@ static int do_tcp_setsockopt(struct sock
 	case TCP_DEFER_ACCEPT:
 		icsk->icsk_accept_queue.rskq_defer_accept = 0;
 		if (val > 0) {
+			int timeout = TCP_TIMEOUT_INIT / HZ;
+			int period = timeout;
+
 			/* Translate value in seconds to number of
 			 * retransmits */
-			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
-			       val > ((TCP_TIMEOUT_INIT / HZ) <<
-				       icsk->icsk_accept_queue.rskq_defer_accept))
+			icsk->icsk_accept_queue.rskq_defer_accept = 1;
+			while (icsk->icsk_accept_queue.rskq_defer_accept < 255 &&
+			       val > period) {
 				icsk->icsk_accept_queue.rskq_defer_accept++;
-			icsk->icsk_accept_queue.rskq_defer_accept++;
+				timeout <<= 1;
+				if (timeout > TCP_RTO_MAX / HZ)
+					timeout = TCP_RTO_MAX / HZ;
+				period += timeout;
+			}
 		}
 		break;
 

	FYI, the old algorithm selects the following retransmissions
for the configured seconds:

defer_accept=1 retrans for 1-3 secs
defer_accept=2 retrans for 4-6 secs
defer_accept=3 retrans for 7-12 secs
defer_accept=4 retrans for 13-24 secs
defer_accept=5 retrans for 25-48 secs
defer_accept=6 retrans for 49-96 secs
defer_accept=7 retrans for 97-192 secs
defer_accept=8 retrans for 193-384 secs

	While the new algorithm is as follows:

defer_accept=1 retrans for 1-3 secs
defer_accept=2 retrans for 4-9 secs
defer_accept=3 retrans for 10-21 secs
defer_accept=4 retrans for 22-45 secs
defer_accept=5 retrans for 46-93 secs
defer_accept=6 retrans for 94-189 secs
defer_accept=7 retrans for 190-309 secs
defer_accept=8 retrans for 310-429 secs

	Comments? Next step is to post the 3 patches separately
for final review and applying.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-17 11:48                             ` Julian Anastasov
@ 2009-10-17 12:07                               ` Eric Dumazet
  2009-10-17 14:20                                 ` Julian Anastasov
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-17 12:07 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, netdev

Julian Anastasov a écrit :
> 
> 	I tested both patches. It seems the current algorithm to
> convert seconds to retransmissions does not match well the TCP
> SYN-ACK timer and sometimes can convert the seconds to
> retransmissions which are 1 above the expected. For example,
> you set 9 seconds (expecting 2 retrans) but you get 3 retrans,
> visible with TCP_SYNCNT=1.
> 
> 	Also, it is limited to period of 32 retransmissions.
> 
> 	The following patch changes the TCP_DEFER_ACCEPT
> period calculation to match TCP SYN-ACK retransmissions and to
> help those folks who select the seconds with TCP SYN-ACK
> timing in mind. It also allows the retransmission threshold
> to be up to 255.
> 
> Signed-off-by: Julian Anastasov <ja@ssi.bg>
> 
> diff -urp v2.6.31/linux/net/ipv4/tcp.c linux/net/ipv4/tcp.c
> --- v2.6.31/linux/net/ipv4/tcp.c	2009-09-11 10:27:17.000000000 +0300
> +++ linux/net/ipv4/tcp.c	2009-10-17 12:34:38.000000000 +0300
> @@ -2165,13 +2165,20 @@ static int do_tcp_setsockopt(struct sock
>  	case TCP_DEFER_ACCEPT:
>  		icsk->icsk_accept_queue.rskq_defer_accept = 0;
>  		if (val > 0) {
> +			int timeout = TCP_TIMEOUT_INIT / HZ;
> +			int period = timeout;
> +
>  			/* Translate value in seconds to number of
>  			 * retransmits */
> -			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
> -			       val > ((TCP_TIMEOUT_INIT / HZ) <<
> -				       icsk->icsk_accept_queue.rskq_defer_accept))
> +			icsk->icsk_accept_queue.rskq_defer_accept = 1;
> +			while (icsk->icsk_accept_queue.rskq_defer_accept < 255 &&
> +			       val > period) {
>  				icsk->icsk_accept_queue.rskq_defer_accept++;
> -			icsk->icsk_accept_queue.rskq_defer_accept++;
> +				timeout <<= 1;
> +				if (timeout > TCP_RTO_MAX / HZ)
> +					timeout = TCP_RTO_MAX / HZ;
> +				period += timeout;
> +			}
>  		}
>  		break;
>  
> 
> 	FYI, the old algorithm selects the following retransmissions
> for the configured seconds:
> 
> defer_accept=1 retrans for 1-3 secs
> defer_accept=2 retrans for 4-6 secs
> defer_accept=3 retrans for 7-12 secs
> defer_accept=4 retrans for 13-24 secs
> defer_accept=5 retrans for 25-48 secs
> defer_accept=6 retrans for 49-96 secs
> defer_accept=7 retrans for 97-192 secs
> defer_accept=8 retrans for 193-384 secs
> 
> 	While the new algorithm is as follows:
> 
> defer_accept=1 retrans for 1-3 secs
> defer_accept=2 retrans for 4-9 secs
> defer_accept=3 retrans for 10-21 secs
> defer_accept=4 retrans for 22-45 secs
> defer_accept=5 retrans for 46-93 secs
> defer_accept=6 retrans for 94-189 secs
> defer_accept=7 retrans for 190-309 secs
> defer_accept=8 retrans for 310-429 secs
> 
> 	Comments? Next step is to post the 3 patches separately
> for final review and applying.
> 

I really like this, but please define helper functions out of do_tcp_setsockopt()

/* Translate value in seconds to number of SYN-ACK retransmits */
static u8 secs_to_retrans(int seconds)
{
	u8 res = 0;

	if (seconds > 0) {
		int timeout = TCP_TIMEOUT_INIT / HZ;
		int period = timeout;

		res = 1;
		while (res < 255 && seconds > period) {
			res++;
			timeout <<= 1;
			if (timeout > TCP_RTO_MAX / HZ)
				timeout = TCP_RTO_MAX / HZ;
			period += timeout;
		}
	}
	return res;
}

You also need the reverse function for getsockopt()...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-17 12:07                               ` Eric Dumazet
@ 2009-10-17 14:20                                 ` Julian Anastasov
  2009-10-19 20:01                                   ` Eric Dumazet
  0 siblings, 1 reply; 41+ messages in thread
From: Julian Anastasov @ 2009-10-17 14:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Willy Tarreau, David Miller, netdev


	Hello,

On Sat, 17 Oct 2009, Eric Dumazet wrote:

> I really like this, but please define helper functions out of do_tcp_setsockopt()
> 
> /* Translate value in seconds to number of SYN-ACK retransmits */
> static u8 secs_to_retrans(int seconds)
> {
> 	u8 res = 0;
> 
> 	if (seconds > 0) {
> 		int timeout = TCP_TIMEOUT_INIT / HZ;
> 		int period = timeout;
> 
> 		res = 1;
> 		while (res < 255 && seconds > period) {
> 			res++;
> 			timeout <<= 1;
> 			if (timeout > TCP_RTO_MAX / HZ)
> 				timeout = TCP_RTO_MAX / HZ;
> 			period += timeout;
> 		}
> 	}
> 	return res;
> }
> 
> You also need the reverse function for getsockopt()...

	Yes, I forgot that. Here is what I tested, should
I split it later to 2 patches? May be it should go somewhere
in net/core/sock.c as extern funcs with EXPORT_SYMBOL to
allow other protocols one day to use it?

Signed-off-by: Julian Anastasov <ja@ssi.bg>

diff -urp v2.6.31/linux/net/ipv4/tcp.c linux/net/ipv4/tcp.c
--- v2.6.31/linux/net/ipv4/tcp.c	2009-09-11 10:27:17.000000000 +0300
+++ linux/net/ipv4/tcp.c	2009-10-17 16:37:48.000000000 +0300
@@ -326,6 +326,43 @@ void tcp_enter_memory_pressure(struct so
 
 EXPORT_SYMBOL(tcp_enter_memory_pressure);
 
+/* Convert seconds to retransmits based on initial and max timeout */
+static u8 secs_to_retrans(int seconds, int timeout, int rto_max)
+{
+	u8 res = 0;
+
+	if (seconds > 0) {
+		int period = timeout;
+
+		res = 1;
+		while (seconds > period && res < 255) {
+			res++;
+			timeout <<= 1;
+			if (timeout > rto_max)
+				timeout = rto_max;
+			period += timeout;
+		}
+	}
+	return res;
+}
+
+/* Convert retransmits to seconds based on initial and max timeout */
+static int retrans_to_secs(u8 retrans, int timeout, int rto_max)
+{
+	int period = 0;
+
+	if (retrans > 0) {
+		period = timeout;
+		while (--retrans) {
+			timeout <<= 1;
+			if (timeout > rto_max)
+				timeout = rto_max;
+			period += timeout;
+		}
+	}
+	return period;
+}
+
 /*
  *	Wait for a TCP event.
  *
@@ -2163,16 +2200,10 @@ static int do_tcp_setsockopt(struct sock
 		break;
 
 	case TCP_DEFER_ACCEPT:
-		icsk->icsk_accept_queue.rskq_defer_accept = 0;
-		if (val > 0) {
-			/* Translate value in seconds to number of
-			 * retransmits */
-			while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
-			       val > ((TCP_TIMEOUT_INIT / HZ) <<
-				       icsk->icsk_accept_queue.rskq_defer_accept))
-				icsk->icsk_accept_queue.rskq_defer_accept++;
-			icsk->icsk_accept_queue.rskq_defer_accept++;
-		}
+		/* Translate value in seconds to number of retransmits */
+		icsk->icsk_accept_queue.rskq_defer_accept =
+			secs_to_retrans(val, TCP_TIMEOUT_INIT / HZ,
+					TCP_RTO_MAX / HZ);
 		break;
 
 	case TCP_WINDOW_CLAMP:
@@ -2353,8 +2384,8 @@ static int do_tcp_getsockopt(struct sock
 			val = (val ? : sysctl_tcp_fin_timeout) / HZ;
 		break;
 	case TCP_DEFER_ACCEPT:
-		val = !icsk->icsk_accept_queue.rskq_defer_accept ? 0 :
-			((TCP_TIMEOUT_INIT / HZ) << (icsk->icsk_accept_queue.rskq_defer_accept - 1));
+		val = retrans_to_secs(icsk->icsk_accept_queue.rskq_defer_accept,
+				      TCP_TIMEOUT_INIT / HZ, TCP_RTO_MAX / HZ);
 		break;
 	case TCP_WINDOW_CLAMP:
 		val = tp->window_clamp;

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-17 14:20                                 ` Julian Anastasov
@ 2009-10-19 20:01                                   ` Eric Dumazet
  2009-10-19 20:11                                     ` Willy Tarreau
  2009-10-20  2:23                                     ` David Miller
  0 siblings, 2 replies; 41+ messages in thread
From: Eric Dumazet @ 2009-10-19 20:01 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, netdev

Julian Anastasov a écrit :
> 	Hello,
> 
> On Sat, 17 Oct 2009, Eric Dumazet wrote:
> 
>> I really like this, but please define helper functions out of do_tcp_setsockopt()
>>
>> /* Translate value in seconds to number of SYN-ACK retransmits */
>> static u8 secs_to_retrans(int seconds)
>> {
>> 	u8 res = 0;
>>
>> 	if (seconds > 0) {
>> 		int timeout = TCP_TIMEOUT_INIT / HZ;
>> 		int period = timeout;
>>
>> 		res = 1;
>> 		while (res < 255 && seconds > period) {
>> 			res++;
>> 			timeout <<= 1;
>> 			if (timeout > TCP_RTO_MAX / HZ)
>> 				timeout = TCP_RTO_MAX / HZ;
>> 			period += timeout;
>> 		}
>> 	}
>> 	return res;
>> }
>>
>> You also need the reverse function for getsockopt()...
> 
> 	Yes, I forgot that. Here is what I tested, should
> I split it later to 2 patches? May be it should go somewhere
> in net/core/sock.c as extern funcs with EXPORT_SYMBOL to
> allow other protocols one day to use it?
> 

No need to EXPORT_SYMBOL it for the moment. We can add it later if really needed.

David, I think we should revert 6d01a026b7d3009a418326bdcf313503a314f1ea
(tcp: fix tcp_defer_accept to consider the timeout) 
since we know its broken.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-19 20:01                                   ` Eric Dumazet
@ 2009-10-19 20:11                                     ` Willy Tarreau
  2009-10-19 20:17                                       ` Eric Dumazet
  2009-10-20  2:23                                     ` David Miller
  1 sibling, 1 reply; 41+ messages in thread
From: Willy Tarreau @ 2009-10-19 20:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Julian Anastasov, David Miller, netdev

On Mon, Oct 19, 2009 at 10:01:15PM +0200, Eric Dumazet wrote:
> David, I think we should revert 6d01a026b7d3009a418326bdcf313503a314f1ea
> (tcp: fix tcp_defer_accept to consider the timeout) 
> since we know its broken.

yes I agree with Eric, please revert it and wait for Julian's
patch which gets it right. Sorry for the wrong fix, at least
it will have brought the discussion on the table, but without
a faulty patch it would have been better :-/

Thanks,
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-19 20:11                                     ` Willy Tarreau
@ 2009-10-19 20:17                                       ` Eric Dumazet
  0 siblings, 0 replies; 41+ messages in thread
From: Eric Dumazet @ 2009-10-19 20:17 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Julian Anastasov, David Miller, netdev

Willy Tarreau a écrit :
> On Mon, Oct 19, 2009 at 10:01:15PM +0200, Eric Dumazet wrote:
>> David, I think we should revert 6d01a026b7d3009a418326bdcf313503a314f1ea
>> (tcp: fix tcp_defer_accept to consider the timeout) 
>> since we know its broken.
> 
> yes I agree with Eric, please revert it and wait for Julian's
> patch which gets it right. Sorry for the wrong fix, at least
> it will have brought the discussion on the table, but without
> a faulty patch it would have been better :-/
> 

Thanks again Willy, we made progress and this is nice :)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-19 20:01                                   ` Eric Dumazet
  2009-10-19 20:11                                     ` Willy Tarreau
@ 2009-10-20  2:23                                     ` David Miller
  1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2009-10-20  2:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ja, w, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 19 Oct 2009 22:01:15 +0200

> David, I think we should revert 6d01a026b7d3009a418326bdcf313503a314f1ea
> (tcp: fix tcp_defer_accept to consider the timeout) 
> since we know its broken.

I've reverted that patch and applied Julian's three tcp
patches, thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14 22:43             ` David Miller
  2009-10-15  6:08               ` Willy Tarreau
@ 2009-10-15  7:59               ` Julian Anastasov
  1 sibling, 0 replies; 41+ messages in thread
From: Julian Anastasov @ 2009-10-15  7:59 UTC (permalink / raw)
  To: David Miller; +Cc: w, netdev, eric.dumazet


	Hello,

On Wed, 14 Oct 2009, David Miller wrote:

> For now I'm pushing Willy's change into Linus's tree.

	Better do not apply this change because it's reasoning is
wrong - counter is increased at the same time when threshold
decreases - what this change does is to half the TCP_DEFER_ACCEPT
period (assuming first ACK comes immediately) with the side
effect to switch off the TCP_DEFER_ACCEPT flag during
SYN-ACK next retransmits. When one uses TCP_DEFER_ACCEPT=1 as
flag the listener will see the new socket on the 2nd ACK which
contradicts with the TCP_DEFER_ACCEPT semantic to wakeup only
when data comes.

> After more discussion we can revert if necessary.
> 
> I won't submit this to -stable until the discussion is fully resolved.
> 
> Thanks!

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-14  7:27         ` Julian Anastasov
  2009-10-14 20:17           ` Willy Tarreau
@ 2009-10-16 10:08           ` Ilpo Järvinen
  1 sibling, 0 replies; 41+ messages in thread
From: Ilpo Järvinen @ 2009-10-16 10:08 UTC (permalink / raw)
  To: Julian Anastasov; +Cc: Willy Tarreau, David Miller, Netdev, Eric Dumazet

On Wed, 14 Oct 2009, Julian Anastasov wrote:

> 
> 	Hello,
> 
> On Wed, 14 Oct 2009, Willy Tarreau wrote:
> 
> > > > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > > > client does not talk, the connection is never accepted and
> > > > remains in SYN_RECV state until the retransmits expire, where
> > > > it finally is deleted. This is bad when some firewall such as
> > > 
> > > 	I think, this is by design, there is big comment in
> > > tcp_check_req().
> > 
> > I'm not sure. That would considerably reduce the usefulness of
> > the feature. The comment I see there is just a one line explaining
> > why we drop the ACK. It does not indicate any strategy on what to
> > do when the counter expires.
> 
> 	There were attempts to switch the server socket to
> established state, no SYN-ACK retransmissions after client ACK
> and send FIN (or RST) to client on TCP_DEFER_ACCEPT expiration
> but due to some locking problems it was reverted:
> 
> http://marc.info/?t=121318919000001&r=1&w=2

For the record,

I think you underestimate that particular change by putting it like that. 
After being there and figuring out all the what that change did I'd say it 
attempted even more ambitious goal, ie., to allow ofo data to be queued. 
And that was the cause for all the troubles as the full state was created 
without a wakeup resulting in plurality in locks that one had to acquire 
when waking up and all the havoc broke loose due to lack of locking (or 
deadlockish ordering)...

...So the problem is not in the "ESTABLISHED" (or like) state itself, but 
in the fact that full state was created before the wakeup which needs to 
access the listening socket state too.

-- 
 i.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  5:07 TCP_DEFER_ACCEPT is missing counter update Willy Tarreau
  2009-10-13  7:11 ` David Miller
@ 2009-10-13  7:23 ` Eric Dumazet
  2009-10-13  7:34   ` Willy Tarreau
  2009-10-13  7:35   ` David Miller
  1 sibling, 2 replies; 41+ messages in thread
From: Eric Dumazet @ 2009-10-13  7:23 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: netdev

Willy Tarreau a écrit :
> Hello,
> 
> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> client does not talk, the connection is never accepted and
> remains in SYN_RECV state until the retransmits expire, where
> it finally is deleted. This is bad when some firewall such as
> netfilter sits between the client and the server because the
> firewall sees the connection in ESTABLISHED state while the
> server will finally silently drop it without sending an RST.
> 
> This behaviour contradicts the man page which says it should
> wait only for some time :
> 
>        TCP_DEFER_ACCEPT (since Linux 2.4)
>           Allows a listener to be awakened only when data arrives
>           on the socket.  Takes an integer value  (seconds), this
>           can  bound  the  maximum  number  of attempts TCP will
>           make to complete the connection. This option should not
>           be used in code intended to be portable.
> 
> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> computed :
> 
>         case TCP_DEFER_ACCEPT:
>                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
>                 if (val > 0) {
>                         /* Translate value in seconds to number of
>                          * retransmits */
>                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
>                                val > ((TCP_TIMEOUT_INIT / HZ) <<
>                                        icsk->icsk_accept_queue.rskq_defer_accept))
>                                 icsk->icsk_accept_queue.rskq_defer_accept++;
>                         icsk->icsk_accept_queue.rskq_defer_accept++;
>                 }
>                 break;
> 
> ==> rskq_defer_accept is used as a counter of retransmits.
> 
> But in tcp_minisocks.c, this counter is only checked. And in
> fact, I have found no location which updates it. So I think
> that what was intended was to decrease it in tcp_minisocks
> whenever it is checked, which the trivial patch below does :
> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index f8d67cc..1b443b0 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>  		inet_rsk(req)->acked = 1;
>  		return NULL;
>  	}
>  


I dont understand why you want to decrement rskq_defer_accept here.
We receive a pure ACK (wihout DATA).
We should receive exactly one such ACK.
(This is the third packet of the three way TCP handshake)

How this can solve the problem you mention ?
(ie, not sending an RST when we timeout the SYN_RECV session)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:23 ` Eric Dumazet
@ 2009-10-13  7:34   ` Willy Tarreau
  2009-10-13  8:08     ` Olaf van der Spek
  2009-10-13  8:29     ` Eric Dumazet
  2009-10-13  7:35   ` David Miller
  1 sibling, 2 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-13  7:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

On Tue, Oct 13, 2009 at 09:23:59AM +0200, Eric Dumazet wrote:
> Willy Tarreau a écrit :
> > Hello,
> > 
> > I was trying to use TCP_DEFER_ACCEPT and noticed that if the
> > client does not talk, the connection is never accepted and
> > remains in SYN_RECV state until the retransmits expire, where
> > it finally is deleted. This is bad when some firewall such as
> > netfilter sits between the client and the server because the
> > firewall sees the connection in ESTABLISHED state while the
> > server will finally silently drop it without sending an RST.
> > 
> > This behaviour contradicts the man page which says it should
> > wait only for some time :
> > 
> >        TCP_DEFER_ACCEPT (since Linux 2.4)
> >           Allows a listener to be awakened only when data arrives
> >           on the socket.  Takes an integer value  (seconds), this
> >           can  bound  the  maximum  number  of attempts TCP will
> >           make to complete the connection. This option should not
> >           be used in code intended to be portable.
> > 
> > Also, looking at ipv4/tcp.c, a retransmit counter is correctly
> > computed :
> > 
> >         case TCP_DEFER_ACCEPT:
> >                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
> >                 if (val > 0) {
> >                         /* Translate value in seconds to number of
> >                          * retransmits */
> >                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
> >                                val > ((TCP_TIMEOUT_INIT / HZ) <<
> >                                        icsk->icsk_accept_queue.rskq_defer_accept))
> >                                 icsk->icsk_accept_queue.rskq_defer_accept++;
> >                         icsk->icsk_accept_queue.rskq_defer_accept++;
> >                 }
> >                 break;
> > 
> > ==> rskq_defer_accept is used as a counter of retransmits.
> > 
> > But in tcp_minisocks.c, this counter is only checked. And in
> > fact, I have found no location which updates it. So I think
> > that what was intended was to decrease it in tcp_minisocks
> > whenever it is checked, which the trivial patch below does :
> > 
> > diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> > index f8d67cc..1b443b0 100644
> > --- a/net/ipv4/tcp_minisocks.c
> > +++ b/net/ipv4/tcp_minisocks.c
> > @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
> >  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
> >  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
> > +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
> >  		inet_rsk(req)->acked = 1;
> >  		return NULL;
> >  	}
> >  
> 
> 
> I dont understand why you want to decrement rskq_defer_accept here.

Because the "timeout" as set by setsockopt() is converted into number
of retransmits.

> We receive a pure ACK (wihout DATA).
> We should receive exactly one such ACK.

No, we will receive other ones because the socket remains in SYN_RECV
and since the local system ignores this ACK, it will send a SYN-ACK
again, triggering a new ACK from the client.

Although the concept looks strange at first, I think its implementation
is in fact very smart because it manages to defer acceptation with an
approximate timeout without using another timer.

The most common requirement should only be to wait for an HTTP request
to come in, and setting the timeout to anything non-zero is enough to
just drop the first empty ACK and immediately accept on the data
segment, so this method fits this purpose perfectly.

Regards,
Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:34   ` Willy Tarreau
@ 2009-10-13  8:08     ` Olaf van der Spek
  2009-10-13  8:29     ` Eric Dumazet
  1 sibling, 0 replies; 41+ messages in thread
From: Olaf van der Spek @ 2009-10-13  8:08 UTC (permalink / raw)
  To: netdev

On Tue, Oct 13, 2009 at 9:34 AM, Willy Tarreau <w@1wt.eu> wrote:
>> We receive a pure ACK (wihout DATA).
>> We should receive exactly one such ACK.
>
> No, we will receive other ones because the socket remains in SYN_RECV
> and since the local system ignores this ACK, it will send a SYN-ACK
> again, triggering a new ACK from the client.

Why does it ignore the ACK? Just because that's the simplest
implementation of defer_accept?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:34   ` Willy Tarreau
  2009-10-13  8:08     ` Olaf van der Spek
@ 2009-10-13  8:29     ` Eric Dumazet
  2009-10-13  8:35       ` David Miller
  1 sibling, 1 reply; 41+ messages in thread
From: Eric Dumazet @ 2009-10-13  8:29 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Eric Dumazet, netdev

Willy Tarreau a écrit :
> On Tue, Oct 13, 2009 at 09:23:59AM +0200, Eric Dumazet wrote:
>> Willy Tarreau a écrit :
>>> Hello,
>>>
>>> I was trying to use TCP_DEFER_ACCEPT and noticed that if the
>>> client does not talk, the connection is never accepted and
>>> remains in SYN_RECV state until the retransmits expire, where
>>> it finally is deleted. This is bad when some firewall such as
>>> netfilter sits between the client and the server because the
>>> firewall sees the connection in ESTABLISHED state while the
>>> server will finally silently drop it without sending an RST.
>>>
>>> This behaviour contradicts the man page which says it should
>>> wait only for some time :
>>>
>>>        TCP_DEFER_ACCEPT (since Linux 2.4)
>>>           Allows a listener to be awakened only when data arrives
>>>           on the socket.  Takes an integer value  (seconds), this
>>>           can  bound  the  maximum  number  of attempts TCP will
>>>           make to complete the connection. This option should not
>>>           be used in code intended to be portable.
>>>
>>> Also, looking at ipv4/tcp.c, a retransmit counter is correctly
>>> computed :
>>>
>>>         case TCP_DEFER_ACCEPT:
>>>                 icsk->icsk_accept_queue.rskq_defer_accept = 0;
>>>                 if (val > 0) {
>>>                         /* Translate value in seconds to number of
>>>                          * retransmits */
>>>                         while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
>>>                                val > ((TCP_TIMEOUT_INIT / HZ) <<
>>>                                        icsk->icsk_accept_queue.rskq_defer_accept))
>>>                                 icsk->icsk_accept_queue.rskq_defer_accept++;
>>>                         icsk->icsk_accept_queue.rskq_defer_accept++;
>>>                 }
>>>                 break;
>>>
>>> ==> rskq_defer_accept is used as a counter of retransmits.
>>>
>>> But in tcp_minisocks.c, this counter is only checked. And in
>>> fact, I have found no location which updates it. So I think
>>> that what was intended was to decrease it in tcp_minisocks
>>> whenever it is checked, which the trivial patch below does :
>>>
>>> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
>>> index f8d67cc..1b443b0 100644
>>> --- a/net/ipv4/tcp_minisocks.c
>>> +++ b/net/ipv4/tcp_minisocks.c
>>> @@ -645,6 +645,7 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
>>>  	if (inet_csk(sk)->icsk_accept_queue.rskq_defer_accept &&
>>>  	    TCP_SKB_CB(skb)->end_seq == tcp_rsk(req)->rcv_isn + 1) {
>>> +		inet_csk(sk)->icsk_accept_queue.rskq_defer_accept--;
>>>  		inet_rsk(req)->acked = 1;
>>>  		return NULL;
>>>  	}
>>>  
>>
>> I dont understand why you want to decrement rskq_defer_accept here.
> 
> Because the "timeout" as set by setsockopt() is converted into number
> of retransmits.
> 
>> We receive a pure ACK (wihout DATA).
>> We should receive exactly one such ACK.
> 
> No, we will receive other ones because the socket remains in SYN_RECV
> and since the local system ignores this ACK, it will send a SYN-ACK
> again, triggering a new ACK from the client.
> 
> Although the concept looks strange at first, I think its implementation
> is in fact very smart because it manages to defer acceptation with an
> approximate timeout without using another timer.
> 
> The most common requirement should only be to wait for an HTTP request
> to come in, and setting the timeout to anything non-zero is enough to
> just drop the first empty ACK and immediately accept on the data
> segment, so this method fits this purpose perfectly.
> 

Indeed, thanks for this detailed explanation, I missed the SYN-ACK timer
and retransmits.

I played some years ago with TCP_DEFER_ACCEPT and got some unexpected results
on transmitted packets (server was consuming more bandwidth), and I know understand
it was very broken until today !

Thanks again Willy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  8:29     ` Eric Dumazet
@ 2009-10-13  8:35       ` David Miller
  0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2009-10-13  8:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: w, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Oct 2009 10:29:03 +0200

> Indeed, thanks for this detailed explanation, I missed the SYN-ACK timer
> and retransmits.
> 
> I played some years ago with TCP_DEFER_ACCEPT and got some unexpected results
> on transmitted packets (server was consuming more bandwidth), and I know understand
> it was very broken until today !
> 
> Thanks again Willy

Ok, and with that I put Willy's patch back in.

Thanks!

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:23 ` Eric Dumazet
  2009-10-13  7:34   ` Willy Tarreau
@ 2009-10-13  7:35   ` David Miller
  2009-10-13  8:12     ` Willy Tarreau
  1 sibling, 1 reply; 41+ messages in thread
From: David Miller @ 2009-10-13  7:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: w, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Oct 2009 09:23:59 +0200

> I dont understand why you want to decrement rskq_defer_accept here.
> We receive a pure ACK (wihout DATA).
> We should receive exactly one such ACK.
> (This is the third packet of the three way TCP handshake)
> 
> How this can solve the problem you mention ?
> (ie, not sending an RST when we timeout the SYN_RECV session)

I'll hold off on this patch until Eric's comments are
addressed, thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: TCP_DEFER_ACCEPT is missing counter update
  2009-10-13  7:35   ` David Miller
@ 2009-10-13  8:12     ` Willy Tarreau
  0 siblings, 0 replies; 41+ messages in thread
From: Willy Tarreau @ 2009-10-13  8:12 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev

On Tue, Oct 13, 2009 at 12:35:06AM -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 13 Oct 2009 09:23:59 +0200
> 
> > I dont understand why you want to decrement rskq_defer_accept here.
> > We receive a pure ACK (wihout DATA).
> > We should receive exactly one such ACK.
> > (This is the third packet of the three way TCP handshake)
> > 
> > How this can solve the problem you mention ?
> > (ie, not sending an RST when we timeout the SYN_RECV session)
> 
> I'll hold off on this patch until Eric's comments are
> addressed, thanks.

I have replied, but let's wait for Eric's response.

Willy


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2009-10-20  2:23 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-10-13  5:07 TCP_DEFER_ACCEPT is missing counter update Willy Tarreau
2009-10-13  7:11 ` David Miller
2009-10-13  7:19   ` Willy Tarreau
2009-10-13  7:27     ` David Miller
2009-10-13 21:27     ` Julian Anastasov
2009-10-14  4:52       ` Willy Tarreau
2009-10-14  7:27         ` Julian Anastasov
2009-10-14 20:17           ` Willy Tarreau
2009-10-14 21:12             ` Olaf van der Spek
2009-10-14 22:43             ` David Miller
2009-10-15  6:08               ` Willy Tarreau
2009-10-15  8:47                 ` Julian Anastasov
2009-10-15 12:41                   ` Willy Tarreau
2009-10-15 22:44                     ` Julian Anastasov
2009-10-16  3:51                       ` Eric Dumazet
2009-10-16  5:00                         ` Eric Dumazet
2009-10-16  5:29                           ` Willy Tarreau
2009-10-16  6:05                             ` Eric Dumazet
2009-10-16  6:18                               ` Willy Tarreau
2009-10-16  7:08                                 ` Eric Dumazet
2009-10-16  7:19                                   ` Willy Tarreau
2009-10-16  5:03                       ` Willy Tarreau
2009-10-16  8:49                         ` Julian Anastasov
2009-10-16 10:40                           ` Eric Dumazet
2009-10-16 19:27                             ` Willy Tarreau
2009-10-17 11:48                             ` Julian Anastasov
2009-10-17 12:07                               ` Eric Dumazet
2009-10-17 14:20                                 ` Julian Anastasov
2009-10-19 20:01                                   ` Eric Dumazet
2009-10-19 20:11                                     ` Willy Tarreau
2009-10-19 20:17                                       ` Eric Dumazet
2009-10-20  2:23                                     ` David Miller
2009-10-15  7:59               ` Julian Anastasov
2009-10-16 10:08           ` Ilpo Järvinen
2009-10-13  7:23 ` Eric Dumazet
2009-10-13  7:34   ` Willy Tarreau
2009-10-13  8:08     ` Olaf van der Spek
2009-10-13  8:29     ` Eric Dumazet
2009-10-13  8:35       ` David Miller
2009-10-13  7:35   ` David Miller
2009-10-13  8:12     ` Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).