netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* TSecr != 0 check in inet_lro.c
@ 2009-08-24 21:54 Octavian Purdila
  2009-08-25  5:42 ` Eric Dumazet
  0 siblings, 1 reply; 4+ messages in thread
From: Octavian Purdila @ 2009-08-24 21:54 UTC (permalink / raw)
  To: Jan-Bernd Themann; +Cc: netdev, Christoph Raisch


Hi,

We are seeing a performance issue with TSO/LRO which we tracked down to the 
TSecr !=0 check in lro_tcp_ip_check.

It happens when the LRO side's TSval wraps around and gets to 0. That triggers 
the TSO side to send packets with TSecr set to 0, which means that such 
packets won't be aggregated - and that will put a lot of burden on the stack 
which will result in lots of drops.

I'm failing to understand the purpose of this check. Any hints? :)

Thanks,
tavi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TSecr != 0 check in inet_lro.c
  2009-08-24 21:54 TSecr != 0 check in inet_lro.c Octavian Purdila
@ 2009-08-25  5:42 ` Eric Dumazet
  2009-08-25 11:50   ` Octavian Purdila
  2009-09-01 22:46   ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Eric Dumazet @ 2009-08-25  5:42 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Jan-Bernd Themann, netdev, Christoph Raisch

Octavian Purdila a écrit :
> Hi,
> 
> We are seeing a performance issue with TSO/LRO which we tracked down to the 
> TSecr !=0 check in lro_tcp_ip_check.

ouch...

> 
> It happens when the LRO side's TSval wraps around and gets to 0. That triggers 
> the TSO side to send packets with TSecr set to 0, which means that such 
> packets won't be aggregated - and that will put a lot of burden on the stack 
> which will result in lots of drops.

Probability of such event is 1 / 2^32 or so ?

> 
> I'm failing to understand the purpose of this check. Any hints? :)
> 

rfc1323 badly interpreted ?

I remember tsecr=0 was forbidden by Linux, while apparently rfc is not
so clear.

rfc1323 : 3.2
         The Timestamp Echo Reply field (TSecr) is only valid if the ACK
         bit is set in the TCP header; if it is valid, it echos a times-
         tamp value that was sent by the remote TCP in the TSval field
         of a Timestamps option.  When TSecr is not valid, its value
         must be zero.  The TSecr value will generally be from the most
         recent Timestamp option that was received; however, there are
         exceptions that are explained below.

Note how this is not saying "a zero Tsecr value is not valid"

I could not find why : "When TSecr is not valid, its value
must be zero", and why we consider a zero value to be not meaningfull...

static inline void tcp_ack_update_rtt(struct sock *sk, const int flag,
                                      const s32 seq_rtt)
{
        const struct tcp_sock *tp = tcp_sk(sk);
        /* Note that peer MAY send zero echo. In this case it is ignored. (rfc1323) */
        if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr)
                tcp_ack_saw_tstamp(sk, flag);
        else if (seq_rtt >= 0)
                tcp_ack_no_tstamp(sk, seq_rtt, flag);
}

static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
                                         struct tcphdr *th, unsigned len)
{
        struct tcp_sock *tp = tcp_sk(sk);
        struct inet_connection_sock *icsk = inet_csk(sk);
        int saved_clamp = tp->rx_opt.mss_clamp;

        tcp_parse_options(skb, &tp->rx_opt, 0);

        if (th->ack) {
...
                if (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
                    !between(tp->rx_opt.rcv_tsecr, tp->retrans_stamp,
                             tcp_time_stamp)) {
                        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSACTIVEREJECTED);
                        goto reset_and_undo;
                }
...

static inline void tcp_rcv_rtt_measure_ts(struct sock *sk,
                                          const struct sk_buff *skb)
{
        struct tcp_sock *tp = tcp_sk(sk);
        if (tp->rx_opt.rcv_tsecr &&
            (TCP_SKB_CB(skb)->end_seq -
             TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss))
                tcp_rcv_rtt_update(tp, tcp_time_stamp - tp->rx_opt.rcv_tsecr, 0);
}
...
static inline int tcp_packet_delayed(struct tcp_sock *tp)
{
        return !tp->retrans_stamp ||
                (tp->rx_opt.saw_tstamp && tp->rx_opt.rcv_tsecr &&
                 before(tp->rx_opt.rcv_tsecr, tp->retrans_stamp));
}
...

So we dont have a bit saying we received a tsecr, we use the 
'if saw_tstamp AND tsecr is not null' convention...


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TSecr != 0 check in inet_lro.c
  2009-08-25  5:42 ` Eric Dumazet
@ 2009-08-25 11:50   ` Octavian Purdila
  2009-09-01 22:46   ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: Octavian Purdila @ 2009-08-25 11:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Jan-Bernd Themann, netdev, Christoph Raisch

On Tuesday 25 August 2009 08:42:33 Eric Dumazet wrote:
> Octavian Purdila a écrit :
> > Hi,
> >
> > We are seeing a performance issue with TSO/LRO which we tracked down to
> > the TSecr !=0 check in lro_tcp_ip_check.
>
> ouch...
>
> > It happens when the LRO side's TSval wraps around and gets to 0. That
> > triggers the TSO side to send packets with TSecr set to 0, which means
> > that such packets won't be aggregated - and that will put a lot of burden
> > on the stack which will result in lots of drops.
>
> Probability of such event is 1 / 2^32 or so ?
>

Yes, its pretty low, but the timestamps are taken from jiffies and jiffies are 
initialized to -300*HZ so it will happen in 5 minutes after every reboot :)

> > I'm failing to understand the purpose of this check. Any hints? :)
>
> rfc1323 badly interpreted ?
>
> I remember tsecr=0 was forbidden by Linux, while apparently rfc is not
> so clear.
>
> rfc1323 : 3.2
>          The Timestamp Echo Reply field (TSecr) is only valid if the ACK
>          bit is set in the TCP header; if it is valid, it echos a times-
>          tamp value that was sent by the remote TCP in the TSval field
>          of a Timestamps option.  When TSecr is not valid, its value
>          must be zero.  The TSecr value will generally be from the most
>          recent Timestamp option that was received; however, there are
>          exceptions that are explained below.
>
> Note how this is not saying "a zero Tsecr value is not valid"

That is my understanding as well.

> I could not find why : "When TSecr is not valid, its value
> must be zero", and why we consider a zero value to be not meaningfull...
>
> ...
>
> So we dont have a bit saying we received a tsecr, we use the
> 'if saw_tstamp AND tsecr is not null' convention...

Alright, its starting to make sense. So, it looks like we can remove the check 
from inet_lro, and that may even reduce the probability of receiving a zero 
TSecr in the stack. Right?

Thanks for you help!

tavi

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: TSecr != 0 check in inet_lro.c
  2009-08-25  5:42 ` Eric Dumazet
  2009-08-25 11:50   ` Octavian Purdila
@ 2009-09-01 22:46   ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2009-09-01 22:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: opurdila, themann, netdev, raisch

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 25 Aug 2009 07:42:33 +0200

> Octavian Purdila a écrit :
>> 
>> I'm failing to understand the purpose of this check. Any hints? :)
>> 
> 
> rfc1323 badly interpreted ?
> 
> I remember tsecr=0 was forbidden by Linux, while apparently rfc is not
> so clear.
> 
> rfc1323 : 3.2
>          The Timestamp Echo Reply field (TSecr) is only valid if the ACK
>          bit is set in the TCP header; if it is valid, it echos a times-
>          tamp value that was sent by the remote TCP in the TSval field
>          of a Timestamps option.  When TSecr is not valid, its value
>          must be zero.  The TSecr value will generally be from the most
>          recent Timestamp option that was received; however, there are
>          exceptions that are explained below.
> 
> Note how this is not saying "a zero Tsecr value is not valid"

Indeed this is a very tricky situation.

What it's saying here is that you can emit a zero TSecr when you
simply haven't received a valid timestamp from the peer yet.

I think we can therefore remove all of these special zero checks, but
we must be mindful to make sure we handle the connection startup case
properly (where we see these 'no valid timestamp yet' zero TSecrs).

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2009-09-01 22:46 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-08-24 21:54 TSecr != 0 check in inet_lro.c Octavian Purdila
2009-08-25  5:42 ` Eric Dumazet
2009-08-25 11:50   ` Octavian Purdila
2009-09-01 22:46   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).