netdev.vger.kernel.org archive mirror
* [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
@ 2014-08-12  9:45 Andrey Vagin
  2014-08-12 12:15 ` Eric Dumazet
  2014-08-12 14:53 ` Yuchung Cheng
  0 siblings, 2 replies; 7+ messages in thread
From: Andrey Vagin @ 2014-08-12  9:45 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Andrey Vagin, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
aren't good, because some congestion modules heavily depend on them.

This patch adds the TCPCB_REPAIRED flag, which is included in
TCPCB_RETRANS.
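
As a rough illustration (not part of the patch), the effect of folding the
new bit into TCPCB_RETRANS can be sketched as below. The flag values are
copied from the diff; may_sample_rtt() is a hypothetical helper that only
models the assumption that an skb with any TCPCB_RETRANS bit set is not
used for an RTT sample based on its send timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as they appear in the patch (include/net/tcp.h). */
#define TCPCB_SACKED_RETRANS	0x02	/* SKB retransmitted */
#define TCPCB_REPAIRED		0x10	/* SKB repaired (no skb_mstamp) */
#define TCPCB_EVER_RETRANS	0x80	/* Ever retransmitted frame */
#define TCPCB_RETRANS		(TCPCB_SACKED_RETRANS | TCPCB_EVER_RETRANS | \
				 TCPCB_REPAIRED)

/* Hypothetical helper, not a kernel function: true if an skb with the
 * given 'sacked' bits may contribute an RTT sample derived from its
 * send timestamp. Any bit in TCPCB_RETRANS disqualifies it.
 */
static bool may_sample_rtt(uint8_t sacked)
{
	return !(sacked & TCPCB_RETRANS);
}
```

With this mask, a repaired skb is treated like a retransmitted one as far
as timestamp-based RTT sampling goes, without touching the other tag bits.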

Thanks to Eric for the advice on how to fix this issue.

This patch fixes the warning:
[  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
[  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
[  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
[  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
[  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
[  879.569982] Call Trace:
[  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
[  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
[  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
[  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
[  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
[  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
[  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
[  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
[  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
[  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
[  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
[  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
[  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
[  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---

Cc: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 include/net/tcp.h |  4 +++-
 net/ipv4/tcp.c    | 16 +++++++++-------
 2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index dafa1cb..36f5525 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -705,8 +705,10 @@ struct tcp_skb_cb {
 #define TCPCB_SACKED_RETRANS	0x02	/* SKB retransmitted		*/
 #define TCPCB_LOST		0x04	/* SKB is lost			*/
 #define TCPCB_TAGBITS		0x07	/* All tag bits			*/
+#define TCPCB_REPAIRED		0x10	/* SKB repaired (no skb_mstamp)	*/
 #define TCPCB_EVER_RETRANS	0x80	/* Ever retransmitted frame	*/
-#define TCPCB_RETRANS		(TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS)
+#define TCPCB_RETRANS		(TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS| \
+				TCPCB_REPAIRED)
 
 	__u8		ip_dsfield;	/* IPv4 tos or IPv6 dsfield	*/
 	/* 1 byte hole */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 181b70e..cb5f548 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1188,13 +1188,6 @@ new_segment:
 					goto wait_for_memory;
 
 				/*
-				 * All packets are restored as if they have
-				 * already been sent.
-				 */
-				if (tp->repair)
-					TCP_SKB_CB(skb)->when = tcp_time_stamp;
-
-				/*
 				 * Check whether we can use HW checksum.
 				 */
 				if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
@@ -1203,6 +1196,15 @@ new_segment:
 				skb_entail(sk, skb);
 				copy = size_goal;
 				max = size_goal;
+
+				/* All packets are restored as if they have
+				 * already been sent. skb_mstamp isn't set to
+				 * avoid wrong rtt estimation.
+				 */
+				if (tp->repair) {
+					TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
+					TCP_SKB_CB(skb)->when = tcp_time_stamp;
+				}
 			}
 
 			/* Try to append data to the end of skb. */
-- 
1.9.3


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12  9:45 [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT Andrey Vagin
@ 2014-08-12 12:15 ` Eric Dumazet
  2014-08-12 12:33   ` Andrew Vagin
  2014-08-12 14:53 ` Yuchung Cheng
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-08-12 12:15 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: netdev, linux-kernel, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

On Tue, 2014-08-12 at 13:45 +0400, Andrey Vagin wrote:
> We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> aren't good, because some congestion modules heavily depend on them.
> 
> This patch adds the TCPCB_REPAIRED flag, which is included in
> TCPCB_RETRANS.

...

> +
> +				/* All packets are restored as if they have
> +				 * already been sent. skb_mstamp isn't set to
> +				 * avoid wrong rtt estimation.
> +				 */
> +				if (tp->repair) {
> +					TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> +					TCP_SKB_CB(skb)->when = tcp_time_stamp;
> +				}
>  			}
>  
>  			/* Try to append data to the end of skb. */


Are you sure TCP_SKB_CB(skb)->when needs to be set ?

It should not anymore.

If yes, I believe a comment would help a lot here.

Also, please include these tags to ease stable backports (3.15+):

Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages")
Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")

Thanks !


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12 12:15 ` Eric Dumazet
@ 2014-08-12 12:33   ` Andrew Vagin
  2014-08-12 13:14     ` Eric Dumazet
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Vagin @ 2014-08-12 12:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrey Vagin, netdev, linux-kernel, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

On Tue, Aug 12, 2014 at 05:15:01AM -0700, Eric Dumazet wrote:
> On Tue, 2014-08-12 at 13:45 +0400, Andrey Vagin wrote:
> > We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> > aren't good, because some congestion modules heavily depend on them.
> > 
> > This patch adds the TCPCB_REPAIRED flag, which is included in
> > TCPCB_RETRANS.
> 
> ...
> 
> > +
> > +				/* All packets are restored as if they have
> > +				 * already been sent. skb_mstamp isn't set to
> > +				 * avoid wrong rtt estimation.
> > +				 */
> > +				if (tp->repair) {
> > +					TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> > +					TCP_SKB_CB(skb)->when = tcp_time_stamp;
> > +				}
> >  			}
> >  
> >  			/* Try to append data to the end of skb. */
> 
> 
> Are you sure TCP_SKB_CB(skb)->when needs to be set ?

It's used in tcp_rearm_rto() for calculating a retransmit timeout.
...
	const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
	s32 delta = (s32)(rto_time_stamp - tcp_time_stamp);
...

"when" is used as a start point, so I think it's acceptable here.

I will add a comment. Thanks.
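
To make the arithmetic in the snippet above concrete, here is a minimal,
self-contained sketch of the same wrap-safe computation outside the kernel.
rto_ticks_left() is a hypothetical name, and the 32-bit tick counter stands
in for tcp_time_stamp; only the two-line calculation is taken from
tcp_rearm_rto().

```c
#include <stdint.h>

/* Hypothetical helper: given the tick at which the head skb was stamped
 * ('when'), the RTO in ticks and the current tick, return how many ticks
 * remain until 'when + rto'. The result is negative once that deadline
 * has passed. Unsigned subtraction followed by a signed cast keeps the
 * result correct across a wrap of the 32-bit counter.
 */
static int32_t rto_ticks_left(uint32_t when, uint32_t rto, uint32_t now)
{
	const uint32_t rto_time_stamp = when + rto;	/* may wrap */

	return (int32_t)(rto_time_stamp - now);
}
```

If "when" were left unset for a repaired skb, the deadline would be
computed from an arbitrary starting point, which is why it matters where
"when" gets set.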

> 
> It should not anymore.
> 
> If yes, I believe a comment would help a lot here.
> 
> Also, please include these tags to ease stable backports (3.15+):
> 
> Fixes: 431a91242d8d ("tcp: timestamp SYN+DATA messages")
> Fixes: 740b0f1841f6 ("tcp: switch rtt estimations to usec resolution")

Thanks,
Andrew


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12 12:33   ` Andrew Vagin
@ 2014-08-12 13:14     ` Eric Dumazet
  2014-08-12 14:34       ` Andrew Vagin
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2014-08-12 13:14 UTC (permalink / raw)
  To: Andrew Vagin
  Cc: Andrey Vagin, netdev, linux-kernel, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

On Tue, 2014-08-12 at 16:33 +0400, Andrew Vagin wrote:
> On Tue, Aug 12, 2014 at 05:15:01AM -0700, Eric Dumazet wrote:
> > On Tue, 2014-08-12 at 13:45 +0400, Andrey Vagin wrote:
> > > We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> > > aren't good, because some congestion modules heavily depend on them.
> > > 
> > > This patch adds the TCPCB_REPAIRED flag, which is included in
> > > TCPCB_RETRANS.
> > 
> > ...
> > 
> > > +
> > > +				/* All packets are restored as if they have
> > > +				 * already been sent. skb_mstamp isn't set to
> > > +				 * avoid wrong rtt estimation.
> > > +				 */
> > > +				if (tp->repair) {
> > > +					TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> > > +					TCP_SKB_CB(skb)->when = tcp_time_stamp;
> > > +				}
> > >  			}
> > >  
> > >  			/* Try to append data to the end of skb. */
> > 
> > 
> > Are you sure TCP_SKB_CB(skb)->when needs to be set ?
> 
> It's used in tcp_rearm_rto() for calculating a retransmit timeout.
> ...
> 	const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
> 	s32 delta = (s32)(rto_time_stamp - tcp_time_stamp);
> ...
> 
> "when" is used as a start point, so I think it's acceptable here.
> 
> I will add a comment. Thanks.

tcp_rearm_rto() does the following :


        if (!tp->packets_out) {
                inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS);
        } else {
                u32 rto = inet_csk(sk)->icsk_rto;
                /* Offset the time elapsed after installing regular RTO */
                if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
                    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
                        struct sk_buff *skb = tcp_write_queue_head(sk);
                        const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;

This means that at least one packet was transmitted (packets_out is not 0).

We timestamp all packets we transmit; look at the tcp_transmit_skb() callers,
which all do:

TCP_SKB_CB(skb)->when = tcp_time_stamp;

So the write queue head was timestamped properly at the time the packet was
sent, not at the time the tcp repair code reinjected skbs into the write queue.


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12 13:14     ` Eric Dumazet
@ 2014-08-12 14:34       ` Andrew Vagin
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Vagin @ 2014-08-12 14:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrey Vagin, netdev, linux-kernel, Eric Dumazet, Pavel Emelyanov,
	David S. Miller

On Tue, Aug 12, 2014 at 06:14:43AM -0700, Eric Dumazet wrote:
> On Tue, 2014-08-12 at 16:33 +0400, Andrew Vagin wrote:
> > On Tue, Aug 12, 2014 at 05:15:01AM -0700, Eric Dumazet wrote:
> > > On Tue, 2014-08-12 at 13:45 +0400, Andrey Vagin wrote:
> > > > We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> > > > aren't good, because some congestion modules heavily depend on them.
> > > > 
> > > > This patch adds the TCPCB_REPAIRED flag, which is included in
> > > > TCPCB_RETRANS.
> > > 
> > > ...
> > > 
> > > > +
> > > > +				/* All packets are restored as if they have
> > > > +				 * already been sent. skb_mstamp isn't set to
> > > > +				 * avoid wrong rtt estimation.
> > > > +				 */
> > > > +				if (tp->repair) {
> > > > +					TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> > > > +					TCP_SKB_CB(skb)->when = tcp_time_stamp;
> > > > +				}
> > > >  			}
> > > >  
> > > >  			/* Try to append data to the end of skb. */
> > > 
> > > 
> > > Are you sure TCP_SKB_CB(skb)->when needs to be set ?
> > 
> > It's used in tcp_rearm_rto() for calculating a retransmit timeout.
> > ...
> > 	const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
> > 	s32 delta = (s32)(rto_time_stamp - tcp_time_stamp);
> > ...
> > 
> > "when" is used as a start point, so I think it's acceptable here.
> > 
> > I will add a comment. Thanks.
> 
> tcp_rearm_rto() does the following :
> 
> 
>         if (!tp->packets_out) {
>                 inet_csk_clear_xmit_timer(sk, ICSK_TIME_RETRANS);
>         } else {
>                 u32 rto = inet_csk(sk)->icsk_rto;
>                 /* Offset the time elapsed after installing regular RTO */
>                 if (icsk->icsk_pending == ICSK_TIME_EARLY_RETRANS ||
>                     icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
>                         struct sk_buff *skb = tcp_write_queue_head(sk);
>                         const u32 rto_time_stamp = TCP_SKB_CB(skb)->when + rto;
> 
> This means that at least one packet was transmitted (packets_out is not 0).

This one packet may be repaired:

static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
                           int push_one, gfp_t gfp)
	...
	while ((skb = tcp_send_head(sk))) {
		...
		if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE)
                        goto repair; /* Skip network transmission */
		...
		TCP_SKB_CB(skb)->when = tcp_time_stamp;
		...
repair:
		/* Advance the send_head.  This one is sent out.
		 * This call will increment packets_out.
		 */
		tcp_event_new_data_sent(sk, skb)
			tp->packets_out += tcp_skb_pcount(skb);
	}

Looks like we need to move the setting of "when" into tcp_write_xmit(),
because that is where it is set for all normal skb-s.
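
The situation described above can be modelled with a small toy program.
Everything here is hypothetical (toy_sock, toy_skb, toy_xmit_one); it is not
kernel code and it simplifies the real condition, which also checks
tp->repair_queue == TCP_SEND_QUEUE. It only shows that in repair mode
packets_out grows while "when" is never written by the transmit loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for the fields discussed in this thread. */
struct toy_skb {
	uint32_t when;		/* models TCP_SKB_CB(skb)->when */
	uint32_t pcount;	/* models tcp_skb_pcount(skb) */
};

struct toy_sock {
	bool repair;		/* models tp->repair */
	uint32_t packets_out;	/* models tp->packets_out */
};

/* Models one iteration of the tcp_write_xmit() loop as quoted above:
 * the repair path skips the timestamping but still falls through to
 * the accounting done by tcp_event_new_data_sent().
 */
static void toy_xmit_one(struct toy_sock *sk, struct toy_skb *skb,
			 uint32_t now)
{
	if (!sk->repair)
		skb->when = now;	/* normal path only */

	sk->packets_out += skb->pcount;	/* both paths */
}
```

So a non-zero packets_out alone does not guarantee that the write queue
head carries a timestamp taken at transmit time.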


> 
> We timestamp all packets we transmit; look at the tcp_transmit_skb() callers,
> which all do:
>
> TCP_SKB_CB(skb)->when = tcp_time_stamp;
>
> So the write queue head was timestamped properly at the time the packet was
> sent, not at the time the tcp repair code reinjected skbs into the write queue.
> 
> 
> 
> 


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12  9:45 [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT Andrey Vagin
  2014-08-12 12:15 ` Eric Dumazet
@ 2014-08-12 14:53 ` Yuchung Cheng
  2014-08-12 18:29   ` Andrew Vagin
  1 sibling, 1 reply; 7+ messages in thread
From: Yuchung Cheng @ 2014-08-12 14:53 UTC (permalink / raw)
  To: Andrey Vagin
  Cc: netdev, linux-kernel@vger.kernel.org, Eric Dumazet,
	Pavel Emelyanov, David S. Miller

On Tue, Aug 12, 2014 at 2:45 AM, Andrey Vagin <avagin@openvz.org> wrote:
> We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> aren't good, because some congestion modules heavily depend on them.
>
> This patch adds the TCPCB_REPAIRED flag, which is included in
> TCPCB_RETRANS.
>
> Thanks to Eric for the advice on how to fix this issue.
>
> This patch fixes the warning:
> [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
> [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
> [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
> [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
> [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
> [  879.569982] Call Trace:
> [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
> [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
> [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
> [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
> [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
> [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
> [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
> [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
> [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
> [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
> [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
> [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
> [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
> [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Pavel Emelyanov <xemul@parallels.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>
> ---
>  include/net/tcp.h |  4 +++-
>  net/ipv4/tcp.c    | 16 +++++++++-------
>  2 files changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index dafa1cb..36f5525 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -705,8 +705,10 @@ struct tcp_skb_cb {
>  #define TCPCB_SACKED_RETRANS   0x02    /* SKB retransmitted            */
>  #define TCPCB_LOST             0x04    /* SKB is lost                  */
>  #define TCPCB_TAGBITS          0x07    /* All tag bits                 */
> +#define TCPCB_REPAIRED         0x10    /* SKB repaired (no skb_mstamp) */
>  #define TCPCB_EVER_RETRANS     0x80    /* Ever retransmitted frame     */
> -#define TCPCB_RETRANS          (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS)
> +#define TCPCB_RETRANS          (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS| \
> +                               TCPCB_REPAIRED)
>
>         __u8            ip_dsfield;     /* IPv4 tos or IPv6 dsfield     */
>         /* 1 byte hole */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 181b70e..cb5f548 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1188,13 +1188,6 @@ new_segment:
>                                         goto wait_for_memory;
>
>                                 /*
> -                                * All packets are restored as if they have
> -                                * already been sent.
> -                                */
> -                               if (tp->repair)
> -                                       TCP_SKB_CB(skb)->when = tcp_time_stamp;
> -
> -                               /*
>                                  * Check whether we can use HW checksum.
>                                  */
>                                 if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
> @@ -1203,6 +1196,15 @@ new_segment:
>                                 skb_entail(sk, skb);
>                                 copy = size_goal;
>                                 max = size_goal;
> +
> +                               /* All packets are restored as if they have
> +                                * already been sent. skb_mstamp isn't set to
> +                                * avoid wrong rtt estimation.
> +                                */
> +                               if (tp->repair) {
> +                                       TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> +                                       TCP_SKB_CB(skb)->when = tcp_time_stamp;
But doesn't this still allow RTT samples from TCP timestamp options in
tcp_ack_update_rtt(), even if the packet is marked retransmitted/repaired?

> +                               }
>                         }
>
>                         /* Try to append data to the end of skb. */
> --
> 1.9.3
>


* Re: [PATCH] tcp: don't use timestamp from repaired skb-s to calculate RTT
  2014-08-12 14:53 ` Yuchung Cheng
@ 2014-08-12 18:29   ` Andrew Vagin
  0 siblings, 0 replies; 7+ messages in thread
From: Andrew Vagin @ 2014-08-12 18:29 UTC (permalink / raw)
  To: Yuchung Cheng
  Cc: Andrey Vagin, netdev, linux-kernel@vger.kernel.org, Eric Dumazet,
	Pavel Emelyanov, David S. Miller

On Tue, Aug 12, 2014 at 07:53:57AM -0700, Yuchung Cheng wrote:
> On Tue, Aug 12, 2014 at 2:45 AM, Andrey Vagin <avagin@openvz.org> wrote:
> > We don't know the right timestamp for repaired skb-s. Wrong RTT estimations
> > aren't good, because some congestion modules heavily depend on them.
> >
> > This patch adds the TCPCB_REPAIRED flag, which is included in
> > TCPCB_RETRANS.
> >
> > Thanks to Eric for the advice on how to fix this issue.
> >
> > This patch fixes the warning:
> > [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
> > [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
> > [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
> > [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
> > [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
> > [  879.569982] Call Trace:
> > [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
> > [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
> > [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
> > [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
> > [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
> > [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
> > [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
> > [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
> > [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
> > [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
> > [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
> > [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
> > [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
> > [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
> >
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Pavel Emelyanov <xemul@parallels.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Signed-off-by: Andrey Vagin <avagin@openvz.org>
> > ---
> >  include/net/tcp.h |  4 +++-
> >  net/ipv4/tcp.c    | 16 +++++++++-------
> >  2 files changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index dafa1cb..36f5525 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -705,8 +705,10 @@ struct tcp_skb_cb {
> >  #define TCPCB_SACKED_RETRANS   0x02    /* SKB retransmitted            */
> >  #define TCPCB_LOST             0x04    /* SKB is lost                  */
> >  #define TCPCB_TAGBITS          0x07    /* All tag bits                 */
> > +#define TCPCB_REPAIRED         0x10    /* SKB repaired (no skb_mstamp) */
> >  #define TCPCB_EVER_RETRANS     0x80    /* Ever retransmitted frame     */
> > -#define TCPCB_RETRANS          (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS)
> > +#define TCPCB_RETRANS          (TCPCB_SACKED_RETRANS|TCPCB_EVER_RETRANS| \
> > +                               TCPCB_REPAIRED)
> >
> >         __u8            ip_dsfield;     /* IPv4 tos or IPv6 dsfield     */
> >         /* 1 byte hole */
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index 181b70e..cb5f548 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -1188,13 +1188,6 @@ new_segment:
> >                                         goto wait_for_memory;
> >
> >                                 /*
> > -                                * All packets are restored as if they have
> > -                                * already been sent.
> > -                                */
> > -                               if (tp->repair)
> > -                                       TCP_SKB_CB(skb)->when = tcp_time_stamp;
> > -
> > -                               /*
> >                                  * Check whether we can use HW checksum.
> >                                  */
> >                                 if (sk->sk_route_caps & NETIF_F_ALL_CSUM)
> > @@ -1203,6 +1196,15 @@ new_segment:
> >                                 skb_entail(sk, skb);
> >                                 copy = size_goal;
> >                                 max = size_goal;
> > +
> > +                               /* All packets are restored as if they have
> > +                                * already been sent. skb_mstamp isn't set to
> > +                                * avoid wrong rtt estimation.
> > +                                */
> > +                               if (tp->repair) {
> > +                                       TCP_SKB_CB(skb)->sacked |= TCPCB_REPAIRED;
> > +                                       TCP_SKB_CB(skb)->when = tcp_time_stamp;
> But doesn't this still allow RTT samples from TCP timestamp options in
> tcp_ack_update_rtt(), even if the packet is marked retransmitted/repaired?

"when" isn't used there.

rtt = tcp_time_stamp - tp->rx_opt.rcv_tsecr

If a TCP connection is moved from another host, we set tp->tsoffset so
that rcv_tsecr remains coherent with tcp_time_stamp on the target host.
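
A minimal sketch of that idea, under stated assumptions: tsval_to_send()
and rtt_from_echo() are hypothetical helpers, the clocks are plain 32-bit
tick counters, and the offset is assumed to be chosen as "last timestamp
value used on the source host minus the current clock on the target host".
It only illustrates why an echoed timestamp still yields a sane RTT after
the move; it is not the kernel's code.

```c
#include <stdint.h>

/* Timestamp value placed in outgoing segments: local clock plus the
 * per-socket offset (models tcp_time_stamp + tp->tsoffset).
 */
static uint32_t tsval_to_send(uint32_t now, uint32_t tsoffset)
{
	return now + tsoffset;
}

/* RTT derived from an echoed timestamp: remove the offset from the echo
 * to get back to the local clock domain, then subtract from 'now'.
 * Unsigned arithmetic makes this work even when the offset is
 * "negative", i.e. wrapped around.
 */
static uint32_t rtt_from_echo(uint32_t now, uint32_t rcv_tsecr,
			      uint32_t tsoffset)
{
	return now - (rcv_tsecr - tsoffset);
}
```

For example, if the source host last used timestamp 900000 and the target
host clock reads 5000 at restore time, an offset of 895000 lets the target
keep sending 900000 and later, and an echo of 900000 arriving at tick 5030
gives an RTT of 30 ticks.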

> 
> > +                               }
> >                         }
> >
> >                         /* Try to append data to the end of skb. */
> > --
> > 1.9.3
> >
