From: Neal Cardwell <ncardwell@google.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Neal Cardwell <ncardwell@google.com>,
Van Jacobson <vanj@google.com>, Yuchung Cheng <ycheng@google.com>,
Nandita Dukkipati <nanditad@google.com>,
Eric Dumazet <edumazet@google.com>,
Soheil Hassas Yeganeh <soheil@google.com>
Subject: [PATCH v3 net-next 06/16] tcp: count packets marked lost for a TCP connection
Date: Sun, 18 Sep 2016 18:03:43 -0400 [thread overview]
Message-ID: <1474236233-28511-7-git-send-email-ncardwell@google.com> (raw)
In-Reply-To: <1474236233-28511-1-git-send-email-ncardwell@google.com>
Count the number of packets that a TCP connection marks lost.
Congestion control modules can use this loss rate information for more
intelligent decisions about how fast to send.
Specifically, this is used in TCP BBR policer detection. BBR uses a
high packet loss rate as one signal in its policer detection and
policer bandwidth estimation algorithm.
The BBR policer detection algorithm cannot simply track retransmits,
because a retransmit can be (and often is) an indicator of packets
lost long, long ago. This is particularly true in a long CA_Loss
period that repairs the initial massive losses when a policer kicks
in.
Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
include/linux/tcp.h | 1 +
net/ipv4/tcp_input.c | 25 ++++++++++++++++++++++++-
2 files changed, 25 insertions(+), 1 deletion(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6433cc8..38590fb 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -267,6 +267,7 @@ struct tcp_sock {
* receiver in Recovery. */
u32 prr_out; /* Total number of pkts sent during Recovery. */
u32 delivered; /* Total data packets delivered incl. rexmits */
+ u32 lost; /* Total data packets lost incl. rexmits */
u32 rcv_wnd; /* Current receiver window */
u32 write_seq; /* Tail(+1) of data held in tcp send buffer */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ac5b38f..024b579 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -899,12 +899,29 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
tp->retransmit_high = TCP_SKB_CB(skb)->end_seq;
}
+/* Sum the number of packets on the wire we have marked as lost.
+ * There are two cases we care about here:
+ * a) Packet hasn't been marked lost (nor retransmitted),
+ * and this is the first loss.
+ * b) Packet has been marked both lost and retransmitted,
+ * and this means we think it was lost again.
+ */
+static void tcp_sum_lost(struct tcp_sock *tp, struct sk_buff *skb)
+{
+ __u8 sacked = TCP_SKB_CB(skb)->sacked;
+
+ if (!(sacked & TCPCB_LOST) ||
+ ((sacked & TCPCB_LOST) && (sacked & TCPCB_SACKED_RETRANS)))
+ tp->lost += tcp_skb_pcount(skb);
+}
+
static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
{
if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
tcp_verify_retransmit_hint(tp, skb);
tp->lost_out += tcp_skb_pcount(skb);
+ tcp_sum_lost(tp, skb);
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
}
}
@@ -913,6 +930,7 @@ void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, struct sk_buff *skb)
{
tcp_verify_retransmit_hint(tp, skb);
+ tcp_sum_lost(tp, skb);
if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
tp->lost_out += tcp_skb_pcount(skb);
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
@@ -1890,6 +1908,7 @@ void tcp_enter_loss(struct sock *sk)
struct sk_buff *skb;
bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg; /* is receiver reneging on SACKs? */
+ bool mark_lost;
/* Reduce ssthresh if it has not yet been made inside this window. */
if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
@@ -1923,8 +1942,12 @@ void tcp_enter_loss(struct sock *sk)
if (skb == tcp_send_head(sk))
break;
+ mark_lost = (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
+ is_reneg);
+ if (mark_lost)
+ tcp_sum_lost(tp, skb);
TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
- if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || is_reneg) {
+ if (mark_lost) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
tp->lost_out += tcp_skb_pcount(skb);
--
2.8.0.rc3.226.g39d4020
next prev parent reply other threads:[~2016-09-18 22:04 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-18 22:03 [PATCH v3 net-next 00/16] tcp: BBR congestion control algorithm Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 01/16] tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 02/16] lib/win_minmax: windowed min or max estimator Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 03/16] tcp: use windowed min filter library for TCP min_rtt estimation Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 04/16] net_sched: sch_fq: add low_rate_threshold parameter Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init() Neal Cardwell
2016-09-19 14:37 ` Lance Richardson
2016-09-19 14:41 ` Eric Dumazet
2016-09-18 22:03 ` Neal Cardwell [this message]
2016-09-18 22:03 ` [PATCH v3 net-next 07/16] tcp: track data delivery rate for a TCP connection Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 08/16] tcp: track application-limited rate samples Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 09/16] tcp: export data delivery rate Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 10/16] tcp: allow congestion control module to request TSO skb segment count Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 11/16] tcp: export tcp_tso_autosize() and parameterize minimum number of TSO segments Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 12/16] tcp: export tcp_mss_to_mtu() for congestion control modules Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 13/16] tcp: allow congestion control to expand send buffer differently Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 14/16] tcp: new CC hook to set sending rate with rate_sample in any CA state Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 15/16] tcp: increase ICSK_CA_PRIV_SIZE from 64 bytes to 88 Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 16/16] tcp_bbr: add BBR congestion control Neal Cardwell
[not found] ` <CA++eYdtWkMqT1zk_D00H1TciYb_4+aQ6-96YzG1n_h4LLk663g@mail.gmail.com>
2016-09-19 2:43 ` Neal Cardwell
2016-09-19 20:57 ` Stephen Hemminger
2016-09-19 21:10 ` Eric Dumazet
2016-09-19 21:17 ` Rick Jones
2016-09-19 21:23 ` Eric Dumazet
2016-09-19 23:28 ` Stephen Hemminger
2016-09-19 23:33 ` Eric Dumazet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1474236233-28511-7-git-send-email-ncardwell@google.com \
--to=ncardwell@google.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=nanditad@google.com \
--cc=netdev@vger.kernel.org \
--cc=soheil@google.com \
--cc=vanj@google.com \
--cc=ycheng@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).