netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Neal Cardwell <ncardwell@google.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, Yuchung Cheng <ycheng@google.com>,
	Van Jacobson <vanj@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Nandita Dukkipati <nanditad@google.com>,
	Eric Dumazet <edumazet@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>
Subject: [PATCH v3 net-next 14/16] tcp: new CC hook to set sending rate with rate_sample in any CA state
Date: Sun, 18 Sep 2016 18:03:51 -0400	[thread overview]
Message-ID: <1474236233-28511-15-git-send-email-ncardwell@google.com> (raw)
In-Reply-To: <1474236233-28511-1-git-send-email-ncardwell@google.com>

From: Yuchung Cheng <ycheng@google.com>

This commit introduces an optional new "omnipotent" hook,
cong_control(), for congestion control modules. The cong_control()
function is called at the end of processing an ACK (i.e., after
updating sequence numbers, the SACK scoreboard, and loss
detection). At that moment we have precise delivery rate information
the congestion control module can use to control the sending behavior
(using cwnd, TSO skb size, and pacing rate) in any CA state.

This function can also be used by a congestion control that prefers
not to use the default cwnd reduction approach (i.e., the PRR
algorithm) during CA_Recovery to control the cwnd and sending rate
during loss recovery.

We take advantage of the fact that recent changes defer the
retransmission or transmission of new data (e.g. by F-RTO) in recovery
until the new tcp_cong_control() function is run.

With this commit, we only run tcp_update_pacing_rate() if the
congestion control is not using this new API. New congestion controls
which use the new API do not want the TCP stack to run the default
pacing rate calculation and overwrite whatever pacing rate they have
chosen at initialization time.

Signed-off-by: Van Jacobson <vanj@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 include/net/tcp.h    |  4 ++++
 net/ipv4/tcp_cong.c  |  2 +-
 net/ipv4/tcp_input.c | 17 ++++++++++++++---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1aa9628..f83b7f2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -919,6 +919,10 @@ struct tcp_congestion_ops {
 	u32 (*tso_segs_goal)(struct sock *sk);
 	/* returns the multiplier used in tcp_sndbuf_expand (optional) */
 	u32 (*sndbuf_expand)(struct sock *sk);
+	/* call when packets are delivered to update cwnd and pacing rate,
+	 * after all the ca_state processing. (optional)
+	 */
+	void (*cong_control)(struct sock *sk, const struct rate_sample *rs);
 	/* get info for inet_diag (optional) */
 	size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
 			   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..1294af4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -69,7 +69,7 @@ int tcp_register_congestion_control(struct tcp_congestion_ops *ca)
 	int ret = 0;
 
 	/* all algorithms must implement ssthresh and cong_avoid ops */
-	if (!ca->ssthresh || !ca->cong_avoid) {
+	if (!ca->ssthresh || !(ca->cong_avoid || ca->cong_control)) {
 		pr_err("%s does not implement required ops\n", ca->name);
 		return -EINVAL;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5af0bf3..28cfe99 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2536,6 +2536,9 @@ static inline void tcp_end_cwnd_reduction(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	if (inet_csk(sk)->icsk_ca_ops->cong_control)
+		return;
+
 	/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
 	if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR ||
 	    (tp->undo_marker && tp->snd_ssthresh < TCP_INFINITE_SSTHRESH)) {
@@ -3312,8 +3315,15 @@ static inline bool tcp_may_raise_cwnd(const struct sock *sk, const int flag)
  * information. All transmission or retransmission are delayed afterwards.
  */
 static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
-			     int flag)
+			     int flag, const struct rate_sample *rs)
 {
+	const struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if (icsk->icsk_ca_ops->cong_control) {
+		icsk->icsk_ca_ops->cong_control(sk, rs);
+		return;
+	}
+
 	if (tcp_in_cwnd_reduction(sk)) {
 		/* Reduce cwnd if state mandates */
 		tcp_cwnd_reduction(sk, acked_sacked, flag);
@@ -3683,7 +3693,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
 	lost = tp->lost - lost;			/* freshly marked lost */
 	tcp_rate_gen(sk, delivered, lost, &now, &rs);
-	tcp_cong_control(sk, ack, delivered, flag);
+	tcp_cong_control(sk, ack, delivered, flag, &rs);
 	tcp_xmit_recovery(sk, rexmit);
 	return 1;
 
@@ -5981,7 +5991,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		} else
 			tcp_init_metrics(sk);
 
-		tcp_update_pacing_rate(sk);
+		if (!inet_csk(sk)->icsk_ca_ops->cong_control)
+			tcp_update_pacing_rate(sk);
 
 		/* Prevent spurious tcp_cwnd_restart() on first data packet */
 		tp->lsndtime = tcp_time_stamp;
-- 
2.8.0.rc3.226.g39d4020

  parent reply	other threads:[~2016-09-18 22:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-18 22:03 [PATCH v3 net-next 00/16] tcp: BBR congestion control algorithm Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 01/16] tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 02/16] lib/win_minmax: windowed min or max estimator Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 03/16] tcp: use windowed min filter library for TCP min_rtt estimation Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 04/16] net_sched: sch_fq: add low_rate_threshold parameter Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init() Neal Cardwell
2016-09-19 14:37   ` Lance Richardson
2016-09-19 14:41     ` Eric Dumazet
2016-09-18 22:03 ` [PATCH v3 net-next 06/16] tcp: count packets marked lost for a TCP connection Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 07/16] tcp: track data delivery rate " Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 08/16] tcp: track application-limited rate samples Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 09/16] tcp: export data delivery rate Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 10/16] tcp: allow congestion control module to request TSO skb segment count Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 11/16] tcp: export tcp_tso_autosize() and parameterize minimum number of TSO segments Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 12/16] tcp: export tcp_mss_to_mtu() for congestion control modules Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 13/16] tcp: allow congestion control to expand send buffer differently Neal Cardwell
2016-09-18 22:03 ` Neal Cardwell [this message]
2016-09-18 22:03 ` [PATCH v3 net-next 15/16] tcp: increase ICSK_CA_PRIV_SIZE from 64 bytes to 88 Neal Cardwell
2016-09-18 22:03 ` [PATCH v3 net-next 16/16] tcp_bbr: add BBR congestion control Neal Cardwell
     [not found]   ` <CA++eYdtWkMqT1zk_D00H1TciYb_4+aQ6-96YzG1n_h4LLk663g@mail.gmail.com>
2016-09-19  2:43     ` Neal Cardwell
2016-09-19 20:57   ` Stephen Hemminger
2016-09-19 21:10     ` Eric Dumazet
2016-09-19 21:17       ` Rick Jones
2016-09-19 21:23         ` Eric Dumazet
2016-09-19 23:28       ` Stephen Hemminger
2016-09-19 23:33         ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1474236233-28511-15-git-send-email-ncardwell@google.com \
    --to=ncardwell@google.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=nanditad@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=soheil@google.com \
    --cc=vanj@google.com \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).