All of lore.kernel.org
 help / color / mirror / Atom feed
From: brakmo <brakmo@fb.com>
To: netdev <netdev@vger.kernel.org>
Cc: Martin Lau <kafai@fb.com>, Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Kernel Team <Kernel-team@fb.com>
Subject: [PATCH bpf-next 5/7] bpf: sysctl for probe_on_drop
Date: Sat, 23 Mar 2019 01:05:40 -0700	[thread overview]
Message-ID: <20190323080542.173569-6-brakmo@fb.com> (raw)
In-Reply-To: <20190323080542.173569-1-brakmo@fb.com>

When a packet is dropped when calling queue_xmit in  __tcp_transmit_skb
and packets_out is 0, it is beneficial to set a small probe timer.
Otherwise, the throughput for the flow can suffer because it may need to
depend on the probe timer to start sending again. The default value for
the probe timer is at least 200ms, this patch sets it to 20ms when a
packet is dropped and there are no other packets in flight.

This patch introduces a new sysctl, sysctl_tcp_probe_on_drop_ms, that is
used to specify the duration of the probe timer for the case described
earlier. The allowed values are between 0 and TCP_RTO_MIN. A value of 0
disables setting the probe timer with a small value.

Signed-off-by: Lawrence Brakmo <brakmo@fb.com>
---
 include/net/netns/ipv4.h   |  1 +
 net/ipv4/sysctl_net_ipv4.c | 10 ++++++++++
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      | 18 +++++++++++++++---
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..d5716a193883 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -165,6 +165,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_wmem[3];
 	int sysctl_tcp_rmem[3];
 	int sysctl_tcp_comp_sack_nr;
+	int sysctl_tcp_probe_on_drop_ms;
 	unsigned long sysctl_tcp_comp_sack_delay_ns;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..50837e66313f 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -49,6 +49,7 @@ static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 static int comp_sack_nr_max = 255;
 static u32 u32_max_div_HZ = UINT_MAX / HZ;
+static int probe_on_drop_max = TCP_RTO_MIN;
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
@@ -1219,6 +1220,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= &zero,
 		.extra2		= &comp_sack_nr_max,
 	},
+	{
+		.procname	= "tcp_probe_on_drop_ms",
+		.data		= &init_net.ipv4.sysctl_tcp_probe_on_drop_ms,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &probe_on_drop_max,
+	},
 	{
 		.procname	= "udp_rmem_min",
 		.data		= &init_net.ipv4.sysctl_udp_rmem_min,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 277d71239d75..5aba95850d61 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2679,6 +2679,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock);
 	net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 60 * 60;
 	atomic_set(&net->ipv4.tfo_active_disable_times, 0);
+	net->ipv4.sysctl_tcp_probe_on_drop_ms = 20;
 
 	/* Reno is always built in */
 	if (!net_eq(net, &init_net) &&
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4522579aaca2..95a0102fde3b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1158,9 +1158,21 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 
 	err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
 
-	if (unlikely(err > 0)) {
-		tcp_enter_cwr(sk);
-		err = net_xmit_eval(err);
+	if (unlikely(err)) {
+		if (unlikely(err > 0)) {
+			tcp_enter_cwr(sk);
+			err = net_xmit_eval(err);
+		}
+		/* Packet was dropped. If there are no packets out,
+		 * we may need to depend on probe timer to start sending
+		 * again. Hence, use a smaller value.
+		 */
+		if (!tp->packets_out && !inet_csk(sk)->icsk_pending &&
+		    sock_net(sk)->ipv4.sysctl_tcp_probe_on_drop_ms > 0) {
+			tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
+					     sock_net(sk)->ipv4.sysctl_tcp_probe_on_drop_ms,
+					     TCP_RTO_MAX, NULL);
+		}
 	}
 	if (!err && oskb) {
 		tcp_update_skb_after_send(sk, oskb, prior_wstamp);
-- 
2.17.1


  parent reply	other threads:[~2019-03-23  8:07 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-23  8:05 [PATCH bpf-next 0/7] bpf: Propagate cn to TCP brakmo
2019-03-23  8:05 ` [PATCH bpf-next 1/7] bpf: Create BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY brakmo
2019-03-23  8:05 ` [PATCH bpf-next 2/7] bpf: cgroup inet skb programs can return 0 to 3 brakmo
2019-03-23  8:05 ` [PATCH bpf-next 3/7] bpf: Update __cgroup_bpf_run_filter_skb with cn brakmo
2019-03-23  8:05 ` [PATCH bpf-next 4/7] bpf: Update BPF_CGROUP_RUN_PROG_INET_EGRESS calls brakmo
2019-03-23  8:05 ` brakmo [this message]
2019-03-23  8:05 ` [PATCH bpf-next 6/7] bpf: Add cn support to hbm_out_kern.c brakmo
2019-03-23  8:05 ` [PATCH bpf-next 7/7] bpf: Add more stats to HBM brakmo
2019-03-23  9:12 ` [PATCH bpf-next 0/7] bpf: Propagate cn to TCP Eric Dumazet
2019-03-23 15:41   ` Alexei Starovoitov
2019-03-24  5:36     ` Eric Dumazet
2019-03-24 16:19       ` Alexei Starovoitov
2019-03-25  8:33         ` Eric Dumazet
2019-03-25  8:48           ` Eric Dumazet
2019-03-26  4:27             ` Alexei Starovoitov
2019-03-26  8:06               ` Eric Dumazet
2019-03-26 15:07                 ` Alexei Starovoitov
2019-03-26 15:43                   ` Eric Dumazet
2019-03-26 17:01                     ` Alexei Starovoitov
2019-03-26 18:07                       ` Eric Dumazet
2019-03-26  8:13               ` Eric Dumazet
2019-03-24  5:48     ` Eric Dumazet
2019-03-24  1:14   ` Lawrence Brakmo
2019-03-24  5:58     ` Eric Dumazet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190323080542.173569-6-brakmo@fb.com \
    --to=brakmo@fb.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.