From: "Bendik Rønning Opstad" <bro.devel@gmail.com>
To: "David S. Miller" <davem@davemloft.net>,
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>,
James Morris <jmorris@namei.org>,
Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
Patrick McHardy <kaber@trash.net>,
Jonathan Corbet <corbet@lwn.net>
Cc: "Eric Dumazet" <edumazet@google.com>,
"Neal Cardwell" <ncardwell@google.com>,
"Tom Herbert" <tom@herbertland.com>,
"Yuchung Cheng" <ycheng@google.com>,
"Paolo Abeni" <pabeni@redhat.com>, "Erik Kline" <ek@google.com>,
"Hannes Frederic Sowa" <hannes@stressinduktion.org>,
"Al Viro" <viro@zeniv.linux.org.uk>,
"Jiri Pirko" <jiri@resnulli.us>,
"Alexander Duyck" <alexander.h.duyck@redhat.com>,
"Florian Westphal" <fw@strlen.de>,
"Daniel Lee" <Longinus00@gmail.com>,
"Marcelo Ricardo Leitner" <mleitner@redhat.com>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"Willem de Bruijn" <willemb@google.com>,
"Linus Lüssing" <linus.luessing@c0d3.blue>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
netdev@vger.kernel.org, linux-api@vger.kernel.org,
"Andreas Petlund" <apetlund@simula.no>,
"Carsten Griwodz" <griff@simula.no>,
"Pål Halvorsen" <paalh@simula.no>
Subject: [PATCH RFC net-next 1/2] tcp: Add DPIFL thin stream detection mechanism
Date: Fri, 23 Oct 2015 22:50:12 +0200
Message-ID: <1445633413-3532-2-git-send-email-bro.devel+kernel@gmail.com>
In-Reply-To: <1445633413-3532-1-git-send-email-bro.devel+kernel@gmail.com>
The existing mechanism for detecting thin streams (tcp_stream_is_thin)
is based on a static limit of fewer than 4 packets in flight. This treats
streams differently depending on the connection's RTT, such that a stream
on a high RTT link may never be considered thin, whereas the same
application would produce a stream that would always be thin in a low RTT
scenario (e.g. data center).
By calculating a dynamic packets in flight limit (DPIFL), the thin stream
detection will be independent of the RTT and treat streams equally based
on the transmission pattern, i.e. the inter-transmission time (ITT).
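To illustrate the difference, here is a minimal user-space sketch (not part
of this patch). It assumes that an application sending one packet every ITT
microseconds keeps roughly RTT / ITT packets in flight. The helper names
approx_pif(), is_thin_static(), dpif_limit() and is_thin_dpifl() are
hypothetical stand-ins, and the RTT is passed as a plain microsecond value
rather than the kernel's scaled srtt_us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for illustration; not kernel API. */

/* Assumption: a sender emitting one packet every itt_us has roughly
 * rtt_us / itt_us packets in flight (PIF) in steady state.
 */
static uint32_t approx_pif(uint32_t rtt_us, uint32_t itt_us)
{
	return rtt_us / itt_us;
}

/* Static check, mirroring the "packets_out < 4" condition. */
static bool is_thin_static(uint32_t pif)
{
	return pif < 4;
}

/* Dynamic PIF limit: RTT divided by the ITT lower bound (both in usec). */
static uint64_t dpif_limit(uint32_t rtt_us, uint32_t itt_lower_bound_us)
{
	return (uint64_t)rtt_us / itt_lower_bound_us;
}

/* Dynamic check: thin if the PIF count is below the dynamic limit. */
static bool is_thin_dpifl(uint32_t pif, uint32_t rtt_us,
			  uint32_t itt_lower_bound_us)
{
	return pif < dpif_limit(rtt_us, itt_lower_bound_us);
}
```

With a 10 ms lower bound, a stream sending every 30 ms over a 300 ms RTT path
has about 10 packets in flight under this assumption: the static check rejects
it (10 >= 4), while the dynamic limit is 30, so it is classified as thin. A
stream sending every 5 ms on the same path (about 60 in flight) is not.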
Cc: Andreas Petlund <apetlund@simula.no>
Cc: Carsten Griwodz <griff@simula.no>
Cc: Pål Halvorsen <paalh@simula.no>
Cc: Jonas Markussen <jonassm@ifi.uio.no>
Cc: Kristian Evensen <kristian.evensen@gmail.com>
Cc: Kenneth Klette Jonassen <kennetkl@ifi.uio.no>
Signed-off-by: Bendik Rønning Opstad <bro.devel+kernel@gmail.com>
---
Documentation/networking/ip-sysctl.txt | 8 ++++++++
include/linux/tcp.h | 6 ++++++
include/net/tcp.h | 20 ++++++++++++++++++++
net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
net/ipv4/tcp.c | 3 +++
5 files changed, 46 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 85752c8..b841a76 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -700,6 +700,14 @@ tcp_thin_dupack - BOOLEAN
Documentation/networking/tcp-thin.txt
Default: 0
+tcp_thin_dpifl_itt_lower_bound - INTEGER
+ Controls the lower bound for ITT (inter-transmission time) threshold
+ for when a stream is considered thin. The value is specified in
+ microseconds, and may not be lower than 10000 (10 ms). This threshold
+ is used to calculate a dynamic packets in flight limit (DPIFL) which
+ is used to classify whether a stream is thin.
+ Default: 10000
+
tcp_limit_output_bytes - INTEGER
Controls TCP Small Queue limit per tcp socket.
TCP bulk sender tends to increase packets in flight until it
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c906f45..fc885db 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -269,6 +269,12 @@ struct tcp_sock {
struct sk_buff* lost_skb_hint;
struct sk_buff *retransmit_skb_hint;
+ /* The limit used to identify when a stream is thin based on a minimum
+ * allowed inter-transmission time (ITT) in microseconds. This is used
+ * to dynamically calculate a max packets in flight limit (DPIFL).
+ */
+ int thin_dpifl_itt_lower_bound;
+
/* OOO segments go in this list. Note that socket lock must be held,
* as we do not use sk_buff_head lock.
*/
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4fc457b..6534836 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -215,6 +215,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
/* TCP thin-stream limits */
#define TCP_THIN_LINEAR_RETRIES 6 /* After 6 linear retries, do exp. backoff */
+#define TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN 10000 /* Minimum lower bound is 10 ms (10000 usec) */
/* TCP initial congestion window as per draft-hkchu-tcpm-initcwnd-01 */
#define TCP_INIT_CWND 10
@@ -274,6 +275,7 @@ extern int sysctl_tcp_workaround_signed_windows;
extern int sysctl_tcp_slow_start_after_idle;
extern int sysctl_tcp_thin_linear_timeouts;
extern int sysctl_tcp_thin_dupack;
+extern int sysctl_tcp_thin_dpifl_itt_lower_bound;
extern int sysctl_tcp_early_retrans;
extern int sysctl_tcp_limit_output_bytes;
extern int sysctl_tcp_challenge_ack_limit;
@@ -1631,6 +1633,24 @@ static inline bool tcp_stream_is_thin(struct tcp_sock *tp)
return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
}
+/**
+ * tcp_stream_is_thin_dpifl() - Tests if the stream is thin based on dynamic PIF
+ * limit
+ * @tp: the tcp_sock struct
+ *
+ * Return: true if current packets in flight (PIF) count is lower than
+ * the dynamic PIF limit, else false
+ */
+static inline bool tcp_stream_is_thin_dpifl(const struct tcp_sock *tp)
+{
+ u64 dpif_lim = tp->srtt_us >> 3;
+ /* Divide by thin_dpifl_itt_lower_bound, the minimum allowed ITT
+ * (inter-transmission time) in usecs.
+ */
+ do_div(dpif_lim, tp->thin_dpifl_itt_lower_bound);
+ return tcp_packets_in_flight(tp) < dpif_lim;
+}
+
/* /proc */
enum tcp_seq_states {
TCP_SEQ_STATE_LISTENING,
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 25300c5..917fdde 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -42,6 +42,7 @@ static int tcp_syn_retries_min = 1;
static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
static int ip_ping_group_range_min[] = { 0, 0 };
static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+static int tcp_thin_dpifl_itt_lower_bound_min = TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN;
/* Update system visible IP port range */
static void set_local_port_range(struct net *net, int range[2])
@@ -709,6 +710,14 @@ static struct ctl_table ipv4_table[] = {
.proc_handler = proc_dointvec
},
{
+ .procname = "tcp_thin_dpifl_itt_lower_bound",
+ .data = &sysctl_tcp_thin_dpifl_itt_lower_bound,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec_minmax,
+ .extra1 = &tcp_thin_dpifl_itt_lower_bound_min,
+ },
+ {
.procname = "tcp_early_retrans",
.data = &sysctl_tcp_early_retrans,
.maxlen = sizeof(int),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0cfa7c0..f712d7c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -287,6 +287,8 @@ int sysctl_tcp_min_tso_segs __read_mostly = 2;
int sysctl_tcp_autocorking __read_mostly = 1;
+int sysctl_tcp_thin_dpifl_itt_lower_bound __read_mostly = TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN;
+
struct percpu_counter tcp_orphan_count;
EXPORT_SYMBOL_GPL(tcp_orphan_count);
@@ -406,6 +408,7 @@ void tcp_init_sock(struct sock *sk)
u64_stats_init(&tp->syncp);
tp->reordering = sysctl_tcp_reordering;
+ tp->thin_dpifl_itt_lower_bound = sysctl_tcp_thin_dpifl_itt_lower_bound;
tcp_enable_early_retrans(tp);
tcp_assign_congestion_control(sk);
--
1.9.1