From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>,
Neal Cardwell <ncardwell@google.com>,
Yuchung Cheng <ycheng@google.com>,
Soheil Hassas Yeganeh <soheil@google.com>,
Gasper Zejn <zelo.zejn@gmail.com>
Cc: netdev <netdev@vger.kernel.org>,
Eric Dumazet <edumazet@google.com>,
Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH net-next 3/7] tcp: mitigate scheduling jitter in EDT pacing model
Date: Mon, 15 Oct 2018 09:37:54 -0700 [thread overview]
Message-ID: <20181015163758.232436-4-edumazet@google.com> (raw)
In-Reply-To: <20181015163758.232436-1-edumazet@google.com>
In commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers
drifts") we added a mitigation for scheduling jitter in fq packet scheduler.
This patch does the same in TCP stack, now it is using EDT model.
Note that this mitigation is valid for both external (fq packet scheduler)
or internal TCP pacing.
This uses the same strategy than the above commit, allowing
a time credit of half the packet currently sent.
Consider following case :
An skb is sent, after an idle period of 300 usec.
The air-time (skb->len/pacing_rate) is 500 usec
Instead of setting the pacing timer to now+500 usec,
it will use now+min(500/2, 300) -> now+250usec
This is like having a token bucket with a depth of half
an skb.
Tested:
tc qdisc replace dev eth0 root pfifo_fast
Before
netperf -P0 -H remote -- -q 1000000000 # 8000Mbit
540000 262144 262144 10.00 7710.43
After :
netperf -P0 -H remote -- -q 1000000000 # 8000 Mbit
540000 262144 262144 10.00 7999.75 # Much closer to 8000Mbit target
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp_output.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f4aa4109334a043d02b17b18bef346d805dab501..5474c9854f252e50cdb1136435417873861d7618 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -985,7 +985,8 @@ static void tcp_internal_pacing(struct sock *sk)
sock_hold(sk);
}
-static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb)
+static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb,
+ u64 prior_wstamp)
{
struct tcp_sock *tp = tcp_sk(sk);
@@ -998,7 +999,12 @@ static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb)
* this is a minor annoyance.
*/
if (rate != ~0UL && rate && tp->data_segs_out >= 10) {
- tp->tcp_wstamp_ns += div64_ul((u64)skb->len * NSEC_PER_SEC, rate);
+ u64 len_ns = div64_ul((u64)skb->len * NSEC_PER_SEC, rate);
+ u64 credit = tp->tcp_wstamp_ns - prior_wstamp;
+
+ /* take into account OS jitter */
+ len_ns -= min_t(u64, len_ns / 2, credit);
+ tp->tcp_wstamp_ns += len_ns;
tcp_internal_pacing(sk);
}
@@ -1029,6 +1035,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
struct sk_buff *oskb = NULL;
struct tcp_md5sig_key *md5;
struct tcphdr *th;
+ u64 prior_wstamp;
int err;
BUG_ON(!skb || !tcp_skb_pcount(skb));
@@ -1050,7 +1057,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
return -ENOBUFS;
}
- /* TODO: might take care of jitter here */
+ prior_wstamp = tp->tcp_wstamp_ns;
tp->tcp_wstamp_ns = max(tp->tcp_wstamp_ns, tp->tcp_clock_cache);
skb->skb_mstamp_ns = tp->tcp_wstamp_ns;
@@ -1169,7 +1176,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
err = net_xmit_eval(err);
}
if (!err && oskb) {
- tcp_update_skb_after_send(sk, oskb);
+ tcp_update_skb_after_send(sk, oskb, prior_wstamp);
tcp_rate_skb_sent(sk, oskb);
}
return err;
@@ -2321,7 +2328,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
if (unlikely(tp->repair) && tp->repair_queue == TCP_SEND_QUEUE) {
/* "skb_mstamp" is used as a start point for the retransmit timer */
- tcp_update_skb_after_send(sk, skb);
+ tcp_update_skb_after_send(sk, skb, tp->tcp_wstamp_ns);
goto repair; /* Skip network transmission */
}
@@ -2896,7 +2903,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
} tcp_skb_tsorted_restore(skb);
if (!err) {
- tcp_update_skb_after_send(sk, skb);
+ tcp_update_skb_after_send(sk, skb, tp->tcp_wstamp_ns);
tcp_rate_skb_sent(sk, skb);
}
} else {
--
2.19.0.605.g01d371f741-goog
next prev parent reply other threads:[~2018-10-16 0:24 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-15 16:37 [PATCH net-next 0/7] tcp: second round for EDT conversion Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 1/7] tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 2/7] net: extend sk_pacing_rate to unsigned long Eric Dumazet
2018-10-15 16:37 ` Eric Dumazet [this message]
2018-10-15 16:37 ` [PATCH net-next 4/7] net_sched: sch_fq: no longer use skb_is_tcp_pure_ack() Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 5/7] tcp: optimize tcp internal pacing Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 6/7] tcp_bbr: fix typo in bbr_pacing_margin_percent Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 7/7] tcp: cdg: use tcp high resolution clock cache Eric Dumazet
2018-10-16 5:58 ` [PATCH net-next 0/7] tcp: second round for EDT conversion David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181015163758.232436-4-edumazet@google.com \
--to=edumazet@google.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=ncardwell@google.com \
--cc=netdev@vger.kernel.org \
--cc=soheil@google.com \
--cc=ycheng@google.com \
--cc=zelo.zejn@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox