Netdev List
 help / color / mirror / Atom feed
From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Gasper Zejn <zelo.zejn@gmail.com>
Cc: netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>
Subject: [PATCH net-next 5/7] tcp: optimize tcp internal pacing
Date: Mon, 15 Oct 2018 09:37:56 -0700	[thread overview]
Message-ID: <20181015163758.232436-6-edumazet@google.com> (raw)
In-Reply-To: <20181015163758.232436-1-edumazet@google.com>

When TCP implements its own pacing (when no fq packet scheduler is used),
it is arming high resolution timer after a packet is sent.

But in many cases (like TCP_RR kind of workloads), this high resolution
timer expires before the application attempts to write the following
packet. This overhead also happens when the flow is ACK clocked and
cwnd limited instead of being limited by the pacing rate.

This leads to extra overhead (high number of IRQ)

Now tcp_wstamp_ns is reserved for the pacing timer only
(after commit "tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh"),
we can setup the timer only when a packet is about to be sent,
and if tcp_wstamp_ns is in the future.

This leads to a ~10% performance increase in TCP_RR workloads.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5474c9854f252e50cdb1136435417873861d7618..d212e4cbc68902e873afb4a12b43b467ccd6069b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -975,16 +975,6 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 	return HRTIMER_NORESTART;
 }
 
-static void tcp_internal_pacing(struct sock *sk)
-{
-	if (!tcp_needs_internal_pacing(sk))
-		return;
-	hrtimer_start(&tcp_sk(sk)->pacing_timer,
-		      ns_to_ktime(tcp_sk(sk)->tcp_wstamp_ns),
-		      HRTIMER_MODE_ABS_PINNED_SOFT);
-	sock_hold(sk);
-}
-
 static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb,
 				      u64 prior_wstamp)
 {
@@ -1005,8 +995,6 @@ static void tcp_update_skb_after_send(struct sock *sk, struct sk_buff *skb,
 			/* take into account OS jitter */
 			len_ns -= min_t(u64, len_ns / 2, credit);
 			tp->tcp_wstamp_ns += len_ns;
-
-			tcp_internal_pacing(sk);
 		}
 	}
 	list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
@@ -2186,10 +2174,23 @@ static int tcp_mtu_probe(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_pacing_check(const struct sock *sk)
+static bool tcp_pacing_check(struct sock *sk)
 {
-	return tcp_needs_internal_pacing(sk) &&
-	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!tcp_needs_internal_pacing(sk))
+		return false;
+
+	if (tp->tcp_wstamp_ns <= tp->tcp_clock_cache)
+		return false;
+
+	if (!hrtimer_is_queued(&tp->pacing_timer)) {
+		hrtimer_start(&tp->pacing_timer,
+			      ns_to_ktime(tp->tcp_wstamp_ns),
+			      HRTIMER_MODE_ABS_PINNED_SOFT);
+		sock_hold(sk);
+	}
+	return true;
 }
 
 /* TCP Small Queues :
-- 
2.19.0.605.g01d371f741-goog

  parent reply	other threads:[~2018-10-16  0:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-15 16:37 [PATCH net-next 0/7] tcp: second round for EDT conversion Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 1/7] tcp: do not change tcp_wstamp_ns in tcp_mstamp_refresh Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 2/7] net: extend sk_pacing_rate to unsigned long Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 3/7] tcp: mitigate scheduling jitter in EDT pacing model Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 4/7] net_sched: sch_fq: no longer use skb_is_tcp_pure_ack() Eric Dumazet
2018-10-15 16:37 ` Eric Dumazet [this message]
2018-10-15 16:37 ` [PATCH net-next 6/7] tcp_bbr: fix typo in bbr_pacing_margin_percent Eric Dumazet
2018-10-15 16:37 ` [PATCH net-next 7/7] tcp: cdg: use tcp high resolution clock cache Eric Dumazet
2018-10-16  5:58 ` [PATCH net-next 0/7] tcp: second round for EDT conversion David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181015163758.232436-6-edumazet@google.com \
    --to=edumazet@google.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=soheil@google.com \
    --cc=ycheng@google.com \
    --cc=zelo.zejn@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox