[PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: kerneljasonxing@gmail.com
To: gregkh@linuxfoundation.org, edumazet@google.com,
	ncardwell@google.com, davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org
Cc: netdev@vger.kernel.org, kerneljasonxing@gmail.com,
	linux-kernel@vger.kernel.org, liweishi@kuaishou.com,
	lishujin@kuaishou.com
Subject: [PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode
Date: Thu,  4 Jun 2020 17:00:14 +0800	[thread overview]
Message-ID: <20200604090014.23266-1-kerneljasonxing@gmail.com> (raw)
In-Reply-To: <20200602080425.93712-1-kerneljasonxing@gmail.com>

From: Jason Xing <kerneljasonxing@gmail.com>

When using BBR mode, too many tcp socks cannot be released because of
duplicate use of the sock_hold() in the manner of tcp_internal_pacing()
when RTO happens. Therefore, this situation maddly increases the slab
memory and then constantly triggers the OOM until crash.

Besides, in addition to BBR mode, if some mode applies pacing function,
it could trigger what we've discussed above,

Reproduce procedure:
0) cat /proc/slabinfo | grep TCP
1) switch net.ipv4.tcp_congestion_control to bbr
2) using wrk tool something like that to send packages
3) using tc to increase the delay and loss to simulate the RTO case.
4) cat /proc/slabinfo | grep TCP
5) kill the wrk command and observe the number of objects and slabs in
TCP.
6) at last, you could notice that the number would not decrease.

v2: extend the timer which could cover all those related potential risks
(suggested by Eric Dumazet and Neal Cardwell)

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
Signed-off-by: liweishi <liweishi@kuaishou.com>
Signed-off-by: Shujin Li <lishujin@kuaishou.com>
---
 net/ipv4/tcp_output.c | 31 +++++++++++++++++++++++--------
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cc4ba42..4626f4e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -966,6 +966,8 @@ enum hrtimer_restart tcp_pace_kick(struct hrtimer *timer)
 
 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+	struct tcp_sock *tp = tcp_sk(sk);
+	ktime_t expire, now;
 	u64 len_ns;
 	u32 rate;
 
@@ -977,12 +979,28 @@ static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 
 	len_ns = (u64)skb->len * NSEC_PER_SEC;
 	do_div(len_ns, rate);
-	hrtimer_start(&tcp_sk(sk)->pacing_timer,
-		      ktime_add_ns(ktime_get(), len_ns),
+	now = ktime_get();
+	/* If hrtimer is already armed, then our caller has not
+	 * used tcp_pacing_check().
+	 */
+	if (unlikely(hrtimer_is_queued(&tp->pacing_timer))) {
+		expire = hrtimer_get_softexpires(&tp->pacing_timer);
+		if (ktime_after(expire, now))
+			now = expire;
+		if (hrtimer_try_to_cancel(&tp->pacing_timer) == 1)
+			__sock_put(sk);
+	}
+	hrtimer_start(&tp->pacing_timer, ktime_add_ns(now, len_ns),
 		      HRTIMER_MODE_ABS_PINNED_SOFT);
 	sock_hold(sk);
 }
 
+static bool tcp_pacing_check(const struct sock *sk)
+{
+	return tcp_needs_internal_pacing(sk) &&
+	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_update_skb_after_send(struct tcp_sock *tp, struct sk_buff *skb)
 {
 	skb->skb_mstamp = tp->tcp_mstamp;
@@ -2117,6 +2135,9 @@ static int tcp_mtu_probe(struct sock *sk)
 	if (!tcp_can_coalesce_send_queue_head(sk, probe_size))
 		return -1;
 
+	if (tcp_pacing_check(sk))
+		return -1;
+
 	/* We're allowed to probe.  Build it now. */
 	nskb = sk_stream_alloc_skb(sk, probe_size, GFP_ATOMIC, false);
 	if (!nskb)
@@ -2190,12 +2211,6 @@ static int tcp_mtu_probe(struct sock *sk)
 	return -1;
 }
 
-static bool tcp_pacing_check(const struct sock *sk)
-{
-	return tcp_needs_internal_pacing(sk) &&
-	       hrtimer_is_queued(&tcp_sk(sk)->pacing_timer);
-}
-
 /* TCP Small Queues :
  * Control number of packets in qdisc/devices to two packets / or ~1 ms.
  * (These limits are doubled for retransmits)
-- 
1.8.3.1

next prev parent reply	other threads:[~2020-06-04  9:01 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-02  8:04 [PATCH] tcp: fix TCP socks unreleased in BBR mode kerneljasonxing
2020-06-02 13:05 ` Eric Dumazet
2020-06-03  1:53   ` Jason Xing
2020-06-03  2:14     ` David Miller
2020-06-03  2:29     ` Eric Dumazet
2020-06-03  2:41       ` Jason Xing
2020-06-03  2:43         ` Eric Dumazet
2020-06-03  2:48           ` Jason Xing
2020-06-03  5:05           ` Jason Xing
2020-06-03  5:44             ` Eric Dumazet
2020-06-03  6:32               ` Jason Xing
2020-06-03 12:02               ` Neal Cardwell
2020-06-03 13:51                 ` Jason Xing
2020-06-03 13:54                 ` Eric Dumazet
2020-06-03 14:07                   ` Neal Cardwell
2020-06-04  8:40                     ` Jason Xing
2020-06-04  9:00 ` kerneljasonxing [this message]
2020-06-04 13:10   ` [PATCH v2 4.19] " Eric Dumazet
2020-06-04 13:47     ` Jason Xing
2020-08-11 10:37       ` Jason Xing
2020-08-11 15:32         ` Eric Dumazet
2021-04-30 10:51           ` Jason Xing

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:cc4ba42 dfblob:4626f4e )
 OR (
bs:"[PATCH v2 4.19] tcp: fix TCP socks unreleased in BBR mode" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200604090014.23266-1-kerneljasonxing@gmail.com \
    --to=kerneljasonxing@gmail.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lishujin@kuaishou.com \
    --cc=liweishi@kuaishou.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).