* [PATCH net-next 0/2] tcp: final work on SYNFLOOD behavior
From: Eric Dumazet @ 2016-04-14  5:05 UTC
  To: David S. Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

In the first patch, I remove the costly association of SYNACK+COOKIES
to a listener. I believe other parts of the stack should be ready.

The second patch removes a useless write into the listener socket
in tcp_rcv_state_process(), which incurred false sharing in
tcp_conn_request().

Performance under SYNFLOOD goes from 3.2 Mpps to 6 Mpps.

The test used a single TCP listener, on a host with 8 RX queues
on the NIC and 24 cores (48 hyperthreads).

Eric Dumazet (2):
  tcp: do not mess with listener sk_wmem_alloc
  tcp: remove false sharing in tcp_rcv_state_process()

 include/net/tcp.h     |  9 +++++++--
 net/ipv4/tcp_input.c  | 11 ++++++-----
 net/ipv4/tcp_ipv4.c   |  4 ++--
 net/ipv4/tcp_output.c | 16 ++++++++++++----
 net/ipv6/tcp_ipv6.c   |  4 ++--
 5 files changed, 29 insertions(+), 15 deletions(-)

-- 
2.8.0.rc3.226.g39d4020

* [PATCH net-next 1/2] tcp: do not mess with listener sk_wmem_alloc
From: Eric Dumazet @ 2016-04-14  5:05 UTC
  To: David S. Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

When removing sk_refcnt manipulation on synflood, I missed that
using skb_set_owner_w() was racy if sk->sk_wmem_alloc had already
transitioned to 0.

We could hold sk_refcnt instead, but that is costly under attack
(doing so only increases performance from 3.2 Mpps to 3.8 Mpps).

In this patch, I chose not to attach a socket to syncookie skbs.

Performance is now 5 Mpps instead of 3.2 Mpps.

The following patch will remove the last known false sharing in
tcp_rcv_state_process().
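
To illustrate, a minimal user-space sketch of the cost this patch
removes; this is not kernel code, and all names (fake_sock, fake_skb,
attach_to_listener, cookie_synack) are invented for the demo.
Charging every cookie SYNACK to the one listener makes every CPU
handling a SYN perform an atomic RMW on the same word, bouncing the
listener's cache line, and nothing pins the socket while the skb is
in flight.

#include <stdatomic.h>
#include <stddef.h>

struct fake_sock {
	atomic_int wmem_alloc;		/* stands in for sk->sk_wmem_alloc */
};

struct fake_skb {
	struct fake_sock *sk;
	int truesize;
};

/* Roughly what attaching the skb to the listener costs: one atomic
 * add per SYNACK, every CPU hitting the same listener socket under a
 * SYN flood; and since no reference is held, the socket could be
 * freed under us once wmem_alloc drops to zero.
 */
static void attach_to_listener(struct fake_skb *skb, struct fake_sock *sk)
{
	skb->sk = sk;
	atomic_fetch_add(&sk->wmem_alloc, skb->truesize);
}

/* What the TCP_SYNACK_COOKIE case now does instead: no socket
 * attached, so no shared write at all on the syncookie path.
 */
static void cookie_synack(struct fake_skb *skb)
{
	skb->sk = NULL;
}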

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/tcp.h     |  9 +++++++--
 net/ipv4/tcp_input.c  |  7 ++++---
 net/ipv4/tcp_ipv4.c   |  4 ++--
 net/ipv4/tcp_output.c | 16 ++++++++++++----
 net/ipv6/tcp_ipv6.c   |  4 ++--
 5 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 74d3ed5eb219..fd40f8c64d5f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -452,10 +452,15 @@ struct sock *tcp_v4_syn_recv_sock(const struct sock *sk, struct sk_buff *skb,
 int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb);
 int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len);
 int tcp_connect(struct sock *sk);
+enum tcp_synack_type {
+	TCP_SYNACK_NORMAL,
+	TCP_SYNACK_FASTOPEN,
+	TCP_SYNACK_COOKIE,
+};
 struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 				struct request_sock *req,
 				struct tcp_fastopen_cookie *foc,
-				bool attach_req);
+				enum tcp_synack_type synack_type);
 int tcp_disconnect(struct sock *sk, int flags);
 
 void tcp_finish_connect(struct sock *sk, struct sk_buff *skb);
@@ -1728,7 +1733,7 @@ struct tcp_request_sock_ops {
 	int (*send_synack)(const struct sock *sk, struct dst_entry *dst,
 			   struct flowi *fl, struct request_sock *req,
 			   struct tcp_fastopen_cookie *foc,
-			   bool attach_req);
+			   enum tcp_synack_type synack_type);
 };
 
 #ifdef CONFIG_SYN_COOKIES
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 983f04c11177..7ea7034af83f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6327,7 +6327,7 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 	}
 	if (fastopen_sk) {
 		af_ops->send_synack(fastopen_sk, dst, &fl, req,
-				    &foc, false);
+				    &foc, TCP_SYNACK_FASTOPEN);
 		/* Add the child socket directly into the accept queue */
 		inet_csk_reqsk_queue_add(sk, req, fastopen_sk);
 		sk->sk_data_ready(sk);
@@ -6337,8 +6337,9 @@ int tcp_conn_request(struct request_sock_ops *rsk_ops,
 		tcp_rsk(req)->tfo_listener = false;
 		if (!want_cookie)
 			inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
-		af_ops->send_synack(sk, dst, &fl, req,
-				    &foc, !want_cookie);
+		af_ops->send_synack(sk, dst, &fl, req, &foc,
+				    !want_cookie ? TCP_SYNACK_NORMAL :
+						   TCP_SYNACK_COOKIE);
 		if (want_cookie) {
 			reqsk_free(req);
 			return 0;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index f4f2a0a3849d..d2a5763e5abc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -830,7 +830,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 			      struct flowi *fl,
 			      struct request_sock *req,
 			      struct tcp_fastopen_cookie *foc,
-				  bool attach_req)
+			      enum tcp_synack_type synack_type)
 {
 	const struct inet_request_sock *ireq = inet_rsk(req);
 	struct flowi4 fl4;
@@ -841,7 +841,7 @@ static int tcp_v4_send_synack(const struct sock *sk, struct dst_entry *dst,
 	if (!dst && (dst = inet_csk_route_req(sk, &fl4, req)) == NULL)
 		return -1;
 
-	skb = tcp_make_synack(sk, dst, req, foc, attach_req);
+	skb = tcp_make_synack(sk, dst, req, foc, synack_type);
 
 	if (skb) {
 		__tcp_v4_send_check(skb, ireq->ir_loc_addr, ireq->ir_rmt_addr);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 7d2dc015cd19..6451b83d81e9 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2944,7 +2944,7 @@ int tcp_send_synack(struct sock *sk)
 struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 				struct request_sock *req,
 				struct tcp_fastopen_cookie *foc,
-				bool attach_req)
+				enum tcp_synack_type synack_type)
 {
 	struct inet_request_sock *ireq = inet_rsk(req);
 	const struct tcp_sock *tp = tcp_sk(sk);
@@ -2964,14 +2964,22 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 	/* Reserve space for headers. */
 	skb_reserve(skb, MAX_TCP_HEADER);
 
-	if (attach_req) {
+	switch (synack_type) {
+	case TCP_SYNACK_NORMAL:
 		skb_set_owner_w(skb, req_to_sk(req));
-	} else {
+		break;
+	case TCP_SYNACK_COOKIE:
+		/* Under synflood, we do not attach skb to a socket,
+		 * to avoid false sharing.
+		 */
+		break;
+	case TCP_SYNACK_FASTOPEN:
 		/* sk is a const pointer, because we want to express that
 		 * multiple cpus might call us concurrently.
 		 * sk->sk_wmem_alloc is an atomic, so we can promote it to rw.
 		 */
 		skb_set_owner_w(skb, (struct sock *)sk);
+		break;
 	}
 	skb_dst_set(skb, dst);
 
@@ -3516,7 +3524,7 @@ int tcp_rtx_synack(const struct sock *sk, struct request_sock *req)
 	int res;
 
 	tcp_rsk(req)->txhash = net_tx_rndhash();
-	res = af_ops->send_synack(sk, NULL, &fl, req, NULL, true);
+	res = af_ops->send_synack(sk, NULL, &fl, req, NULL, TCP_SYNACK_NORMAL);
 	if (!res) {
 		TCP_INC_STATS_BH(sock_net(sk), TCP_MIB_RETRANSSEGS);
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 0e621bc1ae11..800265c7fd3f 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -439,7 +439,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 			      struct flowi *fl,
 			      struct request_sock *req,
 			      struct tcp_fastopen_cookie *foc,
-			      bool attach_req)
+			      enum tcp_synack_type synack_type)
 {
 	struct inet_request_sock *ireq = inet_rsk(req);
 	struct ipv6_pinfo *np = inet6_sk(sk);
@@ -452,7 +452,7 @@ static int tcp_v6_send_synack(const struct sock *sk, struct dst_entry *dst,
 					       IPPROTO_TCP)) == NULL)
 		goto done;
 
-	skb = tcp_make_synack(sk, dst, req, foc, attach_req);
+	skb = tcp_make_synack(sk, dst, req, foc, synack_type);
 
 	if (skb) {
 		__tcp_v6_send_check(skb, &ireq->ir_v6_loc_addr,
-- 
2.8.0.rc3.226.g39d4020

* [PATCH net-next 2/2] tcp: remove false sharing in tcp_rcv_state_process()
From: Eric Dumazet @ 2016-04-14  5:05 UTC
  To: David S. Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet

The last known hot point during a SYNFLOOD attack is the clearing
of rx_opt.saw_tstamp in tcp_rcv_state_process().

It is not needed for a listener, so we move it where it matters.

Performance while a SYNFLOOD attack hits a single listener socket
went from 5 Mpps to 6 Mpps on my test server (24 cores, 8 NIC RX queues).
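
To make the false sharing concrete, a small self-contained user-space
demo (not kernel code; the struct and function names are invented):
one thread repeatedly storing to a word stalls readers of neighbouring
fields on the same cache line, which is what the unconditional
saw_tstamp store did to tcp_conn_request() running on other CPUs.

#include <pthread.h>
#include <stdio.h>

struct shared {
	volatile int hot_word;		/* plays the role of rx_opt.saw_tstamp */
	volatile int neighbour;		/* read-mostly data on the same line */
};

static struct shared s;

static void *writer(void *arg)
{
	(void)arg;
	for (int i = 0; i < 50000000; i++)
		s.hot_word = 0;		/* the store this patch moves off
					 * the listener's hot path */
	return NULL;
}

static void *reader(void *arg)
{
	long sum = 0;

	(void)arg;
	for (int i = 0; i < 50000000; i++)
		sum += s.neighbour;	/* stalls on writer invalidations */
	return (void *)sum;
}

int main(void)
{
	pthread_t w, r;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	puts("done; pad hot_word to its own cache line and compare runtime");
	return 0;
}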

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 7ea7034af83f..90e0d9256b74 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5796,8 +5796,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 	int queued = 0;
 	bool acceptable;
 
-	tp->rx_opt.saw_tstamp = 0;
-
 	switch (sk->sk_state) {
 	case TCP_CLOSE:
 		goto discard;
@@ -5838,6 +5836,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		goto discard;
 
 	case TCP_SYN_SENT:
+		tp->rx_opt.saw_tstamp = 0;
 		queued = tcp_rcv_synsent_state_process(sk, skb, th);
 		if (queued >= 0)
 			return queued;
@@ -5849,6 +5848,7 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		return 0;
 	}
 
+	tp->rx_opt.saw_tstamp = 0;
 	req = tp->fastopen_rsk;
 	if (req) {
 		WARN_ON_ONCE(sk->sk_state != TCP_SYN_RECV &&
-- 
2.8.0.rc3.226.g39d4020

* Re: [PATCH net-next 0/2] tcp: final work on SYNFLOOD behavior
From: David Miller @ 2016-04-15 20:46 UTC
  To: edumazet; +Cc: netdev, eric.dumazet

From: Eric Dumazet <edumazet@google.com>
Date: Wed, 13 Apr 2016 22:05:38 -0700

> In the first patch, I remove the costly association of SYNACK+COOKIES
> to a listener. I believe other parts of the stack should be ready.
> 
> The second patch removes a useless write into the listener socket
> in tcp_rcv_state_process(), which incurred false sharing in
> tcp_conn_request().
> 
> Performance under SYNFLOOD goes from 3.2 Mpps to 6 Mpps.

Geez, you almost make it look too easy....

> The test used a single TCP listener, on a host with 8 RX queues
> on the NIC and 24 cores (48 hyperthreads).

Looks good, series applied, thanks Eric.
