netdev.vger.kernel.org archive mirror
* [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
@ 2012-05-28 11:52 Jesper Dangaard Brouer
  2012-05-28 11:52 ` [RFC PATCH 1/2] tcp: extract syncookie part of tcp_v4_conn_request() Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-28 11:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Christoph Paasch, Eric Dumazet,
	David S. Miller, Martin Topholm
  Cc: Florian Westphal, opurdila, Hans Schillstrom

The following series is an RFC (Request For Comments) for implementing
faster and parallel handling of TCP SYN connections, to mitigate SYN
flood attacks.  This is against DaveM's net tree (f0d1b3c2bc), as
net-next is closed, as DaveM has mentioned numerous times ;-)

Only IPv4 TCP is handled here. The IPv6 TCP code also needs to be
updated, but I'll deal with that part after we have agreed on a
solution for IPv4 TCP.

 Patch 1/2: A cleanup, where I split the SYN cookie handling out of
  tcp_v4_conn_request() into tcp_v4_syn_conn_limit().

 Patch 2/2: Moves tcp_v4_syn_conn_limit() outside bh_lock_sock() in
  tcp_v4_rcv().  I would like some input on (1) whether this is safe
  without the lock, and (2) whether we need to do a sock lookup before
  calling tcp_v4_syn_conn_limit() (Christoph Paasch
  <christoph.paasch@uclouvain.be> mentioned something about SYN
  retransmissions).

---

Jesper Dangaard Brouer (2):
      tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
      tcp: extract syncookie part of tcp_v4_conn_request()


 net/ipv4/tcp_ipv4.c |  131 ++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 107 insertions(+), 24 deletions(-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [RFC PATCH 1/2] tcp: extract syncookie part of tcp_v4_conn_request()
  2012-05-28 11:52 [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods Jesper Dangaard Brouer
@ 2012-05-28 11:52 ` Jesper Dangaard Brouer
  2012-05-28 11:52 ` [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods Jesper Dangaard Brouer
  2012-05-28 16:14 ` [RFC PATCH 0/2] Faster/parallel SYN " Christoph Paasch
  2 siblings, 0 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-28 11:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Christoph Paasch, Eric Dumazet,
	David S. Miller, Martin Topholm
  Cc: Florian Westphal, opurdila, Hans Schillstrom

Move the SYN cookie handling from tcp_v4_conn_request() into a
separate function, named tcp_v4_syn_conn_limit(). The semantics
should be almost the same.

Besides the code cleanup, this patch prepares for handling SYN
cookies in an earlier step, to avoid a spinlock and achieve parallel
processing.

Signed-off-by: Martin Topholm <mph@hoth.dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 net/ipv4/tcp_ipv4.c |  125 +++++++++++++++++++++++++++++++++++++++++----------
 1 files changed, 101 insertions(+), 24 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a43b87d..15958b2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1268,6 +1268,98 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = {
 };
 #endif
 
+/* Check SYN connect limit and send SYN-ACK cookies
+ * - Return 0 = No limitation needed, continue processing
+ * - Return 1 = Stop processing, free SKB, SYN cookie send (if enabled)
+ */
+int tcp_v4_syn_conn_limit(struct sock *sk, struct sk_buff *skb)
+{
+	struct request_sock *req;
+	struct inet_request_sock *ireq;
+	struct tcp_options_received tmp_opt;
+	__be32 saddr = ip_hdr(skb)->saddr;
+	__be32 daddr = ip_hdr(skb)->daddr;
+	__u32 isn = TCP_SKB_CB(skb)->when;
+	const u8 *hash_location; /* Not really used */
+
+//	WARN_ON(!tcp_hdr(skb)->syn); /* MUST only be called for SYN req */
+//	WARN_ON(!(sk->sk_state == TCP_LISTEN)); /* On a LISTEN socket */
+
+	/* Never answer to SYNs sent to broadcast or multicast */
+	if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
+		goto drop;
+
+	/* If "isn" is not zero, this request hit a live timewait bucket */
+	if (isn)
+		goto no_limit;
+
+	/* Start sending SYN cookies when request sock queue is full*/
+	if (!inet_csk_reqsk_queue_is_full(sk))
+		goto no_limit;
+
+	/* Check if SYN cookies are enabled
+	 * - Side effect: NET_INC_STATS_BH counters + printk logging
+	 */
+	if (!tcp_syn_flood_action(sk, skb, "TCP"))
+		goto drop; /* Not enabled, indicate drop, due to queue full */
+
+	/* Allocate a request_sock */
+	req = inet_reqsk_alloc(&tcp_request_sock_ops);
+	if (!req) {
+		net_warn_ratelimited ("%s: Could not alloc request_sock"
+				      ", drop conn from %pI4",
+				      __FUNCTION__, &saddr);
+		goto drop;
+	}
+
+#ifdef CONFIG_TCP_MD5SIG
+	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
+#endif
+
+	tcp_clear_options(&tmp_opt);
+        tmp_opt.mss_clamp = TCP_MSS_DEFAULT;
+	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
+	tcp_parse_options(skb, &tmp_opt, &hash_location, 0);
+
+	if (!tmp_opt.saw_tstamp)
+		tcp_clear_options(&tmp_opt);
+
+	tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
+	tcp_openreq_init(req, &tmp_opt, skb);
+
+	/* Update req as an inet_request_sock (typecast trick)*/
+	ireq = inet_rsk(req);
+	ireq->loc_addr = daddr;
+	ireq->rmt_addr = saddr;
+	ireq->no_srccheck = inet_sk(sk)->transparent;
+	ireq->opt = tcp_v4_save_options(sk, skb);
+
+	if (security_inet_conn_request(sk, skb, req))
+		goto drop_and_free;
+
+	/* Cookie support for ECN if TCP timestamp option avail */
+	if (tmp_opt.tstamp_ok)
+		TCP_ECN_create_request(req, skb);
+
+	/* Encode cookie in InitialSeqNum of SYN-ACK packet */
+	isn = cookie_v4_init_sequence(sk, skb, &req->mss);
+	req->cookie_ts = tmp_opt.tstamp_ok;
+
+	tcp_rsk(req)->snt_isn = isn;
+	tcp_rsk(req)->snt_synack = tcp_time_stamp;
+
+	/* Send SYN-ACK containing cookie */
+	tcp_v4_send_synack(sk, NULL, req, NULL);
+
+drop_and_free:
+	reqsk_free(req);
+drop:
+	return 1;
+no_limit:
+	return 0;
+}
+
+/* Handle SYN request */
 int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_extend_values tmp_ext;
@@ -1280,22 +1372,11 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	__be32 saddr = ip_hdr(skb)->saddr;
 	__be32 daddr = ip_hdr(skb)->daddr;
 	__u32 isn = TCP_SKB_CB(skb)->when;
-	bool want_cookie = false;
 
 	/* Never answer to SYNs send to broadcast or multicast */
 	if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))
 		goto drop;
 
-	/* TW buckets are converted to open requests without
-	 * limitations, they conserve resources and peer is
-	 * evidently real one.
-	 */
-	if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
-		want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
-		if (!want_cookie)
-			goto drop;
-	}
-
 	/* Accept backlog is full. If we have already queued enough
 	 * of warm entries in syn queue, drop request. It is better than
 	 * clogging syn queue with openreqs with exponentially increasing
@@ -1304,6 +1385,10 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (sk_acceptq_is_full(sk) && inet_csk_reqsk_queue_young(sk) > 1)
 		goto drop;
 
+	/* SYN cookie handling */
+	if (tcp_v4_syn_conn_limit(sk, skb))
+		goto drop;
+
 	req = inet_reqsk_alloc(&tcp_request_sock_ops);
 	if (!req)
 		goto drop;
@@ -1317,6 +1402,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	tmp_opt.user_mss  = tp->rx_opt.user_mss;
 	tcp_parse_options(skb, &tmp_opt, &hash_location, 0);
 
+	/* Handle RFC6013 - TCP Cookie Transactions (TCPCT) options */
 	if (tmp_opt.cookie_plus > 0 &&
 	    tmp_opt.saw_tstamp &&
 	    !tp->rx_opt.cookie_out_never &&
@@ -1339,7 +1425,6 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		while (l-- > 0)
 			*c++ ^= *hash_location++;
 
-		want_cookie = false;	/* not our kind of cookie */
 		tmp_ext.cookie_out_never = 0; /* false */
 		tmp_ext.cookie_plus = tmp_opt.cookie_plus;
 	} else if (!tp->rx_opt.cookie_in_always) {
@@ -1351,12 +1436,10 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	}
 	tmp_ext.cookie_in_always = tp->rx_opt.cookie_in_always;
 
-	if (want_cookie && !tmp_opt.saw_tstamp)
-		tcp_clear_options(&tmp_opt);
-
 	tmp_opt.tstamp_ok = tmp_opt.saw_tstamp;
 	tcp_openreq_init(req, &tmp_opt, skb);
 
+	/* Update req as an inet_request_sock (typecast trick)*/
 	ireq = inet_rsk(req);
 	ireq->loc_addr = daddr;
 	ireq->rmt_addr = saddr;
@@ -1366,13 +1449,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	if (security_inet_conn_request(sk, skb, req))
 		goto drop_and_free;
 
-	if (!want_cookie || tmp_opt.tstamp_ok)
-		TCP_ECN_create_request(req, skb);
+	TCP_ECN_create_request(req, skb);
 
-	if (want_cookie) {
-		isn = cookie_v4_init_sequence(sk, skb, &req->mss);
-		req->cookie_ts = tmp_opt.tstamp_ok;
-	} else if (!isn) {
+	if (!isn) { /* Timewait bucket handling */
 		struct inet_peer *peer = NULL;
 		struct flowi4 fl4;
 
@@ -1422,8 +1501,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	tcp_rsk(req)->snt_synack = tcp_time_stamp;
 
 	if (tcp_v4_send_synack(sk, dst, req,
-			       (struct request_values *)&tmp_ext) ||
-	    want_cookie)
+			       (struct request_values *)&tmp_ext))
 		goto drop_and_free;
 
 	inet_csk_reqsk_queue_hash_add(sk, req, TCP_TIMEOUT_INIT);
@@ -1438,7 +1516,6 @@ drop:
 }
 EXPORT_SYMBOL(tcp_v4_conn_request);
 
-
 /*
  * The three way handshake has completed - we got a valid synack -
  * now create the new socket.


* [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-28 11:52 [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods Jesper Dangaard Brouer
  2012-05-28 11:52 ` [RFC PATCH 1/2] tcp: extract syncookie part of tcp_v4_conn_request() Jesper Dangaard Brouer
@ 2012-05-28 11:52 ` Jesper Dangaard Brouer
  2012-05-29 19:37   ` Andi Kleen
  2012-05-28 16:14 ` [RFC PATCH 0/2] Faster/parallel SYN " Christoph Paasch
  2 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-28 11:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, Christoph Paasch, Eric Dumazet,
	David S. Miller, Martin Topholm
  Cc: Florian Westphal, opurdila, Hans Schillstrom

TCP SYN handling is on the slow path via tcp_v4_rcv(), and is
performed while holding spinlock bh_lock_sock().

Real-life and test-lab experiments show that the kernel chokes
when reaching 130 Kpps SYN floods (on a powerful 16-core Nehalem).
Measuring with perf reveals that it is caused by the
bh_lock_sock_nested() call in tcp_v4_rcv().

With this patch, the machine can handle 750 Kpps (the max of the SYN
flood generator) with cycles to spare; CPU load on the big machine
dropped from 100% to 1%.

Notice that we only handle SYN cookies early on; normal SYN packets
are still processed under bh_lock_sock().

Signed-off-by: Martin Topholm <mph@hoth.dk>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---

 net/ipv4/tcp_ipv4.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 15958b2..7480fc2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1386,8 +1386,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		goto drop;
 
 	/* SYN cookie handling */
-	if (tcp_v4_syn_conn_limit(sk, skb))
-		goto drop;
+//	if (tcp_v4_syn_conn_limit(sk, skb))
+//		goto drop;
 
 	req = inet_reqsk_alloc(&tcp_request_sock_ops);
 	if (!req)
@@ -1795,6 +1795,12 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	if (!sk)
 		goto no_tcp_socket;
 
+	/* Early and parallel SYN limit check, that sends syncookies */
+	if (sk->sk_state == TCP_LISTEN && th->syn && !th->ack && !th->fin) {
+		if (tcp_v4_syn_conn_limit(sk, skb))
+			goto discard_and_relse;
+	}
+
 process:
 	if (sk->sk_state == TCP_TIME_WAIT)
 		goto do_time_wait;


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-28 11:52 [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods Jesper Dangaard Brouer
  2012-05-28 11:52 ` [RFC PATCH 1/2] tcp: extract syncookie part of tcp_v4_conn_request() Jesper Dangaard Brouer
  2012-05-28 11:52 ` [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods Jesper Dangaard Brouer
@ 2012-05-28 16:14 ` Christoph Paasch
  2012-05-29 20:17   ` Jesper Dangaard Brouer
  2 siblings, 1 reply; 32+ messages in thread
From: Christoph Paasch @ 2012-05-28 16:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom

Hello,

On 05/28/2012 01:52 PM, Jesper Dangaard Brouer wrote:
> The following series is a RFC (Request For Comments) for implementing
> a faster and parallel handling of TCP SYN connections, to mitigate SYN
> flood attacks.  This is against DaveM's net (f0d1b3c2bc), as net-next
> is closed, as DaveM has mentioned numerous times ;-)
> 
> Only IPv4 TCP is handled here. The IPv6 TCP code also need to be
> updated, but I'll deal with that part after we have agreed on a
> solution for IPv4 TCP.
> 
>  Patch 1/2: Is a cleanup, where I split out the SYN cookie handling
>   from tcp_v4_conn_request() into tcp_v4_syn_conn_limit().
> 
>  Patch 2/2: Move tcp_v4_syn_conn_limit() outside bh_lock_sock() in
>   tcp_v4_rcv().  I would like some input on, (1) if this safe without
>   the lock, (2) if we need to do some sock lookup, before calling
>   tcp_v4_syn_conn_limit() (Christoph Paasch
>   <christoph.paasch@uclouvain.be> mentioned something about SYN
>   retransmissions)

Concerning (1):
I think there are places where you may have trouble because you don't
hold the lock.
E.g., in tcp_make_synack (called by tcp_v4_send_synack from your
tcp_v4_syn_conn_limit) there is:

if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
	(req->window_clamp > tcp_full_space(sk) ||
	 req->window_clamp == 0))
	req->window_clamp = tcp_full_space(sk);

Thus, tcp_full_space(sk) may return different values between the
check and the setting of req->window_clamp.


Concerning (2):

Imagine, a SYN coming in, when the reqsk-queue is not yet full. A
request-sock will be added to the reqsk-queue. Then, a retransmission of
this SYN comes in and the queue got full by the time. This time
tcp_v4_syn_conn_limit will do syn-cookies and thus generate a different
seq-number for the SYN/ACK.


But I don't see how you could fix these issues in your proposed framework.

Cheers,
Christoph

> 
> ---
> 
> Jesper Dangaard Brouer (2):
>       tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
>       tcp: extract syncookie part of tcp_v4_conn_request()
> 
> 
>  net/ipv4/tcp_ipv4.c |  131 ++++++++++++++++++++++++++++++++++++++++++---------
>  1 files changed, 107 insertions(+), 24 deletions(-)
> 


-- 
Christoph Paasch
PhD Student

IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
Université Catholique de Louvain


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-28 11:52 ` [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods Jesper Dangaard Brouer
@ 2012-05-29 19:37   ` Andi Kleen
  2012-05-29 20:18     ` David Miller
  2012-05-30  6:41     ` Eric Dumazet
  0 siblings, 2 replies; 32+ messages in thread
From: Andi Kleen @ 2012-05-29 19:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jesper Dangaard Brouer, netdev, Christoph Paasch, Eric Dumazet,
	David S. Miller, Martin Topholm, Florian Westphal, opurdila,
	Hans Schillstrom

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

> TCP SYN handling is on the slow path via tcp_v4_rcv(), and is
> performed while holding spinlock bh_lock_sock().
>
> Real-life and testlab experiments show, that the kernel choks
> when reaching 130Kpps SYN floods (powerful Nehalem 16 cores).
> Measuring with perf reveals, that its caused by
> bh_lock_sock_nested() call in tcp_v4_rcv().
>
> With this patch, the machine can handle 750Kpps (max of the SYN
> flood generator) with cycles to spare, CPU load on the big machine
> dropped to 1%, from 100%.
>
> Notice we only handle syn cookie early on, normal SYN packets
> are still processed under the bh_lock_sock().

So basically handling syncookies locklessly?

Makes sense. Syncookies are a bit obsolete these days of course, due
to the lack of options. But they may still be useful for this.

Obviously you'll need to clean up the patch and support IPv6,
but the basic idea looks good to me.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-28 16:14 ` [RFC PATCH 0/2] Faster/parallel SYN " Christoph Paasch
@ 2012-05-29 20:17   ` Jesper Dangaard Brouer
  2012-05-29 20:36     ` Christoph Paasch
  2012-05-30  4:45     ` Eric Dumazet
  0 siblings, 2 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-29 20:17 UTC (permalink / raw)
  To: christoph.paasch
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On Mon, 2012-05-28 at 18:14 +0200, Christoph Paasch wrote:

> On 05/28/2012 01:52 PM, Jesper Dangaard Brouer wrote:
> > The following series is a RFC (Request For Comments) for implementing
> > a faster and parallel handling of TCP SYN connections, to mitigate SYN
> > flood attacks.  This is against DaveM's net (f0d1b3c2bc), as net-next
> > is closed, as DaveM has mentioned numerous times ;-)
> > 
> > Only IPv4 TCP is handled here. The IPv6 TCP code also need to be
> > updated, but I'll deal with that part after we have agreed on a
> > solution for IPv4 TCP.
> > 
> >  Patch 1/2: Is a cleanup, where I split out the SYN cookie handling
> >   from tcp_v4_conn_request() into tcp_v4_syn_conn_limit().
> > 
> >  Patch 2/2: Move tcp_v4_syn_conn_limit() outside bh_lock_sock() in
> >   tcp_v4_rcv().  I would like some input on, (1) if this safe without
> >   the lock, (2) if we need to do some sock lookup, before calling
> >   tcp_v4_syn_conn_limit() (Christoph Paasch
> >   <christoph.paasch@uclouvain.be> mentioned something about SYN
> >   retransmissions)
> 
> Concerning (1):
> I think, there are places where you may have troube because you don't
> hold the lock.
> E.g., in tcp_make_synack (called by tcp_v4_send_synack from your
> tcp_v4_syn_conn_limit) there is:
> 
> if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
> 	(req->window_clamp > tcp_full_space(sk) ||
> 	 req->window_clamp == 0))
> 	req->window_clamp = tcp_full_space(sk);
> 
> Thus, tcp_full_space(sk) may have different values between the check and
> setting req->window_clamp.

This should be simply solved by using a local stack variable to store
the result of tcp_full_space(sk).  It's likely that GCC already does
this behind our back.


> Concerning (2):
> 
> Imagine, a SYN coming in, when the reqsk-queue is not yet full. A
> request-sock will be added to the reqsk-queue. Then, a retransmission of
> this SYN comes in and the queue got full by the time. This time
> tcp_v4_syn_conn_limit will do syn-cookies and thus generate a different
> seq-number for the SYN/ACK.

I have addressed your issue by checking the reqsk_queue in
tcp_v4_syn_conn_limit() before allocating a new req via
inet_reqsk_alloc().
If I find an existing reqsk, I choose to drop it, so that the SYN
cookie SYN-ACK takes precedence, as the path handling the last ACK
won't find this reqsk. This is done under the lock.

Test results show that I can provoke the SYN retransmit situation, and
that performance is still very good. The function inet_csk_search_req()
only sneaks up to the top 20 in the perf report.

Patch on top of this patch:

[RFC PATCH 3/2] tcp: Detect SYN retransmits during SYN flood

 Check for an existing connection request (reqsk), as this might
 be a retransmitted SYN which has gotten into the
 reqsk_queue.  If so, we choose to drop the reqsk and use
 SYN cookies to restore the state later.

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7480fc2..e0c9ba3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1274,8 +1274,10 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = {
  */
 int tcp_v4_syn_conn_limit(struct sock *sk, struct sk_buff *skb)
 {
-       struct request_sock *req;
+       struct request_sock *req = NULL;
        struct inet_request_sock *ireq;
+       struct request_sock *exist_req;
+       struct request_sock **prev;
        struct tcp_options_received tmp_opt;
        __be32 saddr = ip_hdr(skb)->saddr;
        __be32 daddr = ip_hdr(skb)->daddr;
@@ -1303,6 +1305,22 @@ int tcp_v4_syn_conn_limit(struct sock *sk, struct sk_buff *skb)
        if (!tcp_syn_flood_action(sk, skb, "TCP"))
                goto drop; /* Not enabled, indicate drop, due to queue full */
 
+       /* Check for existing connection request (reqsk) as this might
+        *   be a retransmitted SYN which has gotten into the
+        *   reqsk_queue.  If so, we choose to drop the reqsk, and use
+        *   SYN cookies to restore the state later.
+        */
+       bh_lock_sock(sk);
+       exist_req = inet_csk_search_req(sk, &prev, tcp_hdr(skb)->source, saddr, daddr);
+       if (exist_req) { /* Drop existing reqsk */
+               if (TCP_SKB_CB(skb)->seq == tcp_rsk(exist_req)->rcv_isn)
+                       net_warn_ratelimited("Retransmitted SYN from %pI4"
+                                            " (orig reqsk dropped)", &saddr);
+
+               inet_csk_reqsk_queue_drop(sk, exist_req, prev);
+       }
+       bh_unlock_sock(sk);
+
        /* Allocate a request_sock */
        req = inet_reqsk_alloc(&tcp_request_sock_ops);
        if (!req) {



I'll post some V2 patches tomorrow, which integrate these changes
into patch 2/2.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-29 19:37   ` Andi Kleen
@ 2012-05-29 20:18     ` David Miller
  2012-05-30  6:41     ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: David Miller @ 2012-05-29 20:18 UTC (permalink / raw)
  To: andi
  Cc: jbrouer, brouer, netdev, christoph.paasch, eric.dumazet, mph, fw,
	opurdila, hans.schillstrom

From: Andi Kleen <andi@firstfloor.org>
Date: Tue, 29 May 2012 12:37:07 -0700

> Makes sense. Syncookies is a bit obsolete these days of course, due
> to the lack of options. But may be still useful for this.

Please crawl out of your cave, syncookies fully supports all TCP
options these days.


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-29 20:17   ` Jesper Dangaard Brouer
@ 2012-05-29 20:36     ` Christoph Paasch
  2012-05-30  8:44       ` Jesper Dangaard Brouer
  2012-05-30  4:45     ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Paasch @ 2012-05-29 20:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

Hello,

On 05/29/2012 10:17 PM, Jesper Dangaard Brouer wrote:
> On Mon, 2012-05-28 at 18:14 +0200, Christoph Paasch wrote:
> 
>> On 05/28/2012 01:52 PM, Jesper Dangaard Brouer wrote:
>>> The following series is a RFC (Request For Comments) for implementing
>>> a faster and parallel handling of TCP SYN connections, to mitigate SYN
>>> flood attacks.  This is against DaveM's net (f0d1b3c2bc), as net-next
>>> is closed, as DaveM has mentioned numerous times ;-)
>>>
>>> Only IPv4 TCP is handled here. The IPv6 TCP code also need to be
>>> updated, but I'll deal with that part after we have agreed on a
>>> solution for IPv4 TCP.
>>>
>>>  Patch 1/2: Is a cleanup, where I split out the SYN cookie handling
>>>   from tcp_v4_conn_request() into tcp_v4_syn_conn_limit().
>>>
>>>  Patch 2/2: Move tcp_v4_syn_conn_limit() outside bh_lock_sock() in
>>>   tcp_v4_rcv().  I would like some input on, (1) if this safe without
>>>   the lock, (2) if we need to do some sock lookup, before calling
>>>   tcp_v4_syn_conn_limit() (Christoph Paasch
>>>   <christoph.paasch@uclouvain.be> mentioned something about SYN
>>>   retransmissions)
>>
>> Concerning (1):
>> I think, there are places where you may have troube because you don't
>> hold the lock.
>> E.g., in tcp_make_synack (called by tcp_v4_send_synack from your
>> tcp_v4_syn_conn_limit) there is:
>>
>> if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
>> 	(req->window_clamp > tcp_full_space(sk) ||
>> 	 req->window_clamp == 0))
>> 	req->window_clamp = tcp_full_space(sk);
>>
>> Thus, tcp_full_space(sk) may have different values between the check and
>> setting req->window_clamp.
> 
> This should be simply solved by using a local stack variable, for
> storing the result from tcp_full_space(sk).  Its likely that GCC already
> does this behind our back.

The place in tcp_make_synack is not the only one where we may have a race.

E.g., tcp_syn_flood_action or inet_csk_reqsk_queue_is_full.

And you never know which module is loaded behind
security_inet_conn_request and what it will do.

It must be carefully checked that the race really isn't an issue.

>> Concerning (2):
>>
>> Imagine, a SYN coming in, when the reqsk-queue is not yet full. A
>> request-sock will be added to the reqsk-queue. Then, a retransmission of
>> this SYN comes in and the queue got full by the time. This time
>> tcp_v4_syn_conn_limit will do syn-cookies and thus generate a different
>> seq-number for the SYN/ACK.
> 
> I have addressed your issue, by checking the reqsk_queue in
> tcp_v4_syn_conn_limit() before allocating a new req via
> inet_reqsk_alloc().
> If I find an existing reqsk, I choose to drop it, so the SYN cookie
> SYN-ACK takes precedence, as the path/handling of the last ACK doesn't
> find this reqsk. This is done under the lock.

Then the client will receive two SYN/ACKs for the same SYN with
different sequence numbers. As the "SYN cookie SYN/ACK" arrives
second, it will be discarded, and the sequence numbers from the first
one will be used on the client side.

Then the connection will never establish, as both sides "agreed" on
different sequence numbers.

I would say you have to handle the retransmitted SYN as in
tcp_v4_hnd_req, by calling tcp_check_req.


Cheers,
Christoph


> Test results show that I can provoke the SYN retransmit situation, and
> that performance is still very good. Func call inet_csk_search_req()
> only sneaks up to a top 20 on perf report.
> 
> Patch on top of this patch:
> 
> [RFC PATCH 3/2] tcp: Detect SYN retransmits during SYN flood
> 
>  Check for existing connection request (reqsk) as this might
>  be a retransmitted SYN which have gotten into the
>  reqsk_queue.  If so, we choose to drop the reqsk, and use
>  SYN cookies to restore the state later.
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7480fc2..e0c9ba3 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1274,8 +1274,10 @@ static const struct tcp_request_sock_ops tcp_request_sock_ipv4_ops = {
>   */
>  int tcp_v4_syn_conn_limit(struct sock *sk, struct sk_buff *skb)
>  {
> -       struct request_sock *req;
> +       struct request_sock *req = NULL;
>         struct inet_request_sock *ireq;
> +       struct request_sock *exist_req;
> +       struct request_sock **prev;
>         struct tcp_options_received tmp_opt;
>         __be32 saddr = ip_hdr(skb)->saddr;
>         __be32 daddr = ip_hdr(skb)->daddr;
> @@ -1303,6 +1305,22 @@ int tcp_v4_syn_conn_limit(struct sock *sk, struct sk_buff *skb)
>         if (!tcp_syn_flood_action(sk, skb, "TCP"))
>                 goto drop; /* Not enabled, indicate drop, due to queue full */
>  
> +       /* Check for existing connection request (reqsk) as this might
> +        *   be a retransmitted SYN which have gotten into the
> +        *   reqsk_queue.  If so, we choose to drop the reqsk, and use
> +        *   SYN cookies to restore the state later.
> +        */
> +       bh_lock_sock(sk);
> +       exist_req = inet_csk_search_req(sk, &prev, tcp_hdr(skb)->source, saddr, daddr);
> +       if (exist_req) { /* Drop existing reqsk */
> +               if (TCP_SKB_CB(skb)->seq == tcp_rsk(exist_req)->rcv_isn)
> +                       net_warn_ratelimited("Retransmitted SYN from %pI4"
> +                                            " (orig reqsk dropped)", &saddr);
> +
> +               inet_csk_reqsk_queue_drop(sk, exist_req, prev);
> +       }
> +       bh_unlock_sock(sk);
> +
>         /* Allocate a request_sock */
>         req = inet_reqsk_alloc(&tcp_request_sock_ops);
>         if (!req) {
> 
> 
> 
> I'll post some V2 patches tomorrow, which integrates this changes in
> patch 2/2.
> 
> 


-- 
Christoph Paasch
PhD Student

IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
Université Catholique de Louvain


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-29 20:17   ` Jesper Dangaard Brouer
  2012-05-29 20:36     ` Christoph Paasch
@ 2012-05-30  4:45     ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  4:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On Tue, 2012-05-29 at 22:17 +0200, Jesper Dangaard Brouer wrote:
> On Mon, 2012-05-28 at 18:14 +0200, Christoph Paasch wrote:
> 

> > Concerning (1):
> > I think, there are places where you may have troube because you don't
> > hold the lock.
> > E.g., in tcp_make_synack (called by tcp_v4_send_synack from your
> > tcp_v4_syn_conn_limit) there is:
> > 
> > if (sk->sk_userlocks & SOCK_RCVBUF_LOCK &&
> > 	(req->window_clamp > tcp_full_space(sk) ||
> > 	 req->window_clamp == 0))
> > 	req->window_clamp = tcp_full_space(sk);
> > 
> > Thus, tcp_full_space(sk) may have different values between the check and
> > setting req->window_clamp.
> 
> This should be simply solved by using a local stack variable, for
> storing the result from tcp_full_space(sk).  Its likely that GCC already
> does this behind our back.
> 

That's not the proper way to handle that situation.

A local stack variable makes no such guarantee. You need ACCESS_ONCE().

This is exactly the kind of thing that RCU takes care of.

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-29 19:37   ` Andi Kleen
  2012-05-29 20:18     ` David Miller
@ 2012-05-30  6:41     ` Eric Dumazet
  2012-05-30  7:45       ` Jesper Dangaard Brouer
  2012-05-30  8:03       ` Hans Schillstrom
  1 sibling, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  6:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jesper Dangaard Brouer, Jesper Dangaard Brouer, netdev,
	Christoph Paasch, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Tom Herbert

On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:

> So basically handling syncookies locklessly?
> 
> Makes sense. Syncookies are a bit obsolete these days of course, due
> to the lack of options. But they may still be useful for this.
> 
> Obviously you'll need to clean up the patch and support IPv6,
> but the basic idea looks good to me.

Also, TCP Fast Open should be a good way to make SYN floods no longer
effective.

Yuchung Cheng and Jerry Chu should upstream this code in the very near
future.

Another way to mitigate SYN scalability issues before the full RCU
solution I was cooking is either to:

1) Use a hardware filter (like on Intel NICs) to force all SYN packets
onto one queue (so that they are all serviced on one CPU)

2) Tweak RPS (__skb_get_rxhash()) so that the rxhash of SYN packets is not
dependent on src port/address, to get the same effect (all SYN packets
processed by one CPU). Note this only addresses the SYN flood problem, not
the general 3WHS scalability one, since if a real connection is
established, the third packet (ACK from client) will have the 'real'
rxhash and will be processed by another CPU.

(Of course, RPS must be enabled to benefit from this)

Untested patch to get the idea :

 include/net/flow_keys.h   |    1 +
 net/core/dev.c            |    8 ++++++++
 net/core/flow_dissector.c |    9 +++++++++
 3 files changed, 18 insertions(+)

diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
index 80461c1..b5bae21 100644
--- a/include/net/flow_keys.h
+++ b/include/net/flow_keys.h
@@ -10,6 +10,7 @@ struct flow_keys {
 		__be16 port16[2];
 	};
 	u8 ip_proto;
+	u8 tcpflags;
 };
 
 extern bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
diff --git a/net/core/dev.c b/net/core/dev.c
index cd09819..c9c039e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -135,6 +135,7 @@
 #include <linux/net_tstamp.h>
 #include <linux/static_key.h>
 #include <net/flow_keys.h>
+#include <net/tcp.h>
 
 #include "net-sysfs.h"
 
@@ -2614,6 +2615,12 @@ void __skb_get_rxhash(struct sk_buff *skb)
 		return;
 
 	if (keys.ports) {
+		if ((keys.tcpflags & (TCPHDR_SYN | TCPHDR_ACK)) == TCPHDR_SYN) {
+			hash = jhash_2words((__force u32)keys.dst,
+					    (__force u32)keys.port16[1],
+					    hashrnd);
+			goto end;
+		}
 		if ((__force u16)keys.port16[1] < (__force u16)keys.port16[0])
 			swap(keys.port16[0], keys.port16[1]);
 		skb->l4_rxhash = 1;
@@ -2626,6 +2633,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
 	hash = jhash_3words((__force u32)keys.dst,
 			    (__force u32)keys.src,
 			    (__force u32)keys.ports, hashrnd);
+end:
 	if (!hash)
 		hash = 1;
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index a225089..cd4aedf 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -137,6 +137,15 @@ ipv6:
 		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
 		if (ports)
 			flow->ports = *ports;
+		if (ip_proto == IPPROTO_TCP) {
+			__u8 *tcpflags, _tcpflags;
+
+			tcpflags = skb_header_pointer(skb, nhoff + 13,
+						      sizeof(_tcpflags),
+						      &_tcpflags);
+			if (tcpflags)
+				flow->tcpflags = *tcpflags;
+		}
 	}
 
 	return true;


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  6:41     ` Eric Dumazet
@ 2012-05-30  7:45       ` Jesper Dangaard Brouer
  2012-05-30  8:15         ` Eric Dumazet
  2012-05-30  8:03       ` Hans Schillstrom
  1 sibling, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-30  7:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, netdev, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, opurdila, Hans Schillstrom,
	Tom Herbert

On Wed, 2012-05-30 at 08:41 +0200, Eric Dumazet wrote:
> On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:
> 
> > So basically handling syncookie lockless? 
> > 
> > Makes sense. Syncookies is a bit obsolete these days of course, due
> > to the lack of options. But may be still useful for this.
> > 
> > Obviously you'll need to clean up the patch and support IPv6,
> > but the basic idea looks good to me.
> 
> Also TCP Fast Open should be a good way to make the SYN flood no more
> effective.

Sounds interesting, but TCP Fast Open is primarily concerned with
enabling data exchange during SYN establishment.  I don't see any
indication that they have implemented parallel SYN handling.

Implementing parallel SYN handling should also benefit their work.
After studying this code path, I also see great performance benefit in
optimizing the normal 3WHS on socks in sk_state == LISTEN.
Perhaps we should split up the code paths for LISTEN vs. ESTABLISHED, as
they are very entangled at the moment AFAICS.

> Yuchung Cheng and Jerry Chu should upstream this code in a very near
> future.

Looking forward to seeing the code, and the fallout discussions, on
transferring data in SYN packets.


> Another way to mitigate SYN scalability issues before the full RCU
> solution I was cooking is to either :
> 
> 1) Use a hardware filter (like on Intel NICS) to force all SYN packets
> going to one queue (so that they are all serviced on one CPU)
> 
> 2) Tweak RPS (__skb_get_rxhash()) so that SYN packets rxhash is not
> dependent on src port/address, to get same effect (All SYN packets
> processed by one cpu). Note this only address the SYN flood problem, not
> the general 3WHS scalability one, since if real connection is
> established, the third packet (ACK from client) will have the 'real'
> rxhash and will be processed by another cpu.

I don't like the idea of overloading one CPU with SYN packets, as the
attacker can still cause a DoS on new connections.

My "unlocked" parallel SYN cookie approach should favor established
connections, as they are allowed to run under a BH lock, and thus don't
let new SYN packets in (on this CPU) until the established-connection
packet is finished.  Unless I have misunderstood something... I think I
have: established connections have their own/separate struct sock, and
thus this is another slock spinlock, right? (Well, let Eric bash me for
this ;-))

[...cut...]


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  6:41     ` Eric Dumazet
  2012-05-30  7:45       ` Jesper Dangaard Brouer
@ 2012-05-30  8:03       ` Hans Schillstrom
  2012-05-30  8:24         ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Hans Schillstrom @ 2012-05-30  8:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	netdev@vger.kernel.org, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, opurdila@ixiacom.com,
	Tom Herbert

On Wednesday 30 May 2012 08:41:13 Eric Dumazet wrote:
> On Tue, 2012-05-29 at 12:37 -0700, Andi Kleen wrote:
> 
> > So basically handling syncookie lockless? 
> > 
> > Makes sense. Syncookies is a bit obsolete these days of course, due
> > to the lack of options. But may be still useful for this.
> > 
> > Obviously you'll need to clean up the patch and support IPv6,
> > but the basic idea looks good to me.
> 
> Also TCP Fast Open should be a good way to make the SYN flood no more
> effective.
> 
> Yuchung Cheng and Jerry Chu should upstream this code in a very near
> future.
> 
> Another way to mitigate SYN scalability issues before the full RCU
> solution I was cooking is to either :
> 
> 1) Use a hardware filter (like on Intel NICS) to force all SYN packets
> going to one queue (so that they are all serviced on one CPU)

We have this option running right now, and it gave slightly higher values.
The upside is that only one core is running at 100% load.

To be able to process more SYNs, an attempt was made to spread them with
RPS to 2 other cores, which gave 60% more SYNs per second:
i.e. the SYN filter in the NIC sending all IRQs to one core gave ~52k SYN pkts/sec;
adding RPS and sending SYNs to two other cores gave ~80k SYN pkts/sec.
Adding more cores than two didn't help that much.

> 2) Tweak RPS (__skb_get_rxhash()) so that SYN packets rxhash is not
> dependent on src port/address, to get same effect (All SYN packets
> processed by one cpu). Note this only address the SYN flood problem, not
> the general 3WHS scalability one, since if real connection is
> established, the third packet (ACK from client) will have the 'real'
> rxhash and will be processed by another cpu.

Neither the NIC's SYN filter nor this scales that well.

> (Of course, RPS must be enabled to benefit from this)
> 
> Untested patch to get the idea :
> 
>  include/net/flow_keys.h   |    1 +
>  net/core/dev.c            |    8 ++++++++
>  net/core/flow_dissector.c |    9 +++++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/include/net/flow_keys.h b/include/net/flow_keys.h
> index 80461c1..b5bae21 100644
> --- a/include/net/flow_keys.h
> +++ b/include/net/flow_keys.h
> @@ -10,6 +10,7 @@ struct flow_keys {
>  		__be16 port16[2];
>  	};
>  	u8 ip_proto;
> +	u8 tcpflags;
>  };
>  
>  extern bool skb_flow_dissect(const struct sk_buff *skb, struct flow_keys *flow);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cd09819..c9c039e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -135,6 +135,7 @@
>  #include <linux/net_tstamp.h>
>  #include <linux/static_key.h>
>  #include <net/flow_keys.h>
> +#include <net/tcp.h>
>  
>  #include "net-sysfs.h"
>  
> @@ -2614,6 +2615,12 @@ void __skb_get_rxhash(struct sk_buff *skb)
>  		return;
>  
>  	if (keys.ports) {
> +		if ((keys.tcpflags & (TCPHDR_SYN | TCPHDR_ACK)) == TCPHDR_SYN) {
> +			hash = jhash_2words((__force u32)keys.dst,
> +					    (__force u32)keys.port16[1],
> +					    hashrnd);
> +			goto end;
> +		}
>  		if ((__force u16)keys.port16[1] < (__force u16)keys.port16[0])
>  			swap(keys.port16[0], keys.port16[1]);
>  		skb->l4_rxhash = 1;
> @@ -2626,6 +2633,7 @@ void __skb_get_rxhash(struct sk_buff *skb)
>  	hash = jhash_3words((__force u32)keys.dst,
>  			    (__force u32)keys.src,
>  			    (__force u32)keys.ports, hashrnd);
> +end:
>  	if (!hash)
>  		hash = 1;
>  
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index a225089..cd4aedf 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -137,6 +137,15 @@ ipv6:
>  		ports = skb_header_pointer(skb, nhoff, sizeof(_ports), &_ports);
>  		if (ports)
>  			flow->ports = *ports;
> +		if (ip_proto == IPPROTO_TCP) {
> +			__u8 *tcpflags, _tcpflags;
> +
> +			tcpflags = skb_header_pointer(skb, nhoff + 13,
> +						      sizeof(_tcpflags),
> +						      &_tcpflags);
> +			if (tcpflags)
> +				flow->tcpflags = *tcpflags;
> +		}
>  	}
>  
>  	return true;
> 
> 
> 

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  7:45       ` Jesper Dangaard Brouer
@ 2012-05-30  8:15         ` Eric Dumazet
  2012-05-30  9:24           ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  8:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Andi Kleen, netdev, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, opurdila, Hans Schillstrom,
	Tom Herbert

On Wed, 2012-05-30 at 09:45 +0200, Jesper Dangaard Brouer wrote:

> Sounds interesting, but TCP Fast Open is primarily concerned with
> enabling data exchange during SYN establishment.  I don't see any
> indication that they have implemented parallel SYN handling.
> 

Not at all; TCP Fast Open's main goal is to allow connection establishment
with a single packet (thus removing one RTT). This also removes the
whole idea of having half-sockets (in SYN_RCV state).

Then, allowing DATA in the SYN packet is an extra bonus, but only if the
whole request can fit in the packet (which is unlikely for typical HTTP
requests).


> Implementing parallel SYN handling, should also benefit their work.

Why do you think I am working on this ? Hint : I am a Google coworker.

> After studying this code path, I also see great performance benefit in
> also optimizing the normal 3WHS on sock's in sk_state == LISTEN.
> Perhaps we should split up the code path for LISTEN vs. ESTABLISHED, as
> they are very entangled at the moment AFAIKS.
> 
> > Yuchung Cheng and Jerry Chu should upstream this code in a very near
> > future.
> 
> Looking forward to see the code, and the fallout discussions, on
> transferring data on SYN packets.
> 

Problem is this code will be delayed if we change net-next code in this
area, because we'll have to rebase and retest everything.

> 
> > Another way to mitigate SYN scalability issues before the full RCU
> > solution I was cooking is to either :
> > 
> > 1) Use a hardware filter (like on Intel NICS) to force all SYN packets
> > going to one queue (so that they are all serviced on one CPU)
> > 
> > 2) Tweak RPS (__skb_get_rxhash()) so that SYN packets rxhash is not
> > dependent on src port/address, to get same effect (All SYN packets
> > processed by one cpu). Note this only address the SYN flood problem, not
> > the general 3WHS scalability one, since if real connection is
> > established, the third packet (ACK from client) will have the 'real'
> > rxhash and will be processed by another cpu.
> 
> I don't like the idea of overloading one CPU with SYN packets. As the
> attacker can still cause a DoS on new connections.
> 

One CPU can handle more than one million SYNs per second, while 32 CPUs
fighting over the socket lock cannot handle 1% of this load.

If Intel chose to implement this hardware filter in their NICs, it's for a
good reason.


> My "unlocked" parallel SYN cookie approach, should favor established
> connections, as they are allowed to run under a BH lock, and thus don't
> let new SYN packets in (on this CPU), until the establish conn packet is
> finished.  Unless I have misunderstood something... I think I have,
> established connections have their own/seperate struck sock, and thus
> this is another slock spinlock, right?. (Well let Eric bash me for
> this ;-))

It seems you forgot I have patches to have full parallelism, not only
the SYNCOOKIE hack.

I am still polishing them; it's a _long_ process, especially if the
network tree changes a lot.

If you believe you can beat me on this, please let me know so that I can
switch to other tasks.


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  8:03       ` Hans Schillstrom
@ 2012-05-30  8:24         ` Eric Dumazet
  2012-05-30 11:14           ` Hans Schillstrom
  2012-05-30 21:20           ` Rick Jones
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  8:24 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Andi Kleen, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	netdev@vger.kernel.org, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, Tom Herbert

On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:

> We have this option running right now, and it gave slightly higher values.
> The upside is only one core is running at 100% load.
> 
> To be able to process more SYN an attempt was made to spread them with RPS to 
> 2 other cores gave 60% more SYN:s per sec
> i.e. syn filter in NIC sending all irq:s to one core gave ~ 52k syn. pkts/sec
> adding RPS and sending syn to two other core:s gave ~80k  syn. pkts/sec
> Adding more cores than two didn't help that much.

When you say 52.000 pkt/s, is that for fully established sockets, or
SYNFLOOD ?

19.23 us to handle _one_ SYN message seems pretty wrong to me, if there
is no contention on listener socket.


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-29 20:36     ` Christoph Paasch
@ 2012-05-30  8:44       ` Jesper Dangaard Brouer
  2012-05-30  8:50         ` Eric Dumazet
  2012-05-30  8:53         ` Christoph Paasch
  0 siblings, 2 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-30  8:44 UTC (permalink / raw)
  To: christoph.paasch
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On Tue, 2012-05-29 at 22:36 +0200, Christoph Paasch wrote:
[...cut...]

> >> Concerning (2):
> >>
> >> Imagine, a SYN coming in, when the reqsk-queue is not yet full. A
> >> request-sock will be added to the reqsk-queue. Then, a retransmission of
> >> this SYN comes in and the queue has filled up in the meantime. This time
> >> tcp_v4_syn_conn_limit will do syn-cookies and thus generate a different
> >> seq-number for the SYN/ACK.
> > 
> > I have addressed your issue, by checking the reqsk_queue in
> > tcp_v4_syn_conn_limit() before allocating a new req via
> > inet_reqsk_alloc().
> > If I find an existing reqsk, I choose to drop it, so the SYN cookie
> > SYN-ACK takes precedence, as the path/handling of the last ACK doesn't
> > find this reqsk. This is done under the lock.
> 
> Then the receiver will receive two SYN/ACK's for the same SYN with
> different sequence-numbers. As the "SYN cookie SYN-ACK" will arrive
> second, it will be discarded and seq-numbers from the first one will be
> taken on the client-side.

I thought that the retransmitted SYN packet was caused by the SYN-ACK
not reaching the client?

> Then, the connection will never establish, as both sides "agreed" on
> different sequence numbers.
> 
> I would say, you have to handle the retransmitted SYN as in
> tcp_v4_hnd_req by calling tcp_check_req.

Choosing that code path should be easy, by simply returning 0 (no limit)
from my function tcp_v4_syn_conn_limit() to indicate that the normal
slow code path should be taken.

I guess this will not pose a big attack vector, as the number of entries
in the reqsk_queue will be fairly small.


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-30  8:44       ` Jesper Dangaard Brouer
@ 2012-05-30  8:50         ` Eric Dumazet
  2012-05-30  8:53         ` Christoph Paasch
  1 sibling, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  8:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On Wed, 2012-05-30 at 10:44 +0200, Jesper Dangaard Brouer wrote:

> Choosing that code path, should be easy by simply returning 0 (no_limit)
> from my function tcp_v4_syn_conn_limit(), to indicate that the normal
> slow code path should be chosen.
> 
> I guess this will not pose a big attack angle, as the entries in
> reqsk_queue will be fairly small.

Not sure what you mean.

I know some people have 64K entries in it.

(sk_ack_backlog / sk_max_ack_backlog being 16 bits,
listen(fd, 65536 + 1) can give unexpected results)


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-30  8:44       ` Jesper Dangaard Brouer
  2012-05-30  8:50         ` Eric Dumazet
@ 2012-05-30  8:53         ` Christoph Paasch
  2012-05-30 22:40           ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 32+ messages in thread
From: Christoph Paasch @ 2012-05-30  8:53 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On 05/30/2012 10:44 AM, Jesper Dangaard Brouer wrote:
>> > 
>> > Then the receiver will receive two SYN/ACK's for the same SYN with
>> > different sequence-numbers. As the "SYN cookie SYN-ACK" will arrive
>> > second, it will be discarded and seq-numbers from the first one will be
>> > taken on the client-side.
> I thought that the retransmitted SYN packet, were caused by the SYN-ACK
> didn't reach the client?

Or, if the SYN/ACK got somehow delayed in the network and the
SYN-retransmission timer on the client-side fires before the SYN/ACK
reaches the client.


Christoph


-- 
Christoph Paasch
PhD Student

IP Networking Lab --- http://inl.info.ucl.ac.be
MultiPath TCP in the Linux Kernel --- http://mptcp.info.ucl.ac.be
Université Catholique de Louvain
-- 


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  8:15         ` Eric Dumazet
@ 2012-05-30  9:24           ` Jesper Dangaard Brouer
  2012-05-30  9:46             ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-30  9:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, netdev, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, Hans Schillstrom,
	Martin Topholm

On Wed, 2012-05-30 at 10:15 +0200, Eric Dumazet wrote:
> On Wed, 2012-05-30 at 09:45 +0200, Jesper Dangaard Brouer wrote:
> 
> > Sounds interesting, but TCP Fast Open is primarily concerned with
> > enabling data exchange during SYN establishment.  I don't see any
> > indication that they have implemented parallel SYN handling.
> > 
> 
> Not at all, TCP fast open main goal is to allow connection establishment
> with a single packet (thus removing one RTT). This also removes the
> whole idea of having half-sockets (in SYN_RCV state)
> 
> Then, allowing DATA in the SYN packet is an extra bonus, only if the
> whole request can fit in the packet (it is unlikely for typical http
> requests)
> 
> 
> > Implementing parallel SYN handling, should also benefit their work.
> 
> Why do you think I am working on this ? Hint : I am a Google coworker.

I did know you work for Google, but I didn't know you worked actively on
parallel SYN handling.  Your previous quote, "eventually in a short
time", indicated to me that I should solve the issue myself first, and
then we would replace my code with your full solution later.


> > After studying this code path, I also see great performance benefit in
> > also optimizing the normal 3WHS on sock's in sk_state == LISTEN.
> > Perhaps we should split up the code path for LISTEN vs. ESTABLISHED, as
> > they are very entangled at the moment AFAIKS.
> > 
> > > Yuchung Cheng and Jerry Chu should upstream this code in a very near
> > > future.
> > 
> > Looking forward to see the code, and the fallout discussions, on
> > transferring data on SYN packets.
> > 
> 
> Problem is this code will be delayed if we change net-next code in this
> area, because we'll have to rebase and retest everything.

Okay, I don't want to delay your work.  We can hold off merging my cleanup
patches, and I can take the pain of rebasing them after your work is
merged.  Then we will see if my performance patches have become
obsolete.

I'm going to post some updated v2 patches, just because I know some
people who are desperate for a quick solution to their DDoS issues, and
who are willing to patch their production kernels.

 
> > > Another way to mitigate SYN scalability issues before the full RCU
> > > solution I was cooking is to either :
> > > 
> > > 1) Use a hardware filter (like on Intel NICS) to force all SYN packets
> > > going to one queue (so that they are all serviced on one CPU)
> > > 
> > > 2) Tweak RPS (__skb_get_rxhash()) so that SYN packets rxhash is not
> > > dependent on src port/address, to get same effect (All SYN packets
> > > processed by one cpu). Note this only address the SYN flood problem, not
> > > the general 3WHS scalability one, since if real connection is
> > > established, the third packet (ACK from client) will have the 'real'
> > > rxhash and will be processed by another cpu.
> > 
> > I don't like the idea of overloading one CPU with SYN packets. As the
> > attacker can still cause a DoS on new connections.
> > 
> 
> One CPU can handle more than one million SYN per second, while 32 cpus
> fighting on socket lock can not handle 1 % of this load.

I'm not sure one CPU can handle 1 Mpps on this particular path.  And Hans
has some other measurements, although I'm assuming he has small CPUs.
But if you are working on the real solution, we don't need to discuss
this :-)


> If Intel chose to implement this hardware filter in their NIC, its for a
> good reason.
> 
> 
> > My "unlocked" parallel SYN cookie approach, should favor established
> > connections, as they are allowed to run under a BH lock, and thus don't
> > let new SYN packets in (on this CPU), until the establish conn packet is
> > finished.  Unless I have misunderstood something... I think I have,
> > established connections have their own/seperate struck sock, and thus
> > this is another slock spinlock, right?. (Well let Eric bash me for
> > this ;-))
> 
> It seems you forgot I have patches to have full parallelism, not only
> the SYNCOOKIE hack.

I'm so much looking forward to this :-)

> I am still polishing them, its a _long_ process, especially if network
> tree changes a lot.
> 
> If you believe you can beat me on this, please let me know so that I can
> switch to other tasks.

I don't dare go into that battle with the network ninja; I surrender.
DaveM, Eric's patches take precedence over mine...

/me crawling back into my cave, and switching to boring bugzilla cases of
backporting kernel patches instead...

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  9:24           ` Jesper Dangaard Brouer
@ 2012-05-30  9:46             ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-30  9:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Andi Kleen, netdev, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, Hans Schillstrom

On Wed, 2012-05-30 at 11:24 +0200, Jesper Dangaard Brouer wrote:

> I don't dare to go into that battle with the network ninja, I surrender.
> DaveM, Eric's patches take precedence over mine...
> 
> /me Crawing back into my cave, and switching to boring bugzilla cases of
> backporting kernel patches instead...
> 

Hey, I only wanted to say that we were working on the same area and that
we should expect conflicts.

In the long term, we want a scalable listener solution, but I can
understand if some customers want an immediate solution (SYN flood
mitigation)


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  8:24         ` Eric Dumazet
@ 2012-05-30 11:14           ` Hans Schillstrom
  2012-05-30 21:20           ` Rick Jones
  1 sibling, 0 replies; 32+ messages in thread
From: Hans Schillstrom @ 2012-05-30 11:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andi Kleen, Jesper Dangaard Brouer, Jesper Dangaard Brouer,
	netdev@vger.kernel.org, Christoph Paasch, David S. Miller,
	Martin Topholm, Florian Westphal, Tom Herbert

On Wednesday 30 May 2012 10:24:48 Eric Dumazet wrote:
> On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:
> 
> > We have this option running right now, and it gave slightly higher values.
> > The upside is only one core is running at 100% load.
> > 
> > To be able to process more SYN an attempt was made to spread them with RPS to 
> > 2 other cores gave 60% more SYN:s per sec
> > i.e. syn filter in NIC sending all irq:s to one core gave ~ 52k syn. pkts/sec
> > adding RPS and sending syn to two other core:s gave ~80k  syn. pkts/sec
> > Adding more cores than two didn't help that much.
> 
> When you say 52.000 pkt/s, is that for fully established sockets, or
> SYNFLOOD ?

SYN flood with hping3, random source IPs, dest port 5060,
and there is a listener on that port.
(kernel 3.0.13)

> 19.23 us to handle _one_ SYN message seems pretty wrong to me, if there
> is no contention on listener socket.
> 

BTW,
I also see a strange behavior during a SYN flood:
the client starts sending data directly in the ACK,
and that first packet is more or less always retransmitted once.

I'll dig into that later, or does anyone have an idea of the reason?


* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30  8:24         ` Eric Dumazet
  2012-05-30 11:14           ` Hans Schillstrom
@ 2012-05-30 21:20           ` Rick Jones
  2012-05-31  8:28             ` Eric Dumazet
  1 sibling, 1 reply; 32+ messages in thread
From: Rick Jones @ 2012-05-30 21:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hans Schillstrom, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On 05/30/2012 01:24 AM, Eric Dumazet wrote:
> On Wed, 2012-05-30 at 10:03 +0200, Hans Schillstrom wrote:
>
>> We have this option running right now, and it gave slightly higher values.
>> The upside is only one core is running at 100% load.
>>
>> To be able to process more SYN an attempt was made to spread them with RPS to
>> 2 other cores gave 60% more SYN:s per sec
>> i.e. syn filter in NIC sending all irq:s to one core gave ~ 52k syn. pkts/sec
>> adding RPS and sending syn to two other core:s gave ~80k  syn. pkts/sec
>> Adding more cores than two didn't help that much.
>
> When you say 52.000 pkt/s, is that for fully established sockets, or
> SYNFLOOD ?
>
> 19.23 us to handle _one_ SYN message seems pretty wrong to me, if there
> is no contention on listener socket.

It may still be high, but a very quick netperf TCP_CC test over loopback 
on a W3550 system running a 2.6.38 kernel shows:

raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
localhost.localdomain () port 0 AF_INET
Local /Remote
Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
Send   Recv   Size    Size   Time    Rate     local  remote local   remote
bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr

16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
16384  87380

57 microseconds per "transaction", which in this case is establishing and
tearing down the connection with nothing else (no data packets), makes
19 microseconds for a SYN seem perhaps not all that beyond the realm of
possibility?

rick jones


* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-30  8:53         ` Christoph Paasch
@ 2012-05-30 22:40           ` Jesper Dangaard Brouer
  2012-05-31 12:51             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-30 22:40 UTC (permalink / raw)
  To: christoph.paasch
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, opurdila, Hans Schillstrom, Andi Kleen

On Wed, 2012-05-30 at 10:53 +0200, Christoph Paasch wrote:
> On 05/30/2012 10:44 AM, Jesper Dangaard Brouer wrote:
> >> > 
> >> > Then the receiver will receive two SYN/ACK's for the same SYN with
> >> > different sequence-numbers. As the "SYN cookie SYN-ACK" will arrive
> >> > second, it will be discarded and seq-numbers from the first one will be
> >> > taken on the client-side.
> > I thought that the retransmitted SYN packet, were caused by the SYN-ACK
> > didn't reach the client?
> 
> Or, if the SYN/ACK got somehow delayed in the network and the
> SYN-retransmission timer on the client-side fires before the SYN/ACK
> reaches the client.

That seems like a very unlikely situation, which we perhaps should
neglect as we are under SYN attack.

I will test the attack vector where, instead of dropping the reqsk, we
fall back into the slow locked path.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-30 21:20           ` Rick Jones
@ 2012-05-31  8:28             ` Eric Dumazet
  2012-05-31  8:45               ` Hans Schillstrom
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2012-05-31  8:28 UTC (permalink / raw)
  To: Rick Jones
  Cc: Hans Schillstrom, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On Wed, 2012-05-30 at 14:20 -0700, Rick Jones wrote:

> It may still be high, but a very quick netperf TCP_CC test over loopback 
> on a W3550 system running a 2.6.38 kernel shows:
> 
> raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
> TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
> localhost.localdomain () port 0 AF_INET
> Local /Remote
> Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr
> 
> 16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
> 16384  87380
> 
> 57 microseconds per "transaction" which in this case is establishing and 
> tearing-down the connection, with nothing else (no data packets) makes 
> 19 microseconds for a SYN seem perhaps not all that beyond the realm of 
> possibility?

That's a different story: over the loopback device (and without
stressing the IP route cache, by the way).

Your netperf test is a full userspace transaction, with 5 frames per
transaction: two socket creations/destructions, process scheduler
activations, and no entry into syncookie mode.

In case of a SYN flood (syncookies on), we receive a packet and send one
from softirq.

One expensive thing might be the md5 to compute the SYNACK sequence.

I suspect other things:

1) Of course we have to take into account the timer responsible for
SYNACK retransmits of previously queued requests. Its cost depends on
the listen backlog. When this timer runs, the listen socket is locked.

2) IP route cache overflows.
   In case of a SYN flood, we should not store dst(s) in the route cache
but destroy them immediately.
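The hash cost shows up because, under syncookies, every incoming SYN requires hashing the connection 4-tuple under a secret to derive the SYNACK sequence number, and the returning ACK is validated by recomputing the same hash. A simplified model of the idea (not the kernel's actual cookie_v4_init_sequence(); the function name, bit layout, and use of SHA-1 over the packed tuple here are illustrative only):

```python
import hashlib
import struct

def syn_cookie_isn(saddr, daddr, sport, dport, count, secret, mss_index):
    """Toy model of a SYN-cookie initial sequence number: hash the
    4-tuple plus a slowly incrementing counter under a secret, then
    encode the counter and an MSS-table index into the result.
    The real kernel layout and primitives differ."""
    msg = struct.pack("!IIHHI", saddr, daddr, sport, dport, count) + secret
    digest = hashlib.sha1(msg).digest()    # one hash per SYN: this is
    h = int.from_bytes(digest[:4], "big")  # the dominant per-packet cost
    # counter in the top byte, hash bits in the middle, MSS index in
    # the low 3 bits
    return ((count & 0xFF) << 24) | (h & 0x00FFFFF8) | (mss_index & 7)
```

Because the cookie is recomputable from the ACK alone, no per-connection state needs to be kept until the handshake completes, which is what makes the scheme attractive under flood.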

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-31  8:28             ` Eric Dumazet
@ 2012-05-31  8:45               ` Hans Schillstrom
  2012-05-31 14:09                 ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Hans Schillstrom @ 2012-05-31  8:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rick Jones, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On Thursday 31 May 2012 10:28:37 Eric Dumazet wrote:
> On Wed, 2012-05-30 at 14:20 -0700, Rick Jones wrote:
> 
> > It may still be high, but a very quick netperf TCP_CC test over loopback 
> > on a W3550 system running a 2.6.38 kernel shows:
> > 
> > raj@tardy:~/netperf2_trunk/src$ ./netperf -t TCP_CC -l 60 -c -C
> > TCP Connect/Close TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
> > localhost.localdomain () port 0 AF_INET
> > Local /Remote
> > Socket Size   Request Resp.  Elapsed Trans.   CPU    CPU    S.dem   S.dem
> > Send   Recv   Size    Size   Time    Rate     local  remote local   remote
> > bytes  bytes  bytes   bytes  secs.   per sec  %      %      us/Tr   us/Tr
> > 
> > 16384  87380  1       1      60.00   21515.29   30.68  30.96  57.042  57.557
> > 16384  87380
> > 
> > 57 microseconds per "transaction" which in this case is establishing and 
> > tearing-down the connection, with nothing else (no data packets) makes 
> > 19 microseconds for a SYN seem perhaps not all that beyond the realm of 
> > possibility?
> 
> Thats a different story, on loopback device (without stressing IP route
> cache by the way)
> 
> Your netperf test is a full userspace transactions, and 5 frames per
> transaction. Two sockets creation/destruction, process scheduler
> activations, and not enter syncookie mode.
> 
> In case of synflood/(syncookies on), we receive a packet and send one
> from softirq.
> 
> One expensive thing might be the md5 to compute the SYNACK sequence.
> 
> I suspect other things :
> 
> 1) Of course we have to take into account the timer responsible for
> SYNACK retransmits of previously queued requests. Its cost depends on
> the listen backlog. When this timer runs, listen socket is locked.
> 
> 2) IP route cache overflows.
>    In case of SYNFLOOD, we should not store dst(s) in route cache but
> destroy them immediately.
> 
I can see plenty of "IPv4: dst cache overflow" messages

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-30 22:40           ` Jesper Dangaard Brouer
@ 2012-05-31 12:51             ` Jesper Dangaard Brouer
  2012-05-31 12:58               ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-31 12:51 UTC (permalink / raw)
  To: christoph.paasch
  Cc: netdev, Eric Dumazet, David S. Miller, Martin Topholm,
	Florian Westphal, Hans Schillstrom, Andi Kleen

On Thu, 2012-05-31 at 00:40 +0200, Jesper Dangaard Brouer wrote:
> That seems like a very unlikely situation, which we perhaps should
> neglect as we are under SYN attack.
>
> I will test the attack vector, if we instead of dropping the reqsk,
> fall back into the slow locked path.

I can provoke this attack vector, and performance is worse if we do not
drop the reqsk early.

Generator SYN flood at 750 Kpps, sending a mixture of false retransmits.

- With early drop: 406 Kpps
- With return to locked processing: 251 Kpps

It's still better than the approx. 150 Kpps without any patches.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-31 12:51             ` Jesper Dangaard Brouer
@ 2012-05-31 12:58               ` Eric Dumazet
  2012-05-31 13:04                 ` Jesper Dangaard Brouer
  2012-05-31 13:10                 ` Eric Dumazet
  0 siblings, 2 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-31 12:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, Hans Schillstrom, Andi Kleen

On Thu, 2012-05-31 at 14:51 +0200, Jesper Dangaard Brouer wrote:
> On Thu, 2012-05-31 at 00:40 +0200, Jesper Dangaard Brouer wrote:
> > That seems like a very unlikely situation, which we perhaps should
> > neglect as we are under SYN attack.
> >
> > I will test the attack vector, if we instead of dropping the reqsk,
> > fall back into the slow locked path.
> 
> I can provoke this attack vector, and performance is worse, if not
> dropping the reqsk early.
> 
> Generator SYN flood at 750Kpps, sending false retransmits mixture.
> 
> - With early drop: 406 Kpps
> - With return to locked processing: 251 Kpps
> 
> Its still better than the approx 150Kpps, without any patches.
> 

How many different IP addresses are used by your generator ?

Or maybe you disabled IP route cache ?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-31 12:58               ` Eric Dumazet
@ 2012-05-31 13:04                 ` Jesper Dangaard Brouer
  2012-05-31 13:10                 ` Eric Dumazet
  1 sibling, 0 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-31 13:04 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, Hans Schillstrom, Andi Kleen

On Thu, 2012-05-31 at 14:58 +0200, Eric Dumazet wrote:
> On Thu, 2012-05-31 at 14:51 +0200, Jesper Dangaard Brouer wrote:
> > On Thu, 2012-05-31 at 00:40 +0200, Jesper Dangaard Brouer wrote:
> > > That seems like a very unlikely situation, which we perhaps should
> > > neglect as we are under SYN attack.
> > >
> > > I will test the attack vector, if we instead of dropping the reqsk,
> > > fall back into the slow locked path.
> > 
> > I can provoke this attack vector, and performance is worse, if not
> > dropping the reqsk early.
> > 
> > Generator SYN flood at 750Kpps, sending false retransmits mixture.
> > 
> > - With early drop: 406 Kpps
> > - With return to locked processing: 251 Kpps
> > 
> > Its still better than the approx 150Kpps, without any patches.
> > 
> 
> How many different IP addresses are used by your generator ?

In this attack I reduced the IPs to 255, and also the source port
numbers, and then simply cloned some of the SKBs.  But normally I use
65535 IPs in 198.18.0.0/16 (the range reserved for benchmarking).


> Or maybe you disabled IP route cache ?

Why do you think I have disabled the IP dst route cache?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-31 12:58               ` Eric Dumazet
  2012-05-31 13:04                 ` Jesper Dangaard Brouer
@ 2012-05-31 13:10                 ` Eric Dumazet
  2012-05-31 13:24                   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2012-05-31 13:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, Hans Schillstrom, Andi Kleen

On Thu, 2012-05-31 at 14:58 +0200, Eric Dumazet wrote:

> 
> How many different IP addresses are used by your generator ?
> 
> Or maybe you disabled IP route cache ?

With no route cache problems, I sustain 4 us per SYN packet if all the
load is serviced by one CPU only.
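(For conversion: a 4 us per-packet service time on one CPU corresponds to roughly 250 Kpps; the rate figure here is just the arithmetic inverse, not a number measured in this thread.)

```python
# Per-packet service time (us) is the inverse of the sustained
# packet rate (pps), assuming a single CPU services all the load.
pps = 250_000
us_per_packet = 1e6 / pps
print(us_per_packet)  # 4.0
```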

perf profile is : (I have CONFIG_DEBUG_PAGEALLOC=y)

+   9,55%  ksoftirqd/0  [kernel.kallsyms]  [k] sha_transform
+   3,56%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_route_input_common
+   3,40%  ksoftirqd/0  [kernel.kallsyms]  [k] __ip_route_output_key
+   3,28%  ksoftirqd/0  [kernel.kallsyms]  [k] __inet_lookup_established
+   3,13%  ksoftirqd/0  [kernel.kallsyms]  [k] tg3_poll_work
+   2,68%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_make_synack
+   2,67%  ksoftirqd/0  [kernel.kallsyms]  [k] __netif_receive_skb
+   2,51%  ksoftirqd/0  [kernel.kallsyms]  [k] ipt_do_table
+   2,17%  ksoftirqd/0  [kernel.kallsyms]  [k] memcpy
+   1,99%  ksoftirqd/0  [kernel.kallsyms]  [k] kernel_map_pages
+   1,96%  ksoftirqd/0  [kernel.kallsyms]  [k] inet_csk_search_req
+   1,69%  ksoftirqd/0  [kernel.kallsyms]  [k] tg3_recycle_rx.isra.36
+   1,63%  ksoftirqd/0  [kernel.kallsyms]  [k] kmem_cache_free
+   1,61%  ksoftirqd/0  [kernel.kallsyms]  [k] copy_user_generic_string
+   1,49%  ksoftirqd/0  [kernel.kallsyms]  [k] kmem_cache_alloc
+   1,47%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_rcv
+   1,11%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_v4_conn_request
+   1,07%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_iterate
+   1,07%      swapper  [kernel.kallsyms]  [k] sha_transform
+   1,05%  ksoftirqd/0  [kernel.kallsyms]  [k] kfree
+   1,05%  ksoftirqd/0  [kernel.kallsyms]  [k] skb_release_data
+   0,99%  ksoftirqd/0  [kernel.kallsyms]  [k] __alloc_skb
+   0,98%  ksoftirqd/0  [kernel.kallsyms]  [k] __kmalloc_node_track_caller
+   0,97%  ksoftirqd/0  [kernel.kallsyms]  [k] netdev_alloc_frag
+   0,96%  ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
+   0,94%  ksoftirqd/0  [kernel.kallsyms]  [k] inet_gro_receive
+   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] build_skb
+   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] cookie_v4_init_sequence
+   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_build_and_send_pkt
+   0,84%  ksoftirqd/0  [kernel.kallsyms]  [k] __copy_skb_header
+   0,82%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_hook_slow
+   0,77%  ksoftirqd/0  [kernel.kallsyms]  [k] __skb_clone
+   0,73%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_v4_rcv
+   0,72%  ksoftirqd/0  [kernel.kallsyms]  [k] xfrm_lookup
+   0,69%  ksoftirqd/0  [kernel.kallsyms]  [k] dev_hard_start_xmit
+   0,68%  ksoftirqd/0  [kernel.kallsyms]  [k] local_bh_enable
+   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_gro_receive
+   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] kfree_skb
+   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] __probe_kernel_read
+   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] skb_release_head_state
+   0,66%  ksoftirqd/0  [kernel.kallsyms]  [k] __phys_addr
+   0,66%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_finish_output
+   0,65%  ksoftirqd/0  [kernel.kallsyms]  [k] dst_release
+   0,64%  ksoftirqd/0  [kernel.kallsyms]  [k] __ip_local_out
+   0,61%  ksoftirqd/0  [kernel.kallsyms]  [k] packet_rcv_spkt
+   0,57%  ksoftirqd/0  [kernel.kallsyms]  [k] __kfree_skb

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods
  2012-05-31 13:10                 ` Eric Dumazet
@ 2012-05-31 13:24                   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 32+ messages in thread
From: Jesper Dangaard Brouer @ 2012-05-31 13:24 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: christoph.paasch, netdev, David S. Miller, Martin Topholm,
	Florian Westphal, Hans Schillstrom, Andi Kleen

On Thu, 2012-05-31 at 15:10 +0200, Eric Dumazet wrote:
> On Thu, 2012-05-31 at 14:58 +0200, Eric Dumazet wrote:
> 
> > 
> > How many different IP addresses are used by your generator ?
> > 
> > Or maybe you disabled IP route cache ?
> 
> With no route cache problems, I sustain 4 us per SYN packet, if all load
> serviced by one cpu only.

Yes, that is also my experience; in this SYN-flood scenario one CPU does
a lot better.  My old home-brew AMD quad-core CPU also outperforms the
big testlab machine, a dual-socket quad-core Nehalem.

The route cache problem should not be too big with my SYN cookie
solution, I think, as tcp_v4_send_synack() handles allocation of a dst
route cache entry but also releases it immediately afterwards.

How do you/I measure the usec per packet?

How do I disable the route cache?

What test tools do you use?
(I have modified pktgen to send TCP SYN packets)

(PS: I'll post my updated patch series in a bit, and then I'll try not
to disturb your work on the fully parallel solution.)



> perf profile is : (I have CONFIG_DEBUG_PAGEALLOC=y)
> 
> +   9,55%  ksoftirqd/0  [kernel.kallsyms]  [k] sha_transform
> +   3,56%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_route_input_common
> +   3,40%  ksoftirqd/0  [kernel.kallsyms]  [k] __ip_route_output_key
> +   3,28%  ksoftirqd/0  [kernel.kallsyms]  [k] __inet_lookup_established
> +   3,13%  ksoftirqd/0  [kernel.kallsyms]  [k] tg3_poll_work
> +   2,68%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_make_synack
> +   2,67%  ksoftirqd/0  [kernel.kallsyms]  [k] __netif_receive_skb
> +   2,51%  ksoftirqd/0  [kernel.kallsyms]  [k] ipt_do_table
> +   2,17%  ksoftirqd/0  [kernel.kallsyms]  [k] memcpy
> +   1,99%  ksoftirqd/0  [kernel.kallsyms]  [k] kernel_map_pages
> +   1,96%  ksoftirqd/0  [kernel.kallsyms]  [k] inet_csk_search_req
> +   1,69%  ksoftirqd/0  [kernel.kallsyms]  [k] tg3_recycle_rx.isra.36
> +   1,63%  ksoftirqd/0  [kernel.kallsyms]  [k] kmem_cache_free
> +   1,61%  ksoftirqd/0  [kernel.kallsyms]  [k] copy_user_generic_string
> +   1,49%  ksoftirqd/0  [kernel.kallsyms]  [k] kmem_cache_alloc
> +   1,47%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_rcv
> +   1,11%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_v4_conn_request
> +   1,07%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_iterate
> +   1,07%      swapper  [kernel.kallsyms]  [k] sha_transform
> +   1,05%  ksoftirqd/0  [kernel.kallsyms]  [k] kfree
> +   1,05%  ksoftirqd/0  [kernel.kallsyms]  [k] skb_release_data
> +   0,99%  ksoftirqd/0  [kernel.kallsyms]  [k] __alloc_skb
> +   0,98%  ksoftirqd/0  [kernel.kallsyms]  [k] __kmalloc_node_track_caller
> +   0,97%  ksoftirqd/0  [kernel.kallsyms]  [k] netdev_alloc_frag
> +   0,96%  ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive
> +   0,94%  ksoftirqd/0  [kernel.kallsyms]  [k] inet_gro_receive
> +   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] build_skb
> +   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] cookie_v4_init_sequence
> +   0,85%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_build_and_send_pkt
> +   0,84%  ksoftirqd/0  [kernel.kallsyms]  [k] __copy_skb_header
> +   0,82%  ksoftirqd/0  [kernel.kallsyms]  [k] nf_hook_slow
> +   0,77%  ksoftirqd/0  [kernel.kallsyms]  [k] __skb_clone
> +   0,73%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_v4_rcv
> +   0,72%  ksoftirqd/0  [kernel.kallsyms]  [k] xfrm_lookup
> +   0,69%  ksoftirqd/0  [kernel.kallsyms]  [k] dev_hard_start_xmit
> +   0,68%  ksoftirqd/0  [kernel.kallsyms]  [k] local_bh_enable
> +   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] tcp_gro_receive
> +   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] kfree_skb
> +   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] __probe_kernel_read
> +   0,67%  ksoftirqd/0  [kernel.kallsyms]  [k] skb_release_head_state
> +   0,66%  ksoftirqd/0  [kernel.kallsyms]  [k] __phys_addr
> +   0,66%  ksoftirqd/0  [kernel.kallsyms]  [k] ip_finish_output
> +   0,65%  ksoftirqd/0  [kernel.kallsyms]  [k] dst_release
> +   0,64%  ksoftirqd/0  [kernel.kallsyms]  [k] __ip_local_out
> +   0,61%  ksoftirqd/0  [kernel.kallsyms]  [k] packet_rcv_spkt
> +   0,57%  ksoftirqd/0  [kernel.kallsyms]  [k] __kfree_skb

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-31  8:45               ` Hans Schillstrom
@ 2012-05-31 14:09                 ` Eric Dumazet
  2012-05-31 15:31                   ` Hans Schillstrom
  0 siblings, 1 reply; 32+ messages in thread
From: Eric Dumazet @ 2012-05-31 14:09 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Rick Jones, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On Thu, 2012-05-31 at 10:45 +0200, Hans Schillstrom wrote:

> I can see plenty "IPv4: dst cache overflow"
> 

This is probably the most problematic issue in DDoS attacks.

I have a patch for it.

The idea is to not cache dst entries for the following cases:

1) Input dst, if listener queue is full (syncookies possibly engaged)

2) Output dst of SYNACK messages.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-31 14:09                 ` Eric Dumazet
@ 2012-05-31 15:31                   ` Hans Schillstrom
  2012-05-31 17:16                     ` Eric Dumazet
  0 siblings, 1 reply; 32+ messages in thread
From: Hans Schillstrom @ 2012-05-31 15:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Rick Jones, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On Thursday 31 May 2012 16:09:21 Eric Dumazet wrote:
> On Thu, 2012-05-31 at 10:45 +0200, Hans Schillstrom wrote:
> 
> > I can see plenty "IPv4: dst cache overflow"
> > 
> 
> This is probably the most problematic problem in DDOS attacks.
> 
> I have a patch for this problem.
> 
> Idea is to not cache dst entries for following cases :
> 
> 1) Input dst, if listener queue is full (syncookies possibly engaged)
> 
> 2) Output dst of SYNACK messages.
> 
Sounds like a good idea;
if you need some testing, just send the patches.

-- 
Regards
Hans Schillstrom <hans.schillstrom@ericsson.com>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods
  2012-05-31 15:31                   ` Hans Schillstrom
@ 2012-05-31 17:16                     ` Eric Dumazet
  0 siblings, 0 replies; 32+ messages in thread
From: Eric Dumazet @ 2012-05-31 17:16 UTC (permalink / raw)
  To: Hans Schillstrom
  Cc: Rick Jones, Andi Kleen, Jesper Dangaard Brouer,
	Jesper Dangaard Brouer, netdev@vger.kernel.org, Christoph Paasch,
	David S. Miller, Martin Topholm, Florian Westphal, Tom Herbert

On Thu, 2012-05-31 at 17:31 +0200, Hans Schillstrom wrote:
> On Thursday 31 May 2012 16:09:21 Eric Dumazet wrote:
> > On Thu, 2012-05-31 at 10:45 +0200, Hans Schillstrom wrote:
> > 
> > > I can see plenty "IPv4: dst cache overflow"
> > > 
> > 
> > This is probably the most problematic problem in DDOS attacks.
> > 
> > I have a patch for this problem.
> > 
> > Idea is to not cache dst entries for following cases :
> > 
> > 1) Input dst, if listener queue is full (syncookies possibly engaged)
> > 
> > 2) Output dst of SYNACK messages.
> > 
> Sound like a good idea, 
> if you need some testing just the patches 
> 

Here is the patch; it works pretty well for me.

 include/net/dst.h   |    1 +
 net/ipv4/route.c    |   20 +++++++++++++++-----
 net/ipv4/tcp_ipv4.c |    6 ++++++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index bed833d..e0109c4 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -60,6 +60,7 @@ struct dst_entry {
 #define DST_NOCOUNT		0x0020
 #define DST_NOPEER		0x0040
 #define DST_FAKE_RTABLE		0x0080
+#define DST_EPHEMERAL		0x0100
 
 	short			error;
 	short			obsolete;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 98b30d0..51b3e78 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -754,6 +754,15 @@ static inline int rt_is_expired(struct rtable *rth)
 	return rth->rt_genid != rt_genid(dev_net(rth->dst.dev));
 }
 
+static bool rt_is_expired_or_ephemeral(struct rtable *rth)
+{
+	if (rt_is_expired(rth))
+		return true;
+
+	return (atomic_read(&rth->dst.__refcnt) == 0) && 
+	       (rth->dst.flags & DST_EPHEMERAL);
+}
+
 /*
  * Perform a full scan of hash table and free all entries.
  * Can be called by a softirq or a process.
@@ -873,7 +882,7 @@ static void rt_check_expire(void)
 		while ((rth = rcu_dereference_protected(*rthp,
 					lockdep_is_held(rt_hash_lock_addr(i)))) != NULL) {
 			prefetch(rth->dst.rt_next);
-			if (rt_is_expired(rth)) {
+			if (rt_is_expired_or_ephemeral(rth)) {
 				*rthp = rth->dst.rt_next;
 				rt_free(rth);
 				continue;
@@ -1040,7 +1049,7 @@ static int rt_garbage_collect(struct dst_ops *ops)
 			spin_lock_bh(rt_hash_lock_addr(k));
 			while ((rth = rcu_dereference_protected(*rthp,
 					lockdep_is_held(rt_hash_lock_addr(k)))) != NULL) {
-				if (!rt_is_expired(rth) &&
+				if (!rt_is_expired_or_ephemeral(rth) &&
 					!rt_may_expire(rth, tmo, expire)) {
 					tmo >>= 1;
 					rthp = &rth->dst.rt_next;
@@ -1159,7 +1168,8 @@ restart:
 	candp = NULL;
 	now = jiffies;
 
-	if (!rt_caching(dev_net(rt->dst.dev))) {
+	if (!rt_caching(dev_net(rt->dst.dev)) ||
+	    dst_entries_get_fast(&ipv4_dst_ops) > (ip_rt_max_size >> 1)) {
 		/*
 		 * If we're not caching, just tell the caller we
 		 * were successful and don't touch the route.  The
@@ -1194,7 +1204,7 @@ restart:
 	spin_lock_bh(rt_hash_lock_addr(hash));
 	while ((rth = rcu_dereference_protected(*rthp,
 			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
-		if (rt_is_expired(rth)) {
+		if (rt_is_expired_or_ephemeral(rth)) {
 			*rthp = rth->dst.rt_next;
 			rt_free(rth);
 			continue;
@@ -1390,7 +1400,7 @@ static void rt_del(unsigned int hash, struct rtable *rt)
 	ip_rt_put(rt);
 	while ((aux = rcu_dereference_protected(*rthp,
 			lockdep_is_held(rt_hash_lock_addr(hash)))) != NULL) {
-		if (aux == rt || rt_is_expired(aux)) {
+		if (aux == rt || rt_is_expired_or_ephemeral(aux)) {
 			*rthp = aux->dst.rt_next;
 			rt_free(aux);
 			continue;
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index a43b87d..30c5275 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -835,6 +835,9 @@ static int tcp_v4_send_synack(struct sock *sk, struct dst_entry *dst,
 	if (!dst && (dst = inet_csk_route_req(sk, &fl4, req)) == NULL)
 		return -1;
 
+	if (atomic_read(&dst->__refcnt) == 1)
+		dst->flags |= DST_EPHEMERAL;
+
 	skb = tcp_make_synack(sk, dst, req, rvp);
 
 	if (skb) {
@@ -1291,6 +1294,9 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	 * evidently real one.
 	 */
 	if (inet_csk_reqsk_queue_is_full(sk) && !isn) {
+		/* under attack, free dst as soon as possible */
+		skb_dst(skb)->flags |= DST_EPHEMERAL;
+
 		want_cookie = tcp_syn_flood_action(sk, skb, "TCP");
 		if (!want_cookie)
 			goto drop;

^ permalink raw reply related	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-05-31 17:16 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-05-28 11:52 [RFC PATCH 0/2] Faster/parallel SYN handling to mitigate SYN floods Jesper Dangaard Brouer
2012-05-28 11:52 ` [RFC PATCH 1/2] tcp: extract syncookie part of tcp_v4_conn_request() Jesper Dangaard Brouer
2012-05-28 11:52 ` [RFC PATCH 2/2] tcp: Early SYN limit and SYN cookie handling to mitigate SYN floods Jesper Dangaard Brouer
2012-05-29 19:37   ` Andi Kleen
2012-05-29 20:18     ` David Miller
2012-05-30  6:41     ` Eric Dumazet
2012-05-30  7:45       ` Jesper Dangaard Brouer
2012-05-30  8:15         ` Eric Dumazet
2012-05-30  9:24           ` Jesper Dangaard Brouer
2012-05-30  9:46             ` Eric Dumazet
2012-05-30  8:03       ` Hans Schillstrom
2012-05-30  8:24         ` Eric Dumazet
2012-05-30 11:14           ` Hans Schillstrom
2012-05-30 21:20           ` Rick Jones
2012-05-31  8:28             ` Eric Dumazet
2012-05-31  8:45               ` Hans Schillstrom
2012-05-31 14:09                 ` Eric Dumazet
2012-05-31 15:31                   ` Hans Schillstrom
2012-05-31 17:16                     ` Eric Dumazet
2012-05-28 16:14 ` [RFC PATCH 0/2] Faster/parallel SYN " Christoph Paasch
2012-05-29 20:17   ` Jesper Dangaard Brouer
2012-05-29 20:36     ` Christoph Paasch
2012-05-30  8:44       ` Jesper Dangaard Brouer
2012-05-30  8:50         ` Eric Dumazet
2012-05-30  8:53         ` Christoph Paasch
2012-05-30 22:40           ` Jesper Dangaard Brouer
2012-05-31 12:51             ` Jesper Dangaard Brouer
2012-05-31 12:58               ` Eric Dumazet
2012-05-31 13:04                 ` Jesper Dangaard Brouer
2012-05-31 13:10                 ` Eric Dumazet
2012-05-31 13:24                   ` Jesper Dangaard Brouer
2012-05-30  4:45     ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).