From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: bmb@athenacr.com, therbert@google.com, netdev@vger.kernel.org,
rick.jones2@hp.com
Subject: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
Date: Tue, 27 Apr 2010 22:21:02 +0200
Message-ID: <1272399662.2343.12.camel@edumazet-laptop>
In-Reply-To: <1272389872.2295.405.camel@edumazet-laptop>
On Tuesday, 27 April 2010 at 19:37 +0200, Eric Dumazet wrote:
> We might use the ticket spinlock paradigm to let writers go in parallel,
> and still let the user thread get the socket lock.
>
> Instead of having bh_lock_sock() protect the receive_queue *and* the
> backlog, writers would get a unique slot in a table that the 'user' can
> handle later.
>
> Or serialize writers (before they try bh_lock_sock()) with a dedicated
> lock, so that the user has a 50% chance of getting the sock lock,
> contending with at most one writer.
The following patch fixes the issue for me, with little performance hit on
the fast path.

Under huge stress from a multiqueue/RPS-enabled NIC, a single-flow UDP
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on my dev machine.
Thanks!
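To make the change concrete before the formal patch, here is a simplified
side-by-side of the old and new admission checks (illustration only, not
part of the diff below; the 131072 figure is just an example rcvbuf value):

        /* Old: only the backlog length itself was bounded. */
        if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
                return -ENOBUFS;

        /* New: backlog bytes plus already-queued receive bytes (rmem_alloc)
         * are bounded by sk_rcvbuf, so with e.g. sk_rcvbuf = 131072 a flood
         * cannot pile up much more than ~128KB across both queues, and the
         * drop decision can be made without even trying bh_lock_sock().
         */
        if (sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc) + skb->truesize >
            sk->sk_rcvbuf)
                return -ENOBUFS;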
[PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
The current socket backlog limit is not enough to really stop DDOS attacks,
because the user thread spends a long time processing a full backlog each
round, and may end up madly spinning on the socket lock.

We should account for both the backlog size and the receive_queue size
(aka rmem_alloc) to pace writers, and let the user thread run without being
slowed down too much.

Introduce a sk_rcvqueues_full() helper to avoid taking the socket lock in
stress situations.
Under huge stress from a multiqueue/RPS-enabled NIC, a single-flow UDP
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on an 8-core machine.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/net/sock.h | 13 +++++++++++--
net/core/sock.c | 5 ++++-
net/ipv4/udp.c | 4 ++++
net/ipv6/udp.c | 8 ++++++++
4 files changed, 27 insertions(+), 3 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 86a8ca1..4b0097d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -255,7 +255,6 @@ struct sock {
struct sk_buff *head;
struct sk_buff *tail;
int len;
- int limit;
} sk_backlog;
wait_queue_head_t *sk_sleep;
struct dst_entry *sk_dst_cache;
@@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
skb->next = NULL;
}
+/*
+ * Take into account size of receive queue and backlog queue
+ */
+static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
+{
+ unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
+
+ return qsize + skb->truesize > sk->sk_rcvbuf;
+}
+
/* The per-socket spinlock must be held here. */
static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
{
- if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
+ if (sk_rcvqueues_full(sk, skb))
return -ENOBUFS;
__sk_add_backlog(sk, skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 58ebd14..5104175 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
skb->dev = NULL;
+ if (sk_rcvqueues_full(sk, skb)) {
+ atomic_inc(&sk->sk_drops);
+ goto discard_and_relse;
+ }
if (nested)
bh_lock_sock_nested(sk);
else
@@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
sk->sk_allocation = GFP_KERNEL;
sk->sk_rcvbuf = sysctl_rmem_default;
sk->sk_sndbuf = sysctl_wmem_default;
- sk->sk_backlog.limit = sk->sk_rcvbuf << 1;
sk->sk_state = TCP_CLOSE;
sk_set_socket(sk, sock);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 1e18f9c..776c844 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
goto drop;
}
+
+ if (sk_rcvqueues_full(sk, skb))
+ goto drop;
+
rc = 0;
bh_lock_sock(sk);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2850e35..3ead20a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
sk = stack[i];
if (skb1) {
+ if (sk_rcvqueues_full(sk, skb)) {
+ kfree_skb(skb1);
+ goto drop;
+ }
bh_lock_sock(sk);
if (!sock_owned_by_user(sk))
udpv6_queue_rcv_skb(sk, skb1);
@@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
/* deliver */
+ if (sk_rcvqueues_full(sk, skb)) {
+ sock_put(sk);
+ goto discard;
+ }
bh_lock_sock(sk);
if (!sock_owned_by_user(sk))
udpv6_queue_rcv_skb(sk, skb);