All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brian Bloniarz <bmb@athenacr.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>,
	therbert@google.com, netdev@vger.kernel.org, rick.jones2@hp.com
Subject: Re: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
Date: Wed, 28 Apr 2010 11:41:12 -0400	[thread overview]
Message-ID: <4BD85718.5000404@athenacr.com> (raw)
In-Reply-To: <1272399662.2343.12.camel@edumazet-laptop>

Eric Dumazet wrote:
> Le mardi 27 avril 2010 à 19:37 +0200, Eric Dumazet a écrit :
> 
>> We might use the ticket spinlock paradigm to let writers go in parallel
>> and let the user the socket lock
>>
>> Instead of having the bh_lock_sock() to protect receive_queue *and*
>> backlog, writers get a unique slot in a table, that 'user' can handle
>> later.
>>
>> Or serialize writers (before they try to bh_lock_sock()) with a
>> dedicated lock, so that user has 50% chances to get the sock lock,
>> contending with at most one writer.
> 
> Following patch fixes the issue for me, with little performance hit on
> fast path.
> 
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on my dev machine.
> 
> Thanks !
> 
> [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
> 
> Current socket backlog limit is not enough to really stop DDOS attacks,
> because user thread spend many time to process a full backlog each
> round, and user might crazy spin on socket lock.
> 
> We should add backlog size and receive_queue size (aka rmem_alloc) to
> pace writers, and let user run without being slow down too much.
> 
> Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
> stress situations.
> 
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on a 8 core machine.
> 
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Wow that was awesome.

> ---
>  include/net/sock.h |   13 +++++++++++--
>  net/core/sock.c    |    5 ++++-
>  net/ipv4/udp.c     |    4 ++++
>  net/ipv6/udp.c     |    8 ++++++++
>  4 files changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..4b0097d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -255,7 +255,6 @@ struct sock {
>  		struct sk_buff *head;
>  		struct sk_buff *tail;
>  		int len;
> -		int limit;
>  	} sk_backlog;
>  	wait_queue_head_t	*sk_sleep;
>  	struct dst_entry	*sk_dst_cache;
> @@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	skb->next = NULL;
>  }
>  
> +/*
> + * Take into account size of receive queue and backlog queue
> + */
> +static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
> +{
> +	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
> +
> +	return qsize + skb->truesize > sk->sk_rcvbuf;
> +}
> +

Reading sk_backlog.len without the socket lock held seems to
contradict the comment in sock.h:
  *	@sk_backlog: always used with the per-socket spinlock held
  ...
struct sock {

        ...
	/*
	 * The backlog queue is special, it is always used with
	 * the per-socket spinlock held and requires low latency
	 * access. Therefore we special case it's implementation.
	 */
	struct { ... } sk_backlog;

Is this just a doc mismatch or does sk_backlog.len need to use
atomic_read & atomic_set?

>  /* The per-socket spinlock must be held here. */
>  static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  {
> -	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
> +	if (sk_rcvqueues_full(sk, skb))
>  		return -ENOBUFS;
>  
>  	__sk_add_backlog(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 58ebd14..5104175 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>  
>  	skb->dev = NULL;
>  
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		atomic_inc(&sk->sk_drops);
> +		goto discard_and_relse;
> +	}
>  	if (nested)
>  		bh_lock_sock_nested(sk);
>  	else
> @@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
>  	sk->sk_allocation	=	GFP_KERNEL;
>  	sk->sk_rcvbuf		=	sysctl_rmem_default;
>  	sk->sk_sndbuf		=	sysctl_wmem_default;
> -	sk->sk_backlog.limit	=	sk->sk_rcvbuf << 1;
>  	sk->sk_state		=	TCP_CLOSE;
>  	sk_set_socket(sk, sock);
>  
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1e18f9c..776c844 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  			goto drop;
>  	}
>  
> +
> +	if (sk_rcvqueues_full(sk, skb))
> +		goto drop;
> +
>  	rc = 0;
>  
>  	bh_lock_sock(sk);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 2850e35..3ead20a 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
>  
>  		sk = stack[i];
>  		if (skb1) {
> +			if (sk_rcvqueues_full(sk, skb)) {
> +				kfree_skb(skb1);
> +				goto drop;
> +			}
>  			bh_lock_sock(sk);
>  			if (!sock_owned_by_user(sk))
>  				udpv6_queue_rcv_skb(sk, skb1);
> @@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>  
>  	/* deliver */
>  
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		sock_put(sk);
> +		goto discard;
> +	}
>  	bh_lock_sock(sk);
>  	if (!sock_owned_by_user(sk))
>  		udpv6_queue_rcv_skb(sk, skb);
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  parent reply	other threads:[~2010-04-28 15:41 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-23  5:54 [PATCH] bnx2x: add support for receive hashing Tom Herbert
2010-04-23  7:11 ` David Miller
2010-04-26 17:20 ` Eric Dumazet
2010-04-26 17:38   ` Tom Herbert
2010-04-26 17:47     ` Ben Hutchings
2010-04-26 17:56       ` David Miller
2010-04-26 18:04     ` David Miller
2010-04-26 18:19       ` Tom Herbert
2010-04-26 18:22         ` David Miller
2010-04-26 20:19           ` Rick Jones
2010-04-26 20:40             ` David Miller
2010-04-26 20:48               ` Rick Jones
2010-04-26 20:53                 ` David Miller
2010-04-26 21:12                   ` Rick Jones
2010-04-27 18:31                     ` Eilon Greenstein
2010-04-27 19:30                       ` Eric Dumazet
2010-04-26 20:58               ` Stephen Hemminger
2010-04-26 21:11               ` jamal
2010-04-26 21:14                 ` jamal
2010-04-26 23:27             ` Brian Bloniarz
2010-04-27 13:37       ` Brian Bloniarz
2010-04-27 14:30         ` Eric Dumazet
2010-04-27 15:44           ` Brian Bloniarz
2010-04-27 16:51         ` David Miller
2010-04-27 17:02           ` Brian Bloniarz
2010-04-27 17:06             ` David Miller
2010-04-27 17:13           ` Eric Dumazet
2010-04-27 17:20             ` David Miller
2010-04-27 17:37               ` Eric Dumazet
2010-04-27 20:21                 ` [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account Eric Dumazet
2010-04-27 21:43                   ` David Miller
2010-04-27 22:11                     ` David Miller
2010-04-27 22:19                       ` Eric Dumazet
2010-04-28 15:41                   ` Brian Bloniarz [this message]
2010-04-28 15:52                     ` Eric Dumazet
2010-04-27 17:31             ` [PATCH] bnx2x: add support for receive hashing Tom Herbert
2010-04-27 17:36               ` Eric Dumazet
2010-07-04 16:36 ` Vladislav Zolotarov
2010-07-04 16:46   ` Vladislav Zolotarov
2010-07-06  7:16   ` Vladislav Zolotarov
2010-07-07 19:17     ` Tom Herbert
2010-07-08  8:40       ` Vladislav Zolotarov
2010-07-11  2:12       ` David Miller
2010-07-11 10:02         ` Vladislav Zolotarov
2010-07-11 13:16           ` Vladislav Zolotarov
2010-07-11 16:45             ` Eric Dumazet
2010-07-11 17:22               ` Vladislav Zolotarov
2010-07-11 17:41                 ` Eric Dumazet
2010-07-11 18:18                   ` Vladislav Zolotarov
2010-07-11 22:34           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4BD85718.5000404@athenacr.com \
    --to=bmb@athenacr.com \
    --cc=davem@davemloft.net \
    --cc=eric.dumazet@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=rick.jones2@hp.com \
    --cc=therbert@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.