From mboxrd@z Thu Jan  1 00:00:00 1970
From: Brian Bloniarz
Subject: Re: [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
Date: Wed, 28 Apr 2010 11:41:12 -0400
Message-ID: <4BD85718.5000404@athenacr.com>
References: <4BD6E887.3000804@athenacr.com>
	<20100427.095108.68126984.davem@davemloft.net>
	<1272388439.2295.369.camel@edumazet-laptop>
	<20100427.102038.57469310.davem@davemloft.net>
	<1272389872.2295.405.camel@edumazet-laptop>
	<1272399662.2343.12.camel@edumazet-laptop>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: David Miller , therbert@google.com, netdev@vger.kernel.org, rick.jones2@hp.com
To: Eric Dumazet
Return-path:
Received: from sprinkles.athenacr.com ([64.95.46.210]:12908
	"EHLO sprinkles.inp.in.athenacr.com" rhost-flags-OK-OK-OK-FAIL)
	by vger.kernel.org with ESMTP id S1754907Ab0D1PlQ (ORCPT );
	Wed, 28 Apr 2010 11:41:16 -0400
In-Reply-To: <1272399662.2343.12.camel@edumazet-laptop>
Sender: netdev-owner@vger.kernel.org
List-ID:

Eric Dumazet wrote:
> On Tuesday 27 April 2010 at 19:37 +0200, Eric Dumazet wrote:
>
>> We might use the ticket spinlock paradigm to let writers go in parallel
>> and let the user take the socket lock.
>>
>> Instead of having bh_lock_sock() protect the receive_queue *and* the
>> backlog, writers would get a unique slot in a table, which 'user' can
>> handle later.
>>
>> Or serialize writers (before they try to bh_lock_sock()) with a
>> dedicated lock, so that the user has a 50% chance to get the sock lock,
>> contending with at most one writer.
>
> The following patch fixes the issue for me, with little performance hit
> on the fast path.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on my dev machine.
>
> Thanks !
>
> [PATCH net-next-2.6] net: sk_add_backlog() take rmem_alloc into account
>
> The current socket backlog limit is not enough to really stop DDoS
> attacks, because the user thread spends a long time processing a full
> backlog each round, and may spin madly on the socket lock.
>
> We should add backlog size and receive_queue size (aka rmem_alloc) to
> pace writers, and let the user run without being slowed down too much.
>
> Introduce a sk_rcvqueues_full() helper, to avoid taking the socket lock
> in stress situations.
>
> Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
> receiver can now process ~200.000 pps (instead of ~100 pps before the
> patch) on an 8 core machine.
>
> Signed-off-by: Eric Dumazet

Wow that was awesome.

> ---
>  include/net/sock.h |   13 +++++++++++--
>  net/core/sock.c    |    5 ++++-
>  net/ipv4/udp.c     |    4 ++++
>  net/ipv6/udp.c     |    8 ++++++++
>  4 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 86a8ca1..4b0097d 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -255,7 +255,6 @@ struct sock {
>  		struct sk_buff	*head;
>  		struct sk_buff	*tail;
>  		int len;
> -		int limit;
>  	} sk_backlog;
>  	wait_queue_head_t	*sk_sleep;
>  	struct dst_entry	*sk_dst_cache;
> @@ -604,10 +603,20 @@ static inline void __sk_add_backlog(struct sock *sk, struct sk_buff *skb)
>  	skb->next = NULL;
>  }
>
> +/*
> + * Take into account size of receive queue and backlog queue
> + */
> +static inline bool sk_rcvqueues_full(const struct sock *sk, const struct sk_buff *skb)
> +{
> +	unsigned int qsize = sk->sk_backlog.len + atomic_read(&sk->sk_rmem_alloc);
> +
> +	return qsize + skb->truesize > sk->sk_rcvbuf;
> +}
> +

Reading sk_backlog.len without the socket lock held seems to contradict
the comment in sock.h:

 * @sk_backlog: always used with the per-socket spinlock held
...
struct sock {
...
	/*
	 * The backlog queue is special, it is always used with
	 * the per-socket spinlock held and requires low latency
	 * access. Therefore we special case its implementation.
	 */
	struct {
	...
	} sk_backlog;

Is this just a doc mismatch, or does sk_backlog.len need to use
atomic_read & atomic_set?

> /* The per-socket spinlock must be held here. */
> static inline __must_check int sk_add_backlog(struct sock *sk, struct sk_buff *skb)
> {
> -	if (sk->sk_backlog.len >= max(sk->sk_backlog.limit, sk->sk_rcvbuf << 1))
> +	if (sk_rcvqueues_full(sk, skb))
>  		return -ENOBUFS;
>
>  	__sk_add_backlog(sk, skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 58ebd14..5104175 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -327,6 +327,10 @@ int sk_receive_skb(struct sock *sk, struct sk_buff *skb, const int nested)
>
>  	skb->dev = NULL;
>
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		atomic_inc(&sk->sk_drops);
> +		goto discard_and_relse;
> +	}
>  	if (nested)
>  		bh_lock_sock_nested(sk);
>  	else
> @@ -1885,7 +1889,6 @@ void sock_init_data(struct socket *sock, struct sock *sk)
>  	sk->sk_allocation = GFP_KERNEL;
>  	sk->sk_rcvbuf = sysctl_rmem_default;
>  	sk->sk_sndbuf = sysctl_wmem_default;
> -	sk->sk_backlog.limit = sk->sk_rcvbuf << 1;
>  	sk->sk_state = TCP_CLOSE;
>  	sk_set_socket(sk, sock);
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 1e18f9c..776c844 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1372,6 +1372,10 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  		goto drop;
>  	}
>
> +
> +	if (sk_rcvqueues_full(sk, skb))
> +		goto drop;
> +
>  	rc = 0;
>
>  	bh_lock_sock(sk);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 2850e35..3ead20a 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -584,6 +584,10 @@ static void flush_stack(struct sock **stack, unsigned int count,
>
>  		sk = stack[i];
>  		if (skb1) {
> +			if (sk_rcvqueues_full(sk, skb)) {
> +				kfree_skb(skb1);
> +				goto drop;
> +			}
>  			bh_lock_sock(sk);
>  			if (!sock_owned_by_user(sk))
>  				udpv6_queue_rcv_skb(sk, skb1);
> @@ -759,6 +763,10 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
>
>  	/* deliver */
>
> +	if (sk_rcvqueues_full(sk, skb)) {
> +		sock_put(sk);
> +		goto discard;
> +	}
>  	bh_lock_sock(sk);
>  	if (!sock_owned_by_user(sk))
>  		udpv6_queue_rcv_skb(sk, skb);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html