netdev.vger.kernel.org archive mirror
From: Yang Yingliang <yangyingliang@huawei.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <netdev@vger.kernel.org>, <davem@davemloft.net>,
	Ding Tianhong <dingtianhong@huawei.com>
Subject: Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
Date: Thu, 7 Apr 2016 13:59:53 +0800
Message-ID: <5705F759.9020003@huawei.com>
In-Reply-To: <1459345637.6473.205.camel@edumazet-glaptop3.roam.corp.google.com>



On 2016/3/30 21:47, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
>
>> Sorry, I made a mistake. I am very sure my kernel has these two patches.
>> And I still see some packet drops on 10Gb Ethernet.
>>
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4135
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4167
>
> Sender will retransmit and the receiver backlog will likely be emptied
> before the packets arrive again.
>
> Are you sure these are TCP drops ?
Yes.

>
> Which 10Gb NIC is it ? (ethtool -i eth0)
The NIC driver is not upstream. And my system is arm64.

>
> What is the max size of the sendmsg() chunks generated by your apps ?
256KB

>
> Are they forcing small SO_RCVBUF or SO_SNDBUF ?
I am not sure.
I added some debug messages in the kernel:
[2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
[2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
[2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
[2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
[2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
[2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
[2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
[2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
[2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152 limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658
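
For readers following the numbers in this log: the limit is rcvbuf + sndbuf
(10485760 + 2097152 = 12582912), and backloglen stops growing once it
reaches 12634174, which is consistent with later skbs being refused. Below
is a minimal user-space sketch of that admission check. It assumes an skb is
refused when the backlog length plus rmem_alloc already exceeds the limit
(my reading of sk_rcvqueues_full() in kernels of this era). The helper names
are hypothetical and this is not kernel code.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the backlog admission check.
 * Assumption: an skb is refused when the bytes already accounted to the
 * socket (backlog length plus rmem_alloc) exceed the limit.
 */
static bool backlog_would_drop(unsigned int backlog_len,
			       unsigned int rmem_alloc,
			       unsigned int limit)
{
	unsigned int qsize = backlog_len + rmem_alloc;

	return qsize > limit;
}

/* Limit as passed to sk_add_backlog() by the unpatched receive path. */
static unsigned int default_backlog_limit(unsigned int rcvbuf,
					  unsigned int sndbuf)
{
	return rcvbuf + sndbuf;
}
```

With the logged values, a backlog of 12575522 bytes is still under the
limit, while 12634174 bytes is over it, so every following skb is refused
until the backlog is drained.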

>
> What percentage of drops do you have ?
The TCPBacklogDrop counter (netstat -s | grep -i TCPBacklogDrop) increases
by 20-40 per second.
That is about 0.055% (117724 TCPBacklogDrop / 214502873 InSegs from
/proc/net/snmp).
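
The drop ratio can be checked directly from the two counters quoted above.
A minimal sketch in C, where drop_percent() is a hypothetical helper and the
inputs are the cumulative TCPBacklogDrop and InSegs values:

```c
/*
 * Hypothetical helper: share of received segments that were dropped,
 * expressed in percent. Returns 0 when no segments were received so the
 * division is always defined.
 */
static double drop_percent(unsigned long long drops,
			   unsigned long long in_segs)
{
	if (in_segs == 0)
		return 0.0;

	return 100.0 * (double)drops / (double)in_segs;
}
```

Calling drop_percent(117724ULL, 214502873ULL) gives the percentage for the
counters in this message. Both counters are cumulative since boot, so the
result is an average, not the rate during a burst.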

>
> Here (at Google), we have less than one backlog drop per billion
> packets, on hosts facing the public Internet.
>
> If a TCP sender sends a burst of tiny packets because it is misbehaving,
> you absolutely will drop packets, especially if applications use
> sendmsg() with very big lengths and big SO_SNDBUF.
>
> Trying to not drop these hostile packets as you did is simply opening
> your host to DOS attacks.
>
> Eventually, we should even drop earlier in TCP stack (before taking
> socket lock).
>
>
How about expanding the buffer like this:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d204f3..da1bc16 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
  extern int sysctl_tcp_min_tso_segs;
  extern int sysctl_tcp_autocorking;
  extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_backlog_buf_multi;

  extern atomic_long_t tcp_memory_allocated;
  extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index f0e8297..9511410 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -631,6 +631,13 @@ static struct ctl_table ipv4_table[] = {
  		.mode		= 0644,
  		.proc_handler	= proc_dointvec
  	},
+	{
+		.procname	= "tcp_backlog_buf_multi",
+		.data		= &sysctl_tcp_backlog_buf_multi,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
  #ifdef CONFIG_NETLABEL
  	{
  		.procname	= "cipso_cache_enable",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87463c8..337ad55 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
  int sysctl_tcp_early_retrans __read_mostly = 3;
  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_backlog_buf_multi __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_backlog_buf_multi);

  #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
  #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13b92d5..39272f3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1635,7 +1635,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v4_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c1147ac..1e8f709 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1433,7 +1433,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v6_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
-- 
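
One point worth checking in the patch above: proc_dointvec accepts any int,
including 0 and negative values, and (sk_rcvbuf + sk_sndbuf) * multiplier is
evaluated in int arithmetic, so a large multiplier could overflow. The
following is a user-space sketch of a guarded computation, not part of the
patch. scaled_backlog_limit() is a hypothetical name, and treating a
multiplier below 1 as 1 is an assumption about the intended behaviour.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical guarded version of the limit computed in the patch.
 * Assumptions: negative buffer sizes are treated as 0, a multiplier
 * below 1 is treated as 1, and the result saturates at UINT_MAX.
 */
static unsigned int scaled_backlog_limit(int rcvbuf, int sndbuf, int multi)
{
	uint64_t limit;

	if (rcvbuf < 0)
		rcvbuf = 0;
	if (sndbuf < 0)
		sndbuf = 0;
	if (multi < 1)
		multi = 1;

	/* Do the multiplication in 64 bits so it cannot wrap. */
	limit = ((uint64_t)rcvbuf + (uint64_t)sndbuf) * (uint64_t)multi;

	return limit > UINT_MAX ? UINT_MAX : (unsigned int)limit;
}
```

With the buffer sizes from the debug log (10485760 and 2097152), a
multiplier of 2 yields 25165824, and a multiplier of 1000 saturates at
UINT_MAX. In the kernel itself, bounding the sysctl when it is written
would be the other place to enforce this.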

