From: Yang Yingliang <yangyingliang@huawei.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: <netdev@vger.kernel.org>, <davem@davemloft.net>,
Ding Tianhong <dingtianhong@huawei.com>
Subject: Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
Date: Thu, 7 Apr 2016 13:59:53 +0800
Message-ID: <5705F759.9020003@huawei.com>
In-Reply-To: <1459345637.6473.205.camel@edumazet-glaptop3.roam.corp.google.com>
On 2016/3/30 21:47, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
>
>> Sorry, I made a mistake. I am very sure my kernel has these two patches.
>> And I still see some packets being dropped on 10Gb Ethernet.
>>
>> # netstat -s | grep -i backlog
>> TCPBacklogDrop: 4135
>> # netstat -s | grep -i backlog
>> TCPBacklogDrop: 4167
>
> Sender will retransmit and the receiver backlog will likely be emptied
> before the packets arrive again.
>
> Are you sure these are TCP drops ?
Yes.
>
> Which 10Gb NIC is it ? (ethtool -i eth0)
The NIC driver is not upstream. And my system is arm64.
>
> What is the max size of the sendmsg() chunks generated by your apps ?
256KB
>
> Are they forcing small SO_RCVBUF or SO_SNDBUF ?
I am not sure.
I added some debug messages to the kernel:
[2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
[2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
[2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
[2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
[2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
[2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
[2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
[2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
[2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658
>
> What percentage of drops do you have ?
The TCPBacklogDrop counter (netstat -s | grep -i TCPBacklogDrop)
increases by 20-40 per second.
It's about 1.2% (117724 TCPBacklogDrop / 214502873 InSegs from cat
/proc/net/snmp).
>
> Here (at Google), we have less than one backlog drop per billion
> packets, on hosts facing the public Internet.
>
> If a TCP sender sends a burst of tiny packets because it is misbehaving,
> you absolutely will drop packets, especially if applications use
> sendmsg() with very big lengths and big SO_SNDBUF.
>
> Trying to not drop these hostile packets as you did is simply opening
> your host to DOS attacks.
>
> Eventually, we should even drop earlier in TCP stack (before taking
> socket lock).
>
>
How about expanding the buffer, like this:
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d204f3..da1bc16 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
 extern int sysctl_tcp_min_tso_segs;
 extern int sysctl_tcp_autocorking;
 extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_backlog_buf_multi;

 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index f0e8297..9511410 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -631,6 +631,13 @@ static struct ctl_table ipv4_table[] = {
 		.mode = 0644,
 		.proc_handler = proc_dointvec
 	},
+	{
+		.procname = "tcp_backlog_buf_multi",
+		.data = &sysctl_tcp_backlog_buf_multi,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec,
+	},
 #ifdef CONFIG_NETLABEL
 	{
 		.procname = "cipso_cache_enable",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87463c8..337ad55 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
 int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
 int sysctl_tcp_early_retrans __read_mostly = 3;
 int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_backlog_buf_multi __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_backlog_buf_multi);

 #define FLAG_DATA 0x01 /* Incoming frame contained data. */
 #define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update. */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13b92d5..39272f3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1635,7 +1635,8 @@ process:
 		if (!tcp_prequeue(sk, skb))
 			ret = tcp_v4_do_rcv(sk, skb);
 	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
 		bh_unlock_sock(sk);
 		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
 		goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c1147ac..1e8f709 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1433,7 +1433,8 @@ process:
 		if (!tcp_prequeue(sk, skb))
 			ret = tcp_v6_do_rcv(sk, skb);
 	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
 		bh_unlock_sock(sk);
 		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
 		goto discard_and_relse;
--