* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-17 10:52 UTC
To: netdev; +Cc: kaber, davem

Hi

I noticed a TCP transfer rate slowdown after a few days of operation on
kernel 3.15.3. After some digging, I found this:

balancer-backup ~ # cat /proc/net/sockstat
sockets: used 118236
TCP: inuse 122958 orphan 4986 tw 108010 alloc 123179 mem 1955339
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 1 memory 2

After shutting down the program:
balancer-backup ~ # cat /proc/net/sockstat
sockets: used 47
TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
UDP: inuse 0 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

sysctl settings:
net.ipv4.tcp_mem = 1767103 2045612 3068412

I recently restarted the process, and the mem value didn't change (even
though closing the sockets should release all their memory). It also
looks incorrect, because at the same time:

balancer-backup ~ # cat /proc/meminfo
MemTotal: 32939492 kB
MemFree: 29876564 kB

whereas 1955339 * 4096 bytes would be around 8 GB.
Is this just an accounting issue, or is it a real memory leak?
What other information can I provide to troubleshoot this properly?
I will also upgrade to 3.15.5 now, to see if the issue persists there.

I also noticed several warnings:

[1116634.378936] ------------[ cut here ]------------
[1116634.379169] WARNING: CPU: 0 PID: 28350 at net/core/stream.c:201 sk_stream_kill_queues+0xff/0x104()
[1116634.379606] Modules linked in: microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.380069] CPU: 0 PID: 28350 Comm: haproxy Tainted: G W 3.15.3-build-0007 #2
[1116634.380492] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[1116634.380921] 0000000000000000 ffff880778393db0 ffffffff8160042b 0000000000000000
[1116634.381352] ffff880778393de8 ffffffff810b4e03 ffffffff81584db2 ffff8807e4df9380
[1116634.381780] ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0 ffff880778393df8
[1116634.382212] Call Trace:
[1116634.382440] [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.382659] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.382871] [<ffffffff81584db2>] ? sk_stream_kill_queues+0xff/0x104
[1116634.383087] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.383308] [<ffffffff81584db2>] sk_stream_kill_queues+0xff/0x104
[1116634.383522] [<ffffffff815bf066>] inet_csk_destroy_sock+0x77/0xb7
[1116634.383741] [<ffffffff815c31c1>] tcp_close+0x287/0x37a
[1116634.383953] [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.384167] [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.384379] [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.384600] [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.384826] [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.385052] [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.385276] [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.385504] [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.385728] ---[ end trace fb11499084e23ab6 ]---
[1116634.386531] ------------[ cut here ]------------
[1116634.386792] WARNING: CPU: 0 PID: 28350 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x160/0x189()
[1116634.387264] Modules linked in: microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[1116634.387781] CPU: 0 PID: 28350 Comm: haproxy Tainted: G W 3.15.3-build-0007 #2
[1116634.388236] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[1116634.388680] 0000000000000000 ffff880778393d98 ffffffff8160042b 0000000000000000
[1116634.389157] ffff880778393dd0 ffffffff810b4e03 ffffffff815e00d7 ffff8807e4df9380
[1116634.389602] ffff8807e4df94c8 0000000000000007 ffff8807e4df93f0 ffff880778393de0
[1116634.390046] Call Trace:
[1116634.390270] [<ffffffff8160042b>] dump_stack+0x45/0x56
[1116634.390499] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[1116634.390742] [<ffffffff815e00d7>] ? inet_sock_destruct+0x160/0x189
[1116634.390979] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[1116634.391221] [<ffffffff815e00d7>] inet_sock_destruct+0x160/0x189
[1116634.391457] [<ffffffff8157ca3a>] __sk_free+0x18/0xd5
[1116634.391688] [<ffffffff8157cb0a>] sk_free+0x13/0x15
[1116634.391927] [<ffffffff815c32a8>] tcp_close+0x36e/0x37a
[1116634.392141] [<ffffffff815e03d5>] inet_release+0x6f/0x76
[1116634.392484] [<ffffffff81578bfe>] sock_release+0x1a/0x79
[1116634.392774] [<ffffffff81578c6a>] sock_close+0xd/0x11
[1116634.392996] [<ffffffff8115750e>] __fput+0xdc/0x18d
[1116634.393226] [<ffffffff811575eb>] ____fput+0x9/0xb
[1116634.393450] [<ffffffff810ca5e2>] task_work_run+0x78/0x8e
[1116634.393696] [<ffffffff81002880>] do_notify_resume+0x52/0x60
[1116634.393935] [<ffffffff81606970>] int_signal+0x12/0x17
[1116634.394159] ---[ end trace fb11499084e23ab7 ]---

P.S. After restarting the server and about 5 minutes of operation:

sockets: used 109439
TCP: inuse 110642 orphan 1372 tw 98904 alloc 110768 mem 215254
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
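
The comparison in the report can be reproduced with a few lines of Python. This is a minimal sketch, assuming 4096-byte pages (the same assumption as "1955339 * 4096" above) and treating MemTotal minus MemFree as a rough measure of memory in use; the helper names are made up for illustration.

```python
PAGE_SIZE = 4096  # bytes per page; assumed, as in "1955339 * 4096" above
GIB = 2**30


def sockstat_pages_to_bytes(pages: int, page_size: int = PAGE_SIZE) -> int:
    """The TCP 'mem' field of /proc/net/sockstat counts pages; convert to bytes."""
    return pages * page_size


def meminfo_used_bytes(mem_total_kb: int, mem_free_kb: int) -> int:
    """/proc/meminfo values are in kB (1024 bytes); return MemTotal - MemFree."""
    return (mem_total_kb - mem_free_kb) * 1024


# Values copied from the first /proc/net/sockstat and /proc/meminfo readings.
claimed = sockstat_pages_to_bytes(1955339)
used = meminfo_used_bytes(32939492, 29876564)

print(f"sockstat TCP mem: {claimed:,} bytes ({claimed / GIB:.2f} GiB)")
print(f"MemTotal-MemFree: {used:,} bytes ({used / GIB:.2f} GiB)")
print("TCP counter exceeds all memory in use:", claimed > used)
```

With these numbers the sockstat figure is 8,009,068,544 bytes (about 8.0 GB), while MemTotal minus MemFree is only 3,136,438,272 bytes, which is why the counter looked wrong to the reporter.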

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Eric Dumazet @ 2014-07-17 11:51 UTC
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
> Hi
>
> I noticed a TCP transfer rate slowdown after a few days of operation on
> kernel 3.15.3. After some digging, I found this:

What was the previous version you were using without this problem?

> [...]
> After shutting down the program:
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 47
> TCP: inuse 10552 orphan 10547 tw 142645 alloc 10552 mem 1877061
> [...]
> Is this just an accounting issue, or is it a real memory leak?
> What other information can I provide to troubleshoot this properly?
> I will also upgrade to 3.15.5 now, to see if the issue persists there.

I see nothing really wrong in your report.

It looks like you have a lot of sockets around after shutdown of the
program.
Each FIN-WAIT socket might consume a lot of buffers in its write queue,
unless you use/force SO_LINGER or something.

If you try the following command, you might see how many sockets have
outstanding data:

ss -amn
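
The `ss -amn` output suggested above prints one `skmem:(...)` block per socket. To total the queued write memory over hundreds of thousands of sockets, a small Python sketch can parse those blocks. It assumes the field letters documented in the ss(8) man page and that you feed it saved ss output lines (for example from `ss -amn > ss.txt`); the function names are hypothetical.

```python
import re

# Hypothetical helpers for `ss -m` output. Field letters follow the ss(8) legend:
# r=rmem_alloc, rb=rcv_buf, t=wmem_alloc, tb=snd_buf, f=fwd_alloc,
# w=wmem_queued, o=opt_mem, bl=back_log (newer ss versions may add d=drops).
SKMEM_RE = re.compile(r"skmem:\(([^)]*)\)")
FIELD_RE = re.compile(r"([a-z]+)(\d+)")


def parse_skmem(text: str) -> dict[str, int]:
    """Return the fields of the first skmem:(...) block in text, or {}."""
    match = SKMEM_RE.search(text)
    if match is None:
        return {}
    fields: dict[str, int] = {}
    for item in match.group(1).split(","):
        m = FIELD_RE.fullmatch(item)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def total_write_queued(lines: list[str]) -> int:
    """Sum wmem_queued ('w') over all lines that carry an skmem block."""
    return sum(parse_skmem(line).get("w", 0) for line in lines)


sample = [
    "skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)",
    "skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)",
    "tcp FIN-WAIT-2 0 0 206.53.159.203:443 172.17.76.10:54688",  # no skmem block
]
print(total_write_queued(sample))  # 10496
```

Lines without an skmem block (such as some FIN-WAIT-2 sockets) simply contribute zero.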

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-17 12:13 UTC
To: Eric Dumazet; +Cc: netdev, kaber, davem

On 2014-07-17 14:51, Eric Dumazet wrote:
> On Thu, 2014-07-17 at 13:52 +0300, Denys Fedoryshchenko wrote:
>> Hi
>>
>> I noticed a TCP transfer rate slowdown after a few days of operation on
>> kernel 3.15.3. After some digging, I found this:
>
> What was the previous version you were using without this problem?

These are fresh installations. I only noticed it on units under heavy
load (200-300k sockets at peak time): after a few days it exhausts
memory completely. In one case it made the unit completely unreachable:
when I try to ssh there, the TCP connection is established, but the
server never sends the SSH banner. Because the unit is remote and has
no KVM, I can't see what is really wrong there; it seems I should try
netconsole. On other units it just causes TCP performance degradation.
> [...]
> If you try the following command, you might see how many sockets have
> outstanding data:
>
> ss -amn

I will check that next time, thanks a lot!

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-17 13:12 UTC
To: Eric Dumazet; +Cc: netdev, kaber, davem

Well, it doesn't seem normal. But I am not sure: does the mem value mean
this memory is allocated and not released? If it is allocated pages,
MemFree in /proc/meminfo should show a different value.

I just shut down the balancer process, bypassed all traffic, and waited
for all (OK, almost all) sockets to close. This particular server had
300k connections before the test.

sockets: used 78
TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0

MemTotal: 32664776 kB
MemFree: 32462456 kB
MemAvailable: 32449704 kB

ss output:

Netid  State      Recv-Q Send-Q  Local Address:Port    Peer Address:Port
nl     UNCONN     0      0       0:0                   *
nl     UNCONN     0      0       4:0                   *
nl     UNCONN     0      0       10:0                  *
nl     UNCONN     0      0       11:0                  *
nl     UNCONN     0      0       15:0                  *
nl     UNCONN     0      0       16:0                  *
nl     UNCONN     0      0       18:0                  *
nl     UNCONN     0      0       21:0                  *
tcp    UNCONN     0      0       *:49484               *:*
        skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN     0      1280    *:8880                *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      20      *:22                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1       *:23                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1280    *:443                 *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    FIN-WAIT-1 0      4545    157.56.194.74:443     172.17.37.4:52473
        skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    ESTAB      0      3008    10.0.22.22:22         194.146.153.22:59156
        skmem:(r0,rb359040,t0,tb18432,f768,w11520,o0,bl0)
tcp    FIN-WAIT-1 0      1       54.225.250.234:443    172.17.32.251:35969
        skmem:(r0,rb8192,t0,tb8192,f2816,w1280,o0,bl0)
tcp    FIN-WAIT-2 0      0       206.53.159.203:443    172.17.76.10:54688
tcp    FIN-WAIT-2 0      0       173.194.39.40:443     172.17.64.221:39482
tcp    FIN-WAIT-1 0      1       69.171.235.48:443     172.17.32.251:52731
        skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-2 0      0       173.194.39.38:443     172.17.64.228:52013
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-1 0      4645    2.16.216.121:443      172.17.40.215:51985
        skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    FIN-WAIT-2 0      0       173.192.231.39:443    172.17.10.41:56736
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2 0      0       173.194.70.95:443     172.17.64.221:59055
tcp    FIN-WAIT-2 0      0       46.33.74.129:443      172.17.74.223:53207
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2 0      0       173.194.70.99:443     172.17.43.95:57045
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-2 0      0       198.11.217.194:443    172.17.75.46:63384
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    FIN-WAIT-1 0      3601    31.13.81.33:443       172.17.14.192:54971
        skmem:(r0,rb8192,t0,tb8192,f1280,w6912,o0,bl0)
tcp    FIN-WAIT-1 0      4561    174.36.208.130:443    172.17.45.227:56934
        skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)
tcp    FIN-WAIT-2 0      0       93.186.18.171:443     172.17.38.62:57496
tcp    FIN-WAIT-2 0      0       173.194.70.104:443    172.17.70.233:42575
        skmem:(r0,rb8192,t0,tb8192,f0,w0,o0,bl0)
tcp    LAST-ACK   0      3752    31.13.93.33:443       172.17.46.52:6869
        skmem:(r0,rb8192,t0,tb8192,f3072,w9216,o0,bl0)

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Eric Dumazet @ 2014-07-17 13:42 UTC
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Thu, 2014-07-17 at 16:12 +0300, Denys Fedoryshchenko wrote:
> Well, it doesn't seem normal. But I am not sure: does the mem value mean
> this memory is allocated and not released?

mem means: sockets have forward_alloc reservations. They can use this
amount of memory without worrying.

> If it is allocated pages, MemFree in /proc/meminfo should show a
> different value.

It's not allocated pages.

> I just shut down the balancer process, bypassed all traffic, and waited
> for all (OK, almost all) sockets to close. This particular server had
> 300k connections before the test.
>
> sockets: used 78
> TCP: inuse 22 orphan 17 tw 2 alloc 22 mem 1048593
> [...]
> tcp    FIN-WAIT-1 0      1       69.171.235.48:443     172.17.32.251:52731
>         skmem:(r0,rb8192,t0,tb8192,f4294966016,w1280,o0,bl0)

This looks bad:

f4294966016 : 0xFFFFFB00

There is a bug of some sort that leads to overflows.
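
The hex value above can be checked, and read as a signed number. A minimal Python sketch, assuming ss printed a 32-bit field as unsigned: reinterpreting 4294966016 as two's complement gives -1280, a negative forward_alloc of the same size as the w1280 (queued write memory) on that socket.

```python
def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


raw_fwd_alloc = 4294966016  # the "f" field ss printed for the FIN-WAIT-1 socket
print(hex(raw_fwd_alloc))          # 0xfffffb00
print(as_signed32(raw_fwd_alloc))  # -1280
```

Values below 2^31 pass through unchanged, so ordinary positive reservations such as f3072 still read as 3072.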

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Denys Fedoryshchenko @ 2014-07-18 8:12 UTC
To: Eric Dumazet; +Cc: netdev, kaber, davem

More updates. I have now found out how to see whether TCP is in memory
pressure mode; I guess that when it is active, speed might degrade. Now
almost all sockets are released; is it normal that TCP pressure is still
"yes"?

balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory  press maxhdr  slab module  cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216       0      -1  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624       0       0  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832       1      -1  NI         0  yes  kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800       0      -1  NI         0  yes  kernel   y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760       0      -1  NI         0  yes  kernel   y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768       0      -1  NI         0  yes  kernel   y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800       1       0  NI         0  yes  kernel   y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608      24 1823119  yes      208  yes  kernel   y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992       0      -1  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n

balancer-backup ~ # cat /proc/net/sockstat
sockets: used 51
TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
UDP: inuse 1 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory

Netid  State      Recv-Q Send-Q  Local Address:Port    Peer Address:Port
nl     UNCONN     0      0       0:0                   *
nl     UNCONN     0      0       4:0                   *
nl     UNCONN     0      0       10:0                  *
nl     UNCONN     0      0       11:0                  *
nl     UNCONN     0      0       15:0                  *
nl     UNCONN     0      0       16:0                  *
nl     UNCONN     0      0       18:0                  *
nl     UNCONN     0      0       21:0                  *
u_str  LISTEN     0      0       /tmp/dropbear-b661327d/auth-ac4c35f5-6 398981316 * 0
tcp    UNCONN     0      0       *:21084               *:*
        skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:80                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:8880                *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      9       *:8080                *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      20      *:22                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1       *:23                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:443                 *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    ESTAB      0      176     212.22.93.7:22        109.110.116.35:59925
        skmem:(r0,rb359040,t0,tb46080,f3584,w4608,o0,bl0)
tcp    FIN-WAIT-1 0      27545   173.194.19.109:80     185.4.252.44:61532
        skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp    SYN-SENT   0      1       212.22.93.7:63910     54.225.188.51:80
        skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      28333   151.249.89.224:80     185.4.252.27:56892
        skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp    FIN-WAIT-1 0      28009   2.16.61.24:80         185.4.252.139:52608
        skmem:(r0,rb359040,t0,tb46080,f0,w45056,o0,bl0)
tcp    FIN-WAIT-1 0      21753   173.194.19.46:80      185.4.252.214:51174
        skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp    FIN-WAIT-1 0      28437   87.248.223.79:80      185.4.253.107:60924
        skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)
tcp    FIN-WAIT-1 0      46194   212.25.73.21:80       185.4.253.107:50242
        skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp    FIN-WAIT-1 0      65001   173.194.35.119:443    185.4.252.58:50040
        skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp    SYN-SENT   0      1       212.22.93.7:27514     54.225.188.51:80
        skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      3863    198.23.87.67:443      185.4.253.115:1693
        skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)
tcp    SYN-SENT   0      1       212.22.93.7:24487     54.225.188.51:80
        skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-2 0      0       54.225.251.189:443    185.4.253.19:49317
tcp    FIN-WAIT-2 0      0       184.173.136.67:443    185.4.252.214:56383
        skmem:(r0,rb333440,t0,tb46080,f0,w0,o0,bl0)
tcp    SYN-SENT   0      1       212.22.93.7:7830      54.225.188.51:80
        skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)
tcp    FIN-WAIT-1 0      6042    23.200.86.144:80      185.4.252.139:54631
        skmem:(r0,rb359040,t0,tb46080,f1792,w10496,o0,bl0)
tcp    FIN-WAIT-1 0      28961   87.248.223.141:80     185.4.253.18:61083
        skmem:(r0,rb359040,t0,tb46080,f3072,w46080,o0,bl0)

There were many like this one (with an overflow?):

tcp    SYN-SENT   0      1       212.22.93.7:7830      54.225.188.51:80
        skmem:(r0,rb87380,t0,tb16384,f4294966016,w1280,o0,bl0)

After 5-10 minutes they disappeared, but TCP is still in pressure mode:

balancer-backup ~ # ss -amn
Netid  State      Recv-Q Send-Q  Local Address:Port    Peer Address:Port
nl     UNCONN     0      0       0:0                   *
nl     UNCONN     0      0       4:0                   *
nl     UNCONN     0      0       10:0                  *
nl     UNCONN     0      0       11:0                  *
nl     UNCONN     0      0       15:0                  *
nl     UNCONN     0      0       16:0                  *
nl     UNCONN     0      0       18:0                  *
nl     UNCONN     0      0       21:0                  *
u_str  LISTEN     0      0       /tmp/dropbear-b661327d/auth-ac4c35f5-6 398981316 * 0
tcp    UNCONN     0      0       *:56684               *:*
        skmem:(r0,rb2304,t0,tb212992,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:80                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:8880                *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      9       *:8080                *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      20      *:22                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      1       *:23                  *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    LISTEN     0      10000   *:443                 *:*
        skmem:(r0,rb87380,t0,tb16384,f0,w0,o0,bl0)
tcp    ESTAB      0      2688    212.22.93.7:22        109.110.116.35:59925
        skmem:(r0,rb359040,t2304,tb46080,f256,w16128,o0,bl0)
tcp    FIN-WAIT-1 0      21753   173.194.19.46:80      185.4.252.214:51174
        skmem:(r0,rb359040,t0,tb46080,f1024,w35840,o0,bl0)
tcp    FIN-WAIT-1 0      46194   212.25.73.21:80       185.4.253.107:50242
        skmem:(r0,rb359040,t0,tb78336,f0,w73728,o0,bl0)
tcp    FIN-WAIT-1 0      65001   173.194.35.119:443    185.4.252.58:50040
        skmem:(r0,rb333440,t0,tb119808,f3584,w115200,o0,bl0)
tcp    FIN-WAIT-1 0      3863    198.23.87.67:443      185.4.253.115:1693
        skmem:(r0,rb333440,t0,tb46080,f3072,w9216,o0,bl0)

balancer-backup ~ # cat /proc/net/protocols
protocol  size sockets  memory  press maxhdr  slab module  cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PACKET    1216       0      -1  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
ALG        624       0       0  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UNIX       832       1      -1  NI         0  yes  kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
UDP-Lite   800       0      -1  NI         0  yes  kernel   y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
PING       760       0      -1  NI         0  yes  kernel   y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
RAW        768       0      -1  NI         0  yes  kernel   y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
UDP        800       1       0  NI         0  yes  kernel   y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
TCP       1608      15 1823049  yes      208  yes  kernel   y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
NETLINK    992       0      -1  NI         0  no   kernel   n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n

Two warnings also showed up on the latest stable (well, almost latest:
3.15.6 was released very recently, I have 3.15.5).

[73563.552667] ------------[ cut here ]------------
[73563.552889] WARNING: CPU: 3 PID: 0 at net/core/stream.c:201 sk_stream_kill_queues+0xff/0x104()
[73563.553308] Modules linked in: iptable_filter tcp_diag udp_diag inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[73563.553975] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 3.15.5-build-0007 #1
[73563.554190] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[73563.559607] 0000000000000000 ffff88080f263a38 ffffffff8160054b 0000000000000000
[73563.560029] ffff88080f263a70 ffffffff810b4e03 ffffffff81584ed6 ffff880093b9d480
[73563.560451] ffff880093b9d5c8 ffff8806dcecb0a8 0000000000000000 ffff88080f263a80
[73563.560874] Call Trace:
[73563.561080] <IRQ> [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.561304] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.561517] [<ffffffff81584ed6>] ? sk_stream_kill_queues+0xff/0x104
[73563.561730] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.561943] [<ffffffff81584ed6>] sk_stream_kill_queues+0xff/0x104
[73563.562158] [<ffffffff815bf18a>] inet_csk_destroy_sock+0x77/0xb7
[73563.562372] [<ffffffff815c0555>] tcp_done+0x78/0x7d
[73563.562583] [<ffffffff815c8d2c>] tcp_reset+0x53/0x56
[73563.562795] [<ffffffff815c8e96>] tcp_validate_incoming+0x167/0x261
[73563.563012] [<ffffffff815c9d7c>] tcp_rcv_state_process+0x613/0xaa3
[73563.563228] [<ffffffff815cfae3>] tcp_v4_do_rcv+0x291/0x2db
[73563.563440] [<ffffffff815b531c>] ? inet_del_offload+0x38/0x38
[73563.563653] [<ffffffff815d2197>] tcp_v4_rcv+0x329/0x617
[73563.563868] [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.564080] [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.564293] [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.564505] [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.564717] [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.564929] [<ffffffff8158b3c3>] __netif_receive_skb_core+0x55b/0x5a4
[73563.565142] [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.565353] [<ffffffff8158b49c>] netif_receive_skb_internal+0x3b/0x72
[73563.565565] [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.565779] [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.565990] [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.566201] [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.566413] [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.566628] [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.566841] [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.567059] [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.567275] [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.567490] <EOI> [<ffffffff81559e73>] ? cpuidle_enter_state+0x4d/0x9e
[73563.567713] [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.567933] [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.568150] [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.568366] ---[ end trace 098a1058f104bc5a ]---
[73563.568577] ------------[ cut here ]------------
[73563.568791] WARNING: CPU: 3 PID: 0 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x160/0x189()
[73563.569211] Modules linked in: iptable_filter tcp_diag udp_diag inet_diag unix_diag microcode xt_tcpudp xt_mark iptable_mangle ip_tables x_tables 8021q garp stp mrp llc
[73563.569878] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G W 3.15.5-build-0007 #1
[73563.570300] Hardware name: Dell Inc. PowerEdge R710/0HYPX2, BIOS 2.0.11 02/26/2010
[73563.570719] 0000000000000000 ffff88080f263b38 ffffffff8160054b 0000000000000000
[73563.571145] ffff88080f263b70 ffffffff810b4e03 ffffffff815e01fb ffff880093b9d480
[73563.571573] ffff880093b9d5c8 ffff880093b9d4d0 0000000000000000 ffff88080f263b80
[73563.572000] Call Trace:
[73563.572210] <IRQ> [<ffffffff8160054b>] dump_stack+0x45/0x56
[73563.572430] [<ffffffff810b4e03>] warn_slowpath_common+0x75/0x8e
[73563.572646] [<ffffffff815e01fb>] ? inet_sock_destruct+0x160/0x189
[73563.572862] [<ffffffff810b4ebb>] warn_slowpath_null+0x15/0x17
[73563.573076] [<ffffffff815e01fb>] inet_sock_destruct+0x160/0x189
[73563.573292] [<ffffffff8157cb5e>] __sk_free+0x18/0xd5
[73563.573504] [<ffffffff8157cc2e>] sk_free+0x13/0x15
[73563.573719] [<ffffffff815d2464>] tcp_v4_rcv+0x5f6/0x617
[73563.573933] [<ffffffff815b06ff>] ? nf_hook_slow+0x6a/0xfb
[73563.574147] [<ffffffff815b53d1>] ip_local_deliver_finish+0xb5/0x125
[73563.574362] [<ffffffff815b584f>] ip_local_deliver+0x6d/0x71
[73563.574576] [<ffffffff815b56b3>] ip_rcv_finish+0x272/0x2b4
[73563.574789] [<ffffffff815b5aea>] ip_rcv+0x297/0x305
[73563.575002] [<ffffffff8158b3c3>] __netif_receive_skb_core+0x55b/0x5a4
[73563.575218] [<ffffffff8158b41f>] __netif_receive_skb+0x13/0x55
[73563.575432] [<ffffffff8158b49c>] netif_receive_skb_internal+0x3b/0x72
[73563.575650] [<ffffffff8158ba45>] napi_gro_receive+0x39/0x7b
[73563.575866] [<ffffffff813b2bd2>] bnx2_poll_work+0xf8a/0x10be
[73563.576083] [<ffffffff815c13d1>] ? tcp_sendmsg+0x435/0xb18
[73563.576298] [<ffffffff813b2d32>] bnx2_poll_msix+0x2c/0xae
[73563.576512] [<ffffffff8158c133>] net_rx_action+0xc9/0x162
[73563.576729] [<ffffffff810b846a>] __do_softirq+0xd2/0x1c3
[73563.576944] [<ffffffff810b86b2>] irq_exit+0x3a/0x81
[73563.577159] [<ffffffff81004021>] do_IRQ+0xb4/0xcd
[73563.577377] [<ffffffff816061aa>] common_interrupt+0x6a/0x6a
[73563.577591] <EOI> [<ffffffff81559e73>] ? cpuidle_enter_state+0x4d/0x9e
[73563.577814] [<ffffffff81559ee6>] cpuidle_enter+0x12/0x14
[73563.578029] [<ffffffff810e27f7>] cpu_startup_entry+0x163/0x204
[73563.578246] [<ffffffff810239af>] start_secondary+0x1b4/0x1b9
[73563.578460] ---[ end trace 098a1058f104bc5b ]---
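
The check of the press column described in the message above can be scripted. A minimal Python sketch, assuming the whitespace-separated layout shown above (header names `memory` and `press`), pulls those two values for one protocol; on a Linux host the text would come from reading /proc/net/protocols. The function name is hypothetical.

```python
def protocol_memory(table: str, proto: str = "TCP") -> tuple[int, str]:
    """Return (memory_pages, press) for one protocol from /proc/net/protocols text.

    Columns are located by name in the header line, so the sketch does not
    depend on exact column widths. Rows are assumed to be whitespace-separated.
    """
    lines = table.strip().splitlines()
    header = lines[0].split()
    mem_col = header.index("memory")
    press_col = header.index("press")
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == proto:
            return int(fields[mem_col]), fields[press_col]
    raise KeyError(f"protocol {proto!r} not found")


sample = """\
protocol  size sockets  memory  press maxhdr  slab module  cl co di
UDP        800       1       0  NI         0  yes  kernel   y  y  y
TCP       1608      15 1823049  yes      208  yes  kernel   y  y  y
"""
print(protocol_memory(sample))  # (1823049, 'yes')
```

Polling this periodically alongside /proc/net/sockstat would show whether the TCP memory figure ever falls back after the sockets are gone.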
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Eric Dumazet @ 2014-07-18  8:41 UTC
To: Denys Fedoryshchenko; +Cc: netdev, kaber, davem

On Fri, 2014-07-18 at 11:12 +0300, Denys Fedoryshchenko wrote:
> More updates. I found out how to see whether TCP is in memory pressure
> mode; I guess that when it is active, speed might degrade. Now that
> almost all sockets are released, is it normal that TCP pressure is
> still "yes"?
>
> balancer-backup ~ # cat /proc/net/protocols
> protocol  size sockets  memory press maxhdr  slab module     cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
> PACKET    1216      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> ALG        624      0       0   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> UNIX       832      1      -1   NI       0   yes  kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> UDP-Lite   800      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  y  n  y  y  y  y  n
> PING       760      0      -1   NI       0   yes  kernel      y  y  y  n  n  y  n  n  y  y  y  y  n  y  y  y  y  y  n
> RAW        768      0      -1   NI       0   yes  kernel      y  y  y  n  y  y  y  n  y  y  y  y  n  y  y  y  y  n  n
> UDP        800      1       0   NI       0   yes  kernel      y  y  y  n  y  n  y  n  y  y  y  y  y  n  y  y  y  y  n
> TCP       1608     24 1823119   yes    208   yes  kernel      y  y  y  y  y  y  y  y  y  y  y  y  y  n  y  y  y  y  y
> NETLINK    992      0      -1   NI       0   no   kernel      n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n  n
> balancer-backup ~ # cat /proc/net/sockstat
> sockets: used 51
> TCP: inuse 21 orphan 13 tw 2 alloc 21 mem 1823120
> UDP: inuse 1 mem 0
> UDPLITE: inuse 0
> RAW: inuse 0
> FRAG: inuse 0 memory

TCP believes it's under pressure because of some accounting bug.

If you change /proc/sys/net/ipv4/tcp_mem (3rd value), it might say it
is not under pressure.

I am taking a look, thanks again for the report.
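Why can "press" stay "yes" with only 21 TCP sockets left? The tcp_mem entry in the kernel's ip-sysctl documentation describes a hysteresis: TCP enters memory pressure mode when accounted usage exceeds the second value and leaves it only when usage falls under the first. The sketch below models that rule in a simplified form (cgroup limits and the third value are ignored; all struct and function names are hypothetical). With the thresholds from the first report in this thread (1767103 2045612 3068412) and the 1823120 pages still accounted above, the flag has no reason to clear, which fits Eric's point that the accounting itself is off. It also converts pages to bytes, assuming 4 KiB pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* net.ipv4.tcp_mem: three thresholds, counted in pages. */
struct tcp_mem_limits {
	long min;      /* at or below this, pressure mode is left */
	long pressure; /* above this, pressure mode is entered */
	long max;      /* hard limit, not modelled here */
};

/* Parse the "mem" field (pages) from the TCP line of /proc/net/sockstat.
 * Returns -1 if the line does not have the expected layout. */
static long sockstat_tcp_mem_pages(const char *line)
{
	long inuse, orphan, tw, alloc, mem;

	if (sscanf(line, "TCP: inuse %ld orphan %ld tw %ld alloc %ld mem %ld",
		   &inuse, &orphan, &tw, &alloc, &mem) != 5)
		return -1;
	return mem;
}

/* Convert a page count to bytes; the page size is an assumption
 * (4096 on x86-64). */
static long long pages_to_bytes(long pages, long page_size)
{
	assert(page_size > 0);
	return (long long)pages * page_size;
}

/* Simplified hysteresis: cleared at or below 'min', set above
 * 'pressure', unchanged in between. */
static bool tcp_pressure_next(bool under_pressure, long pages,
			      const struct tcp_mem_limits *lim)
{
	if (pages <= lim->min)
		return false;
	if (pages > lim->pressure)
		return true;
	return under_pressure;
}
```

For example, tcp_pressure_next(true, 1823120, &lim) stays true with the thresholds above, and pages_to_bytes(1823120, 4096) is about 7.5 GB, far more than the MemFree figures in this thread suggest is really in use.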
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels?
From: Semen Slepov @ 2014-09-23 20:14 UTC
To: netdev

Eric Dumazet <eric.dumazet <at> gmail.com> writes:

> tcp believes its under pressure because of some bug on accounting.
>
> If you change /proc/sys/net/ipv4/tcp_mem (3rd value), it might say it
> is not under pressure.
>
> I am taking a look, thanks again for the report.

I still have this accounting bug on 3.15.7. Is there a fix in later
kernel releases (or a patch)?
* /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Denys Fedoryshchenko @ 2014-11-12 17:07 UTC
To: Eric Dumazet; +Cc: netdev

Hi

I've been able to trace the invalid memory value shown for
sk_forward_alloc. I am not sure if it is related to the bug where the
whole TCP stack gets wrecked on high-load servers (I even had such a
crash on a server running plain torrents, not that many of them).
At first the values appeared in ss output as f4294966016: ss prints the
field as an unsigned integer, while in the kernel it is a signed
integer. Should I provide a patch for iproute2?
After changing the value to the correct one, here is what I got:
skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb349440,t0,tb46080,f-1280,w1280,o0,bl0)
skmem:(r0,rb357120,t0,tb46080,f-1280,w1280,o0,bl0)

So, read as a signed integer, the value actually shows up very often as
-1280 bytes.
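On the iproute2 question: as far as I know, ss receives these counters as an array of 32-bit unsigned values indexed by the SK_MEMINFO_* constants of <linux/sock_diag.h>. A hypothetical formatter that prints every field unsigned except fwd_alloc, which is reinterpreted as int32_t first, could look like the sketch below. The enum re-declares the field order locally so the sketch stays self-contained (an assumption mirroring the r/rb/t/tb/f/w/o/bl order of the output above); format_skmem is not iproute2's actual code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror the SK_MEMINFO_* order used for skmem:(...). */
enum {
	SKMI_RMEM_ALLOC, SKMI_RCVBUF, SKMI_WMEM_ALLOC, SKMI_SNDBUF,
	SKMI_FWD_ALLOC, SKMI_WMEM_QUEUED, SKMI_OPTMEM, SKMI_BACKLOG,
	SKMI_COUNT
};

/* Format the skmem tuple; fwd_alloc is a signed int in the kernel, so
 * its 32 bits are reinterpreted as int32_t before printing. */
static int format_skmem(char *buf, size_t len,
			const uint32_t mi[SKMI_COUNT])
{
	int32_t fwd;

	memcpy(&fwd, &mi[SKMI_FWD_ALLOC], sizeof(fwd));
	return snprintf(buf, len,
			"skmem:(r%" PRIu32 ",rb%" PRIu32 ",t%" PRIu32
			",tb%" PRIu32 ",f%" PRId32 ",w%" PRIu32
			",o%" PRIu32 ",bl%" PRIu32 ")",
			mi[SKMI_RMEM_ALLOC], mi[SKMI_RCVBUF],
			mi[SKMI_WMEM_ALLOC], mi[SKMI_SNDBUF], fwd,
			mi[SKMI_WMEM_QUEUED], mi[SKMI_OPTMEM],
			mi[SKMI_BACKLOG]);
}
```

Feeding it the raw values behind the first line above (f stored as 4294966016) produces skmem:(r0,rb359040,t0,tb46080,f-1280,w1280,o0,bl0).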
After placing WARN_ON_ONCE() in all functions involving
sk_forward_alloc where a negative value may appear, I traced it down, at
least, to tcp_connect_queue_skb():
[ 13.094561] WARNING: CPU: 4 PID: 2855 at include/net/sock.h:1476 tcp_connect_queue_skb+0x9f/0xd0()
After adding some debug values, it was confirmed:

	skb_header_release(skb);
	__tcp_add_write_queue_tail(sk, skb);
	sk->sk_wmem_queued += skb->truesize;
	sk_mem_charge(sk, skb->truesize); <<<< sk->sk_forward_alloc is often 0,
	                                       and skb->truesize is 1280, so -1280 is the result
	tp->write_seq = tcb->end_seq;
	tp->packets_out += tcp_skb_pcount(skb);

Maybe it should not be subtracted here? Or maybe the logic is inverted
all over the code?
Please help :)
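The f-1280,w1280 pairs above follow directly from the quoted lines: sk_mem_charge() subtracts the charged size from sk_forward_alloc, and nothing is reserved for this skb beforehand, so a socket starting at 0 ends at -1280. A user-space model of just those two counters (hypothetical names, not kernel code; it assumes that freeing a write-queue skb reverses both counters, as the uncharge path is meant to):

```c
#include <assert.h>

/* Stand-in for the two struct sock fields involved. */
struct sock_model {
	int forward_alloc; /* bytes reserved but not yet used; may go negative */
	int wmem_queued;   /* bytes sitting in the write queue */
};

/* Using memory: take it out of the reservation. */
static void model_mem_charge(struct sock_model *sk, int size)
{
	assert(size >= 0);
	sk->forward_alloc -= size;
}

/* Releasing memory: give it back to the reservation. */
static void model_mem_uncharge(struct sock_model *sk, int size)
{
	assert(size >= 0);
	sk->forward_alloc += size;
}

/* Same order as the quoted tcp_connect_queue_skb() lines: count the skb
 * in wmem_queued, then charge its truesize without reserving first. */
static void model_queue_skb(struct sock_model *sk, int truesize)
{
	sk->wmem_queued += truesize;
	model_mem_charge(sk, truesize);
}

/* Freeing the queued skb undoes both steps. */
static void model_free_skb(struct sock_model *sk, int truesize)
{
	sk->wmem_queued -= truesize;
	model_mem_uncharge(sk, truesize);
}
```

In this model a negative forward_alloc is a temporary, bounded overdraft that returns to 0 when the skb is freed; it only becomes a leak if some charge is never matched by an uncharge (or vice versa).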
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Eric Dumazet @ 2014-11-12 17:27 UTC
To: Denys Fedoryshchenko, Neal Cardwell, Yuchung Cheng; +Cc: netdev

On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
> Hi
>
>
> I've been able to trace invalid memory shown for sk_forward_alloc.
[...]
> Maybe it should not be subtracted here? Or maybe logic are inverted all
> over the code?
> Please help :)
>

Thanks a lot for the report, I am CCing other experts ;)

AFAIK, a negative forward alloc should not be a problem, as long as the
incursion is bounded.
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Denys Fedoryshchenko @ 2014-11-16  8:54 UTC
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

Here are my latest findings on the servers going crazy because of
invalid TCP memory accounting.
First of all, I upgraded the kernel to the latest version, 3.17.3, and
also added a patch from the upcoming kernel,
"12) Don't call sock_kfree_s() with NULL pointers, this function also
has the side effect of adjusting the socket memory usage. From Cong
Wang.", but it didn't help.

I added printk_ratelimited() at places where suspicious values might
appear, and got some more information.
The first one is not very suspicious; I have no idea whether it is a
problem:
[ 1413.031622] sk ffff8817184d8680 sk_mem_charge negative -10752 by 4352
[ 1413.032027] sk ffff8817184d8680 sk_mem_charge negative -15104 by 4352
[ 1415.768465] sk ffff881666842d80 sk_mem_charge negative -9984 by 4352
[ 1415.768868] sk ffff881666842d80 sk_mem_charge negative -14336 by 4352
[ 1415.769268] sk ffff881666842d80 sk_mem_charge negative -18688 by 4352
[ 1415.769681] sk ffff881666842d80 sk_mem_charge negative -9088 by 4352
[ 1418.933799] sk ffff8816dd640000 sk_mem_charge negative -9984 by 4352
[ 1418.934205] sk ffff8816dd640000 sk_mem_charge negative -14336 by 4352
[ 1418.934604] sk ffff8816dd640000 sk_mem_charge negative -18688 by 4352
[ 1427.131310] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1428.564640] sk ffff881731801a00 sk_mem_charge negative -11008 by 4352
[ 1429.134279] sk ffff881731801a00 sk_mem_charge negative -11776 by 4352
[ 1429.134691] sk ffff881731801a00 sk_mem_charge negative -16128 by 4352
[ 1430.666541] sk ffff881731801a00 sk_mem_charge negative -10496 by 4352
[ 1431.395099] sk ffff881731801a00 sk_mem_charge negative -12032 by 4352
[ 1431.395506] sk ffff881731801a00 sk_mem_charge negative -16384 by 4352
[ 1431.877862] sk ffff881731801a00 sk_mem_charge negative -11648 by 4352

The second one is always linked with crashes: it is in
sk_mem_uncharge(), where sk_forward_alloc goes negative. The patch to
print a message from sk_mem_uncharge() in sock.h is very simple:

 static inline void sk_mem_uncharge(struct sock *sk, int size)
@@ -1480,6 +1485,8 @@
 	if (!sk_has_account(sk))
 		return;
 	sk->sk_forward_alloc += size;
+	if (sk->sk_forward_alloc < -8192)
+		printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge negative %d by %d\n", sk, sk->sk_forward_alloc, size);
 }

This is what I usually see before the box reboots:
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363437] sk ffff88155e904100 sk_mem_uncharge negative -2147482496 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.363837] sk ffff88155e904100 sk_mem_uncharge negative -2147480192 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364232] sk ffff88155e904100 sk_mem_uncharge negative -2147477888 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.364627] sk ffff88155e904100 sk_mem_uncharge negative -2147475584 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365022] sk ffff88155e904100 sk_mem_uncharge negative -2147473280 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365416] sk ffff88155e904100 sk_mem_uncharge negative -2147470976 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.365811] sk ffff88155e904100 sk_mem_uncharge negative -2147468672 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366214] sk ffff88155e904100 sk_mem_uncharge negative -2147466368 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.366611] sk ffff88155e904100 sk_mem_uncharge negative -2147464064 by 2304
Nov 15 13:20:53 g1 user.warn kernel: [ 2002.367004] sk ffff88155e904100 sk_mem_uncharge negative -2147461760 by 2304
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.363601] sk_mem_uncharge: 576170 callbacks suppressed
Nov 15 13:20:58 g1 user.warn kernel: [ 2007.364001] sk ffff88155e904100 sk_mem_uncharge negative -1517794432 by 2304
..... until ...
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.393667] sk ffff88170141ad80 sk_mem_uncharge negative -169088 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394064] sk ffff88170141ad80 sk_mem_uncharge negative -166144 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394472] sk ffff88170141ad80 sk_mem_uncharge negative -163200 by 2944
Nov 15 13:31:58 g1 user.warn kernel: [ 2666.394871] sk ffff88170141ad80 sk_mem_uncharge negative -158208 by 4992
<reboot>

Or:
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758129] sk ffff8815f8014e00 sk_mem_uncharge negative -2147483520 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758536] sk ffff8815f8014e00 sk_mem_uncharge negative -2147481216 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.758935] sk ffff8815f8014e00 sk_mem_uncharge negative -2147478912 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759332] sk ffff8815f8014e00 sk_mem_uncharge negative -2147476608 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.759728] sk ffff8815f8014e00 sk_mem_uncharge negative -2147474304 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760124] sk ffff8815f8014e00 sk_mem_uncharge negative -2147472000 by 2304
Nov 15 14:55:01 g1 user.warn kernel: [ 1965.760518] sk ffff8815f8014e00 sk_mem_uncharge negative -2147469696 by 2304
...
Nov 15 15:03:27 g1 user.warn kernel: [ 2472.170857] sk ffff88163f429a00 sk_mem_uncharge negative -277169024 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611109] sk ffff88163f429a00 sk_mem_uncharge negative -277164928 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.611511] sk ffff88163f429a00 sk_mem_uncharge negative -277164032 by 896
Nov 15 15:03:30 g1 user.warn kernel: [ 2474.706237] sk ffff88163f429a00 sk_mem_uncharge negative -277159936 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.889110] sk ffff88163f429a00 sk_mem_uncharge negative -277155840 by 2304
Nov 15 15:03:34 g1 user.warn kernel: [ 2478.911516] sk ffff88163f429a00 sk_mem_uncharge negative -277147648 by 1536
Nov 15 15:03:35 g1 user.warn kernel: [ 2479.320851] sk ffff88163f429a00 sk_mem_uncharge negative -277143552 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.408184] sk ffff88163f429a00 sk_mem_uncharge negative -277139456 by 2304
Nov 15 15:04:33 g1 user.warn kernel: [ 2537.409790] sk ffff88163f429a00 sk_mem_uncharge negative -277135360 by 896
(here I can see two sockets in this state at the same time)

Or:
Nov 15 23:37:19 g1 user.warn kernel: [14137.049570] sk ffff8816b3312700 sk_mem_uncharge negative -2147481856 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.049973] sk ffff8816b3312700 sk_mem_uncharge negative -2147479552 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050372] sk ffff8816b3312700 sk_mem_uncharge negative -2147477248 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.050770] sk ffff8816b3312700 sk_mem_uncharge negative -2147474944 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051170] sk ffff8816b3312700 sk_mem_uncharge negative -2147472640 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051572] sk ffff8816b3312700 sk_mem_uncharge negative -2147470336 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.051971] sk ffff8816b3312700 sk_mem_uncharge negative -2147468032 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052371] sk ffff8816b3312700 sk_mem_uncharge negative -2147465728 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.052771] sk ffff8816b3312700 sk_mem_uncharge negative -2147463424 by 2304
Nov 15 23:37:19 g1 user.warn kernel: [14137.053169] sk ffff8816b3312700 sk_mem_uncharge negative -2147461120 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.526984] sk_mem_uncharge: 1083519 callbacks suppressed
Nov 15 23:37:32 g1 user.warn kernel: [14150.527384] sk ffff8816b3312700 sk_mem_uncharge negative -2147483392 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.527782] sk ffff8816b3312700 sk_mem_uncharge negative -2147481088 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528179] sk ffff8816b3312700 sk_mem_uncharge negative -2147478784 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528576] sk ffff8816b3312700 sk_mem_uncharge negative -2147476480 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.528975] sk ffff8816b3312700 sk_mem_uncharge negative -2147474176 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529370] sk ffff8816b3312700 sk_mem_uncharge negative -2147471872 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.529770] sk ffff8816b3312700 sk_mem_uncharge negative -2147469568 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530167] sk ffff8816b3312700 sk_mem_uncharge negative -2147467264 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530572] sk ffff8816b3312700 sk_mem_uncharge negative -2147464960 by 2304
Nov 15 23:37:32 g1 user.warn kernel: [14150.530973] sk ffff8816b3312700 sk_mem_uncharge negative -2147462656 by 2304
<here it reboots much earlier, at this point>

I will try to get more info on what kind of socket it is.

On 2014-11-12 19:27, Eric Dumazet wrote:
> On Wed, 2014-11-12 at 19:07 +0200, Denys Fedoryshchenko wrote:
>> Hi
>>
>>
>> I've been able to trace invalid memory shown for sk_forward_alloc.
[...]
>> Please help :)
>>
>
> Thanks a lot for the report, I am CCing other experts ;)
>
> AFAIK, a negative forward alloc should not be a problem, a long as the
> incursion is bound.
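The sk_mem_uncharge() debug lines above print sk_forward_alloc after `+= size`, which lets the value just before the addition be recovered. Take the first line of the 13:20:53 burst: -2147482496 after adding 2304. Working backwards gives -2147484800, which is below INT_MIN (-2147483648) and cannot be a 32-bit int; modulo 2^32 the only candidate is 2147482496, just under INT_MAX (2147483647). So these lines fit a counter that crossed the 32-bit signed boundary, in line with the overflow guess in Denys's later P.S. A sketch of that wraparound arithmetic (hypothetical helper names; the additions are done on uint32_t because signed overflow is undefined behaviour in ISO C):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret 32 bits as a signed value without an
 * implementation-defined conversion. */
static int32_t bits_as_s32(uint32_t bits)
{
	int32_t v;

	memcpy(&v, &bits, sizeof(v));
	return v;
}

/* a + b with two's-complement wraparound, computed modulo 2^32. */
static int32_t wrap_add_s32(int32_t a, int32_t b)
{
	return bits_as_s32((uint32_t)a + (uint32_t)b);
}

/* a - b with two's-complement wraparound: recovers the value a counter
 * held before "+= b" produced a. */
static int32_t wrap_sub_s32(int32_t a, int32_t b)
{
	return bits_as_s32((uint32_t)a - (uint32_t)b);
}
```

For instance, wrap_sub_s32(-2147482496, 2304) gives 2147482496, and wrap_add_s32(2147482496, 2304) gives back the logged -2147482496.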
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Eric Dumazet @ 2014-11-16 18:11 UTC
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
> As latest findings, when servers are going crazy because of tcp memory
> invalid accounting.
[...]
> static inline void sk_mem_uncharge(struct sock *sk, int size)
> @@ -1480,6 +1485,8 @@
> if (!sk_has_account(sk))
> return;
> sk->sk_forward_alloc += size;
> + if (sk->sk_forward_alloc < -8192)
> + printk_ratelimited(KERN_WARNING"sk %p sk_mem_uncharge
> negative %d by %d\n", sk, sk->sk_forward_alloc, size);
> }
>


Could you describe your hardware setup and networking setup?
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Denys Fedoryshchenko @ 2014-11-16 19:05 UTC
To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-16 20:11, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 10:54 +0200, Denys Fedoryshchenko wrote:
>> As latest findings, when servers are going crazy because of tcp memory
>> invalid accounting.
[...]
>
> Could you describe your hardware setup and networking setup ?

This problem is happening on multiple different units that I am using
as HTTPS balancers, and all of them are very different (except that
they all have Intel CPUs, and even those are of different generations
and models). The problem seems to happen on all of them and does not
seem to depend on hardware (networking: igb, e1000e, Broadcom, all
affected). But if it is important:
S2600GZ motherboard, one E5-2620 Xeon
networking - onboard igb, 2 ports used
100GB RAM
This particular one has bonding (but it seems to crash with or without
it).

The system is custom, running from a USB flash drive, a busybox+glibc
based setup; a similar OS works for other purposes (NAT, PPPoE
termination) without any issues.

What is common between the failing units:

I am using an haproxy-based HTTPS balancer (also, as I remember,
haproxy does a lot of setsockopt stuff), which is handling right now:
454444 connections established
Bandwidth passing through is around 1Gbps.

I'm disabling tso/gso/gro on all interfaces.

The way I am forwarding transparent traffic to haproxy:
iptables -t mangle -A PREROUTING -p tcp --sport 443 -j MARK --set-mark 0x1
iptables -t mangle -A PREROUTING -p tcp --dport 443 -j MARK --set-mark 0x1
ip rule add fwmark 0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

"Typical" setup is

backend ssl_passthru
	mode tcp
	option transparent
	source 0.0.0.0 usesrc clientip

frontend ssl-in
	mode tcp
	bind :443 transparent
	default_backend ssl_passthru
	option tcp-smart-accept

I hope I didn't miss something important. I can provide remote ssh
access to it.
I will keep sending info, in the hope that some of it will give an idea
of what I should patch or test.

P.S. I just got the idea that -2147483648 hints that an integer overflow
is happening somewhere, from a very large positive value to a negative
one. I will try to set triggers for that too now.

If required, I can provide an image of such a system.
I am not sure you are interested in this problem, or whether it can be
reproduced on a synthetic setup, but as I remember this memory leak
also happened to me once on a normal server with torrents (I left some
image unattended for 2 weeks, with a lot of requests, and it crashed at
the end), so it might affect other use cases too.
I am now trying to limit socket buffers, to see if that decreases the
frequency of crashes.
I also tried putting "canary" values inside the structure, near
sk_forward_alloc, to see if any sort of memory corruption is occurring
on sk_forward_alloc, but it seems there is no corruption.
I will also try going back to the stable kernel 3.2.64, to see if that
fixes the problem, but testing sometimes takes almost a day, depending
on luck.
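The canary idea can be sketched in user space as guard words on both sides of the watched counter: a stray write spilling over from a neighbouring field would most likely also clobber one of the guards. Intact guards next to a wrong counter point toward the charge/uncharge logic rather than an overwrite, though they cannot rule out a write that hits only the counter itself. The struct layout, pattern and helper names below are hypothetical, not Denys's actual debug patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CANARY 0xC0FFEE00u /* arbitrary, easy-to-spot pattern */

/* Guard words around the field being watched. */
struct guarded_counter {
	uint32_t canary_lo;
	int forward_alloc;
	uint32_t canary_hi;
};

static void guarded_init(struct guarded_counter *g)
{
	assert(g);
	g->canary_lo = CANARY;
	g->forward_alloc = 0;
	g->canary_hi = CANARY;
}

/* True while both guards still hold the pattern. */
static bool guarded_intact(const struct guarded_counter *g)
{
	return g->canary_lo == CANARY && g->canary_hi == CANARY;
}
```

Legitimate updates such as g.forward_alloc -= 1280 leave guarded_intact() true; only a write that touches the guard words makes it false.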
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Eric Dumazet @ 2014-11-16 20:16 UTC
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 21:05 +0200, Denys Fedoryshchenko wrote:
> On 2014-11-16 20:11, Eric Dumazet wrote:
[...]
> I will try also going back to stable kernels 3.2.64, to see if it will
> fix this problem, but testing takes sometimes almost 1 day, depends on
> luck.

Thanks Denys!

Could you try the following patch?

Thanks!
 net/ipv4/tcp_output.c | 33 +++++++++++----------------------
 1 file changed, 11 insertions(+), 22 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..877eb4aa05a6 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+	int syn_loss = 0, space, err = 0;
 	struct sk_buff *syn_data = NULL, *data;
 	unsigned long last_syn_loss = 0;
 
@@ -3031,25 +3031,17 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
-
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovec(skb_put(syn_data, space), fo->data->msg_iov, space))
+		goto fallback;
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
 
 	/* Queue a data-only packet after the regular SYN for retransmission */
 	data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3093,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
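Eric's first patch replaces the open-coded iovec loop with memcpy_fromiovec(). That helper consumes the iovec it copies from. As Eric notes later in this thread, tcp_sendmsg() separately skips the copied_syn bytes that already went out with the SYN, so a consuming copy combined with that skip would seem to skip the same bytes twice. The userspace sketch below models only the consuming behaviour, under stated assumptions: memcpy() stands in for copy_from_user(), and the name iov_copy_consume() and its iovcnt bound are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical userspace model of memcpy_fromiovec() semantics:
 * copy `len` bytes from an iovec array into `dst` and CONSUME the
 * iovec while doing so (iov_base advances, iov_len shrinks).
 * memcpy() stands in for copy_from_user(). The `iovcnt` bound is an
 * addition of this sketch so it cannot walk past the array.
 * Returns 0 on success, -1 if fewer than `len` bytes were available.
 */
static int iov_copy_consume(unsigned char *dst, struct iovec *iov,
                            size_t iovcnt, size_t len)
{
    size_t i = 0;

    assert(dst != NULL || len == 0);
    while (len > 0) {
        if (i == iovcnt)
            return -1;                      /* not enough user data */
        size_t copy = len < iov[i].iov_len ? len : iov[i].iov_len;

        if (copy > 0) {
            memcpy(dst, iov[i].iov_base, copy);
            dst += copy;
            len -= copy;
            /* Consume: the caller's iovec now starts after the copied bytes. */
            iov[i].iov_base = (unsigned char *)iov[i].iov_base + copy;
            iov[i].iov_len -= copy;
        }
        if (iov[i].iov_len == 0)
            i++;                            /* segment exhausted, move on */
    }
    return 0;
}
```

Calling it twice on the same array continues where the first call stopped: copying 3 bytes from { "hello", " world" } yields "hel", and a following 8-byte copy yields "lo world". That statefulness is what clashes with an explicit offset skip done elsewhere.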
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-16 20:16 ` Eric Dumazet
@ 2014-11-17  5:01   ` Eric Dumazet
  2014-11-17 10:22     ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17  5:01 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:

> Thanks Denys !
>
> Could you try following patch ?
>
> Thanks !

Hmm.... I have an updated patch, sorry.

(A memcpy_fromiovec() has to be memcpy_fromiovecend() )

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..c2bbfcd9c0db 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2998,7 +2998,7 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
+	int syn_loss = 0, space, err = 0;
 	struct sk_buff *syn_data = NULL, *data;
 	unsigned long last_syn_loss = 0;
 
@@ -3031,25 +3031,19 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
-
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovecend(skb_put(syn_data, space),
+				fo->data->msg_iov, 0, space))
+		goto fallback;
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
 
 	/* Queue a data-only packet after the regular SYN for retransmission */
 	data = pskb_copy(syn_data, sk->sk_allocation);
@@ -3101,13 +3095,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
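The updated patch switches to memcpy_fromiovecend(), which reads from an explicit offset and leaves the iovec exactly as it found it, so tcp_sendmsg()'s own skip over the bytes already sent in the SYN still sees the original layout. A userspace sketch of those non-consuming semantics, again assuming memcpy() in place of copy_from_user(); the name iov_copy_at() and its iovcnt bound are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical userspace model of memcpy_fromiovecend() semantics:
 * copy `len` bytes starting `offset` bytes into the data described by
 * an iovec array, WITHOUT modifying the iovec. memcpy() stands in for
 * copy_from_user(); the `iovcnt` bound is an addition of this sketch.
 * Returns 0 on success, -1 if the iovecs end before offset + len.
 */
static int iov_copy_at(unsigned char *dst, const struct iovec *iov,
                       size_t iovcnt, size_t offset, size_t len)
{
    size_t i = 0;

    assert(dst != NULL || len == 0);
    /* Skip whole segments that lie entirely before `offset`. */
    while (i < iovcnt && offset >= iov[i].iov_len) {
        offset -= iov[i].iov_len;
        i++;
    }
    while (len > 0) {
        if (i == iovcnt)
            return -1;                      /* ran out of user data */
        size_t avail = iov[i].iov_len - offset;
        size_t copy = len < avail ? len : avail;

        if (copy > 0) {
            memcpy(dst, (const unsigned char *)iov[i].iov_base + offset, copy);
            dst += copy;
            len -= copy;
        }
        offset = 0;                         /* only the first segment starts mid-way */
        i++;
    }
    return 0;
}
```

The patch calls the real helper with offset 0 and length `space`; a non-zero offset is roughly what the skip loop in tcp_sendmsg() does by hand.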
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-17  5:01 ` Eric Dumazet
@ 2014-11-17 10:22   ` Denys Fedoryshchenko
  2014-11-17 15:33     ` Denys Fedoryshchenko
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 10:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-17 07:01, Eric Dumazet wrote:
> On Sun, 2014-11-16 at 12:16 -0800, Eric Dumazet wrote:
>
>> Thanks Denys !
>>
>> Could you try following patch ?
>>
>> Thanks !
>
> Hmm.... I have an updated patch, sorry.
>
> (A memcpy_fromiovec() has to be memcpy_fromiovecend() )
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index a3d453b94747..c2bbfcd9c0db 100644
> ...

Installed patch, but will have to wait a while (usually at least
24hours), to be sure if it is stable.

Thanks a lot!

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
  2014-11-17 10:22 ` Denys Fedoryshchenko
@ 2014-11-17 15:33   ` Denys Fedoryshchenko
       [not found]     ` <1416240651.5102.10.camel@edumazet-glaptop2.roam.corp.google.com>
  0 siblings, 1 reply; 25+ messages in thread
From: Denys Fedoryshchenko @ 2014-11-17 15:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On 2014-11-17 12:22, Denys Fedoryshchenko wrote:
>> Hmm.... I have an updated patch, sorry.
>> ...
>
> Installed patch, but will have to wait a while (usually at least
> 24hours), to be sure if it is stable.
>
> Thanks a lot!

I tried the updated patch, and it seems to crash the same way after a
while; on a second test I noticed the same value overflow.
While debugging, after I added an alert for sk_forward_alloc > 1147483648,
I noticed that on some sockets this value keeps increasing until it
overflows. I can provide logs if they are of interest.

I will try setting the fastopen sysctl to zero, to see whether that
changes anything.

^ permalink raw reply	[flat|nested] 25+ messages in thread
* [PATCH net-next] tcp: make connect() mem charging friendly
       [not found]         ` <1416266755.18588.12.camel@edumazet-glaptop2.roam.corp.google.com>
@ 2014-11-17 23:49           ` Eric Dumazet
  2014-11-18  7:06             ` [PATCH v2 " Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-17 23:49 UTC (permalink / raw)
  To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp_output.c | 66 +++++++++++++++-------------------------
 1 file changed, 26 insertions(+), 40 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56..377e15d0ba96 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
-	struct sk_buff *syn_data = NULL, *data;
+	int syn_loss = 0, space, err = 0;
 	unsigned long last_syn_loss = 0;
+	struct sk_buff *syn_data;
 
 	tp->rx_opt.mss_clamp = tp->advmss;  /* If MSS is not cached */
 	tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,38 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
+		goto fallback;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (memcpy_fromiovecend(skb_put(syn_data, space),
+				fo->data->msg_iov, 0, space))
 		goto fallback;
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
+	fo->copied = space;
 
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	tcp_connect_queue_skb(sk, syn_data);
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	/* Queue a data-only packet after the regular SYN for retransmission */
-	data = pskb_copy(syn_data, sk->sk_allocation);
-	if (data == NULL)
-		goto fallback;
-	TCP_SKB_CB(data)->seq++;
-	TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
-	TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
-	tcp_connect_queue_skb(sk, data);
-	fo->copied = data->len;
-
-	/* syn_data is about to be sent, we need to take current time stamps
-	 * for the packets that are in write queue : SYN packet and DATA
-	 */
-	skb_mstamp_get(&syn->skb_mstamp);
-	data->skb_mstamp = syn->skb_mstamp;
+	syn->skb_mstamp = syn_data->skb_mstamp;
 
-	if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+	/* Now full SYN+DATA was cloned and sent (or not),
+	 * remove the SYN from the original skb (syn_data)
+	 * we keep in write queue in case of a retransmit, as we
+	 * also have the SYN packet (with no data) in the same queue.
+	 */
+	TCP_SKB_CB(syn_data)->seq++;
+	TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+	if (!err) {
 		tp->syn_data = (fo->copied > 0);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
 		goto done;
 	}
-	syn_data = NULL;
 
 fallback:
 	/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3084,6 @@ fallback:
 	err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
 	if (err)
 		tp->syn_fastopen = 0;
-	kfree_skb(syn_data);
 done:
 	fo->cookie.len = -1;  /* Exclude Fast Open option for SYN retries */
 	return err;
@@ -3114,13 +3103,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
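The commit message says connect() charged the SYN skb without a prior sk_wmem_schedule(), so sk_forward_alloc stayed negative until the handshake progressed. A toy model of that bookkeeping, with loud assumptions: this is not kernel code, one quantum is 4096 bytes (SK_MEM_QUANTUM equals PAGE_SIZE on typical x86 builds), there are no global limits or memory-pressure checks, and the global page counter stands in for the per-protocol counter behind the TCP "mem" value in /proc/net/sockstat. The 4352-byte size is borrowed from Denys's "by 4352" log lines purely as an illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-socket forward allocation (all names hypothetical). */
#define QUANTUM 4096

struct toy_sock {
    int forward_alloc;          /* bytes reserved but not yet charged */
};

static long pages_allocated;    /* global counter, in quanta */

/* Like sk_wmem_schedule(): reserve whole quanta before charging. */
static bool toy_schedule(struct toy_sock *sk, int size)
{
    if (size <= sk->forward_alloc)
        return true;
    int pages = (size + QUANTUM - 1) / QUANTUM;

    pages_allocated += pages;
    sk->forward_alloc += pages * QUANTUM;
    return true;                /* the real code can refuse under pressure */
}

/* Like sk_mem_charge() / sk_mem_uncharge(): pure bookkeeping. */
static void toy_charge(struct toy_sock *sk, int size)   { sk->forward_alloc -= size; }
static void toy_uncharge(struct toy_sock *sk, int size) { sk->forward_alloc += size; }

/* Like sk_mem_reclaim(): return whole quanta to the global pool. */
static void toy_reclaim(struct toy_sock *sk)
{
    assert(sk->forward_alloc >= 0);     /* the invariant the bug violated */
    pages_allocated -= sk->forward_alloc / QUANTUM;
    sk->forward_alloc %= QUANTUM;
}
```

Charging 4352 bytes without scheduling leaves forward_alloc at -4352; scheduling first (which, per the commit message, is what routing the allocation through sk_stream_alloc_skb() provides) reserves two quanta, so the charge leaves 3840 and a later uncharge plus reclaim returns both pages.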
* [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-17 23:49 ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
@ 2014-11-18  7:06   ` Eric Dumazet
  2014-11-19  6:10     ` Yuchung Cheng
  0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2014-11-18  7:06 UTC (permalink / raw)
  To: Denys Fedoryshchenko, David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng

From: Eric Dumazet <edumazet@google.com>

While working on sk_forward_alloc problems reported by Denys
Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
sk_forward_alloc is negative while connect is in progress.

We can fix this by calling regular sk_stream_alloc_skb() both for the
SYN packet (in tcp_connect()) and the syn_data packet in
tcp_send_syn_data()

Then, tcp_send_syn_data() can avoid copying syn_data as we simply
can manipulate syn_data->cb[] to remove SYN flag (and increment seq)

Instead of open coding memcpy_fromiovecend(), simply use this helper.

This leaves in socket write queue clean fast clone skbs.

This was tested against our fastopen packetdrill tests.

Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
    as spotted by Yuchung.

 net/ipv4/tcp_output.c | 68 ++++++++++++++++------------------------
 1 file changed, 28 insertions(+), 40 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index eb73a1dccf56b823a45c0ca034e40dc50fc48068..f5bd4bd3f7e669b3fd48a843d55e7313a30a3409 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3011,9 +3011,9 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct tcp_fastopen_request *fo = tp->fastopen_req;
-	int syn_loss = 0, space, i, err = 0, iovlen = fo->data->msg_iovlen;
-	struct sk_buff *syn_data = NULL, *data;
+	int syn_loss = 0, space, err = 0;
 	unsigned long last_syn_loss = 0;
+	struct sk_buff *syn_data;
 
 	tp->rx_opt.mss_clamp = tp->advmss;  /* If MSS is not cached */
 	tcp_fastopen_cache_get(sk, &tp->rx_opt.mss_clamp, &fo->cookie,
@@ -3044,48 +3044,40 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	/* limit to order-0 allocations */
 	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
 
-	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
-				   sk->sk_allocation);
-	if (syn_data == NULL)
+	syn_data = sk_stream_alloc_skb(sk, space, sk->sk_allocation);
+	if (!syn_data)
 		goto fallback;
+	syn_data->ip_summed = CHECKSUM_PARTIAL;
+	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
+	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
+					 fo->data->msg_iov, 0, space))) {
+		kfree_skb(syn_data);
+		goto fallback;
+	}
 
-	for (i = 0; i < iovlen && syn_data->len < space; ++i) {
-		struct iovec *iov = &fo->data->msg_iov[i];
-		unsigned char __user *from = iov->iov_base;
-		int len = iov->iov_len;
+	/* No more data pending in inet_wait_for_connect() */
+	if (space == fo->size)
+		fo->data = NULL;
+	fo->copied = space;
 
-		if (syn_data->len + len > space)
-			len = space - syn_data->len;
-		else if (i + 1 == iovlen)
-			/* No more data pending in inet_wait_for_connect() */
-			fo->data = NULL;
+	tcp_connect_queue_skb(sk, syn_data);
 
-		if (skb_add_data(syn_data, from, len))
-			goto fallback;
-	}
+	err = tcp_transmit_skb(sk, syn_data, 1, sk->sk_allocation);
 
-	/* Queue a data-only packet after the regular SYN for retransmission */
-	data = pskb_copy(syn_data, sk->sk_allocation);
-	if (data == NULL)
-		goto fallback;
-	TCP_SKB_CB(data)->seq++;
-	TCP_SKB_CB(data)->tcp_flags &= ~TCPHDR_SYN;
-	TCP_SKB_CB(data)->tcp_flags = (TCPHDR_ACK|TCPHDR_PSH);
-	tcp_connect_queue_skb(sk, data);
-	fo->copied = data->len;
-
-	/* syn_data is about to be sent, we need to take current time stamps
-	 * for the packets that are in write queue : SYN packet and DATA
-	 */
-	skb_mstamp_get(&syn->skb_mstamp);
-	data->skb_mstamp = syn->skb_mstamp;
+	syn->skb_mstamp = syn_data->skb_mstamp;
 
-	if (tcp_transmit_skb(sk, syn_data, 0, sk->sk_allocation) == 0) {
+	/* Now full SYN+DATA was cloned and sent (or not),
+	 * remove the SYN from the original skb (syn_data)
+	 * we keep in write queue in case of a retransmit, as we
+	 * also have the SYN packet (with no data) in the same queue.
+	 */
+	TCP_SKB_CB(syn_data)->seq++;
+	TCP_SKB_CB(syn_data)->tcp_flags = TCPHDR_ACK | TCPHDR_PSH;
+	if (!err) {
 		tp->syn_data = (fo->copied > 0);
 		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPORIGDATASENT);
 		goto done;
 	}
-	syn_data = NULL;
 
 fallback:
 	/* Send a regular SYN with Fast Open cookie request option */
@@ -3094,7 +3086,6 @@ fallback:
 	err = tcp_transmit_skb(sk, syn, 1, sk->sk_allocation);
 	if (err)
 		tp->syn_fastopen = 0;
-	kfree_skb(syn_data);
 done:
 	fo->cookie.len = -1;  /* Exclude Fast Open option for SYN retries */
 	return err;
@@ -3114,13 +3105,10 @@ int tcp_connect(struct sock *sk)
 		return 0;
 	}
 
-	buff = alloc_skb_fclone(MAX_TCP_HEADER + 15, sk->sk_allocation);
-	if (unlikely(buff == NULL))
+	buff = sk_stream_alloc_skb(sk, 0, sk->sk_allocation);
+	if (unlikely(!buff))
 		return -ENOBUFS;
 
-	/* Reserve space for headers. */
-	skb_reserve(buff, MAX_TCP_HEADER);
-
 	tcp_init_nondata_skb(buff, tp->write_seq++, TCPHDR_SYN);
 	tp->retrans_stamp = tcp_time_stamp;
 	tcp_connect_queue_skb(sk, buff);

^ permalink raw reply related	[flat|nested] 25+ messages in thread
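The only change in v2 is an ownership fix: once syn_data has been allocated, every exit path must either hand it to the write queue or free it, and v1's copy-failure path did neither. A toy model of that rule, assuming a heap buffer in place of an skb and a one-slot "queue"; every name here (toy_alloc(), toy_send_syn_data(), live_bufs, and so on) is hypothetical and only mirrors the shape of the patched control flow, including its fallback to a plain SYN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy ownership model (hypothetical names, not kernel API). */
struct toy_buf {
    unsigned char data[64];
    size_t len;
};

static int live_bufs;               /* leak detector: allocated minus freed */
static struct toy_buf *queued;      /* one-slot stand-in for the write queue */

static struct toy_buf *toy_alloc(void)
{
    struct toy_buf *b = calloc(1, sizeof(*b));

    if (b)
        live_bufs++;
    return b;
}

static void toy_free(struct toy_buf *b)
{
    if (!b)
        return;
    assert(live_bufs > 0);
    live_bufs--;
    free(b);
}

/* Returns true if SYN+data was queued, false if we fell back to a bare SYN. */
static bool toy_send_syn_data(const unsigned char *user, size_t len,
                              bool copy_fails)
{
    struct toy_buf *syn_data = toy_alloc();

    if (!syn_data || len > sizeof(syn_data->data))
        goto fallback;
    if (copy_fails)                 /* models memcpy_fromiovecend() failing */
        goto fallback;
    memcpy(syn_data->data, user, len);
    syn_data->len = len;
    queued = syn_data;              /* ownership moves to the queue */
    return true;

fallback:
    toy_free(syn_data);             /* the counterpart of v2's kfree_skb() */
    return false;
}
```

Dropping the toy_free() call on the fallback path reproduces the v1 behaviour: each failed copy leaves live_bufs one higher.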
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-18  7:06 ` [PATCH v2 " Eric Dumazet
@ 2014-11-19  6:10   ` Yuchung Cheng
  2014-11-19 19:57     ` David Miller
  0 siblings, 1 reply; 25+ messages in thread
From: Yuchung Cheng @ 2014-11-19  6:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Denys Fedoryshchenko, David Miller, netdev, Neal Cardwell

On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> While working on sk_forward_alloc problems reported by Denys
> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
> sk_forward_alloc is negative while connect is in progress.
>
> We can fix this by calling regular sk_stream_alloc_skb() both for the
> SYN packet (in tcp_connect()) and the syn_data packet in
> tcp_send_syn_data()
>
> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>
> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>
> This leaves in socket write queue clean fast clone skbs.
>
> This was tested against our fastopen packetdrill tests.
>
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>

Thanks! this simplifies the code a lot.

> ---
> v2: added a kfree_skb(syn_data) if memcpy_fromiovecend() fails,
>     as spotted by Yuchung.
>
>  net/ipv4/tcp_output.c | 68 ++++++++++++++++------------------------
>  1 file changed, 28 insertions(+), 40 deletions(-)
> ...

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
  2014-11-19  6:10 ` Yuchung Cheng
@ 2014-11-19 19:57   ` David Miller
  2014-11-20 15:33     ` Eric Dumazet
  0 siblings, 1 reply; 25+ messages in thread
From: David Miller @ 2014-11-19 19:57 UTC (permalink / raw)
  To: ycheng; +Cc: eric.dumazet, nuclearcat, netdev, ncardwell

From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 19 Nov 2014 14:10:42 +0800

> On Tue, Nov 18, 2014 at 3:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> From: Eric Dumazet <edumazet@google.com>
>>
>> While working on sk_forward_alloc problems reported by Denys
>> Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
>> sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
>> sk_forward_alloc is negative while connect is in progress.
>>
>> We can fix this by calling regular sk_stream_alloc_skb() both for the
>> SYN packet (in tcp_connect()) and the syn_data packet in
>> tcp_send_syn_data()
>>
>> Then, tcp_send_syn_data() can avoid copying syn_data as we simply
>> can manipulate syn_data->cb[] to remove SYN flag (and increment seq)
>>
>> Instead of open coding memcpy_fromiovecend(), simply use this helper.
>>
>> This leaves in socket write queue clean fast clone skbs.
>>
>> This was tested against our fastopen packetdrill tests.
>>
>> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Acked-by: Yuchung Cheng <ycheng@google.com>
>
> Thanks! this simplifies the code a lot.

Agreed, applied, thanks everyone!

^ permalink raw reply	[flat|nested] 25+ messages in thread
* Re: [PATCH v2 net-next] tcp: make connect() mem charging friendly
From: Eric Dumazet @ 2014-11-20 15:33 UTC
To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell

On Wed, 2014-11-19 at 14:57 -0500, David Miller wrote:
> From: Yuchung Cheng <ycheng@google.com>
> >
> > Thanks! this simplifies the code a lot.
>
> Agreed, applied, thanks everyone!

BTW, unless I am mistaken, it seems we can probably replace
memcpy_fromiovecend() by memcpy_fromiovec() and delete from
tcp_sendmsg() the annoying code skipping over the already consumed
bytes.

Something like this untested patch :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int iovlen, flags, err, copied = 0;
-	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+	int mss_now = 0, size_goal, copied_syn = 0;
 	bool sg;
 	long timeo;
 
 	lock_sock(sk);
 
 	flags = msg->msg_flags;
-	if (flags & MSG_FASTOPEN) {
+	if (unlikely(flags & MSG_FASTOPEN)) {
 		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
 			goto out_err;
-		offset = copied_syn;
 	}
 
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		unsigned char __user *from = iov->iov_base;
 
 		iov++;
-		if (unlikely(offset > 0)) {  /* Skip bytes copied in SYN */
-			if (offset >= seglen) {
-				offset -= seglen;
-				continue;
-			}
-			seglen -= offset;
-			from += offset;
-			offset = 0;
-		}
 
 		while (seglen > 0) {
 			int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 		goto fallback;
 	syn_data->ip_summed = CHECKSUM_PARTIAL;
 	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
-	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
-					 fo->data->msg_iov, 0, space))) {
+	if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+				      fo->data->msg_iov, space))) {
 		kfree_skb(syn_data);
 		goto fallback;
 	}

* [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
From: Eric Dumazet @ 2014-11-20 16:23 UTC
To: David Miller; +Cc: ycheng, nuclearcat, netdev, ncardwell

From: Eric Dumazet <edumazet@google.com>

If we consume iovec bytes in tcp_send_syn_data(), we can remove
annoying fastopen code in tcp_sendmsg() skipping over the already
consumed bytes.

Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
ask for FASTOPEN.

Tested:
Ran our 125 packetdrill fastopen tests

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c        | 14 ++------------
 net/ipv4/tcp_output.c |  4 ++--
 2 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c239f4740d10b10b67ef4fa44c831851fb9e1dcf..227540eef9d0870721258f9ddbace27b417c619e 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1089,20 +1089,19 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int iovlen, flags, err, copied = 0;
-	int mss_now = 0, size_goal, copied_syn = 0, offset = 0;
+	int mss_now = 0, size_goal, copied_syn = 0;
 	bool sg;
 	long timeo;
 
 	lock_sock(sk);
 
 	flags = msg->msg_flags;
-	if (flags & MSG_FASTOPEN) {
+	if (unlikely(flags & MSG_FASTOPEN)) {
 		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
 			goto out_err;
-		offset = copied_syn;
 	}
 
 	timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
@@ -1151,15 +1150,6 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		unsigned char __user *from = iov->iov_base;
 
 		iov++;
-		if (unlikely(offset > 0)) {  /* Skip bytes copied in SYN */
-			if (offset >= seglen) {
-				offset -= seglen;
-				continue;
-			}
-			seglen -= offset;
-			from += offset;
-			offset = 0;
-		}
 
 		while (seglen > 0) {
 			int copy = 0;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..524e5b657e881a348f11def3f48f29a76f54fbab 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3049,8 +3049,8 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 		goto fallback;
 	syn_data->ip_summed = CHECKSUM_PARTIAL;
 	memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
-	if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
-					 fo->data->msg_iov, 0, space))) {
+	if (unlikely(memcpy_fromiovec(skb_put(syn_data, space),
+				      fo->data->msg_iov, space))) {
 		kfree_skb(syn_data);
 		goto fallback;
 	}

* Re: [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code
From: Eric Dumazet @ 2014-11-20 23:05 UTC
To: David Miller, Al Viro; +Cc: ycheng, nuclearcat, netdev, ncardwell

On Thu, 2014-11-20 at 08:23 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> If we consume iovec bytes in tcp_send_syn_data(), we can remove
> annoying fastopen code in tcp_sendmsg() skipping over the already
> consumed bytes.
>
> Also add an unlikely(flags & MSG_FASTOPEN), as most TCP sendmsg() do not
> ask for FASTOPEN.
>
> Tested:
> Ran our 125 packetdrill fastopen tests
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Please disregard this patch, Al Viro said sendmsg() was apparently not
allowed to change iovec. Some callers depend on it staying constant.

Not clear why memcpy_fromiovec() even exists.

* Re: /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug)
From: Eric Dumazet @ 2014-11-16 19:10 UTC
To: Denys Fedoryshchenko; +Cc: Neal Cardwell, Yuchung Cheng, netdev

On Sun, 2014-11-16 at 10:11 -0800, Eric Dumazet wrote:

> Could you describe your hardware setup and networking setup ?

Hmmm, this looks like a bug in TCP fast open.

I am cooking a patch.

Thread overview: 25+ messages
2014-07-17 10:52 /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? Denys Fedoryshchenko
2014-07-17 11:51 ` Eric Dumazet
2014-07-17 12:13 ` Denys Fedoryshchenko
2014-07-17 13:12 ` Denys Fedoryshchenko
2014-07-17 13:42 ` Eric Dumazet
2014-07-18 8:12 ` Denys Fedoryshchenko
2014-07-18 8:41 ` Eric Dumazet
2014-09-23 20:14 ` Semen Slepov
2014-11-12 17:07 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Denys Fedoryshchenko
2014-11-12 17:27 ` Eric Dumazet
2014-11-16 8:54 ` Denys Fedoryshchenko
2014-11-16 18:11 ` Eric Dumazet
2014-11-16 19:05 ` Denys Fedoryshchenko
2014-11-16 20:16 ` Eric Dumazet
2014-11-17 5:01 ` Eric Dumazet
2014-11-17 10:22 ` Denys Fedoryshchenko
2014-11-17 15:33 ` Denys Fedoryshchenko
2014-11-17 23:49 ` [PATCH net-next] tcp: make connect() mem charging friendly Eric Dumazet
2014-11-18 7:06 ` [PATCH v2 " Eric Dumazet
2014-11-19 6:10 ` Yuchung Cheng
2014-11-19 19:57 ` David Miller
2014-11-20 15:33 ` Eric Dumazet
2014-11-20 16:23 ` [PATCH net-next] tcp: remove from tcp_sendmsg() some fastopen code Eric Dumazet
2014-11-20 23:05 ` Eric Dumazet
2014-11-16 19:10 ` /proc/net/sockstat invalid memory accounting or memory leak in latest kernels? (trying to debug) Eric Dumazet