* Re: TCP connection stalls under 2.6.24.7
From: Thomas Jarosch @ 2008-07-07 13:18 UTC
To: netdev
Cc: Jozsef Kadlecsik, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List

[-- Attachment #1: Type: text/plain, Size: 22030 bytes --]

Hello all,

On Monday, 7. July 2008 11:18:32 you wrote:
> I'll upgrade to 2.6.25.10 and see if it helps,
> there is a TCP connection timeout fix in there:
> http://kerneltrap.org/mailarchive/linux-kernel/2008/6/14/2122714

After upgrading to 2.6.25.10, the TCP connection still stalls.

I temporarily disabled PMTU discovery, TCP window scaling, TCP SACK
and manually forced the MTU to 1400 with no noticeable effect.

I also added an "iptables -I INPUT -s IP.OF.MAIL.RELAY -j ACCEPT"
to make sure it's not related to conntrack on the double.

So here are the current results:
- 2.6.23.16: Working
- 2.6.24: Stalling connection
- 2.6.24.7: Stalling connection
- 2.6.25.10: Stalling connection

Attached is a tcpdump of a stalling connection with the
sensitive information replaced by "xxxxx", so please ignore the broken
checksums at the beginning. The dump was created using 2.6.24.7.

Jozsef Kadlecsik suggested this is not related to netfilter,
so I'm now asking for help on netdev.

Here's the text output from tcpdump:
-----------------------------------------------------------
13:40:14.140625 IP linux.53132 > mailserver.smtp: S 943411848:943411848(0) win 5808 <mss 1452,sackOK,timestamp 5386646 0,nop,wscale 2>
13:40:14.206523 IP mailserver.smtp > linux.53132: S 4213328541:4213328541(0) ack 943411849 win 65535 <mss 1400>
13:40:14.206548 IP linux.53132 > mailserver.smtp: . ack 1 win 5808
13:40:14.271316 IP mailserver.smtp > linux.53132: P 1:84(83) ack 1 win 65535
13:40:14.271336 IP linux.53132 > mailserver.smtp: .
ack 84 win 5808 13:40:14.271395 IP linux.53132 > mailserver.smtp: P 1:26(25) ack 84 win 5808 13:40:14.341555 IP mailserver.smtp > linux.53132: P 84:257(173) ack 26 win 65535 13:40:14.341737 IP linux.53132 > mailserver.smtp: P 26:38(12) ack 257 win 6432 13:40:14.405342 IP mailserver.smtp > linux.53132: P 257:275(18) ack 38 win 65535 13:40:14.405419 IP linux.53132 > mailserver.smtp: P 38:68(30) ack 275 win 6432 13:40:14.471391 IP mailserver.smtp > linux.53132: P 275:293(18) ack 68 win 65535 13:40:14.471485 IP linux.53132 > mailserver.smtp: P 68:82(14) ack 293 win 6432 13:40:14.535423 IP mailserver.smtp > linux.53132: . ack 82 win 65535 13:40:14.539343 IP mailserver.smtp > linux.53132: P 293:324(31) ack 82 win 65535 13:40:14.539405 IP linux.53132 > mailserver.smtp: P 82:224(142) ack 324 win 6432 13:40:14.619489 IP mailserver.smtp > linux.53132: P 324:553(229) ack 224 win 65535 13:40:14.619633 IP linux.53132 > mailserver.smtp: . 224:1624(1400) ack 553 win 7504 13:40:14.619671 IP linux.53132 > mailserver.smtp: . 1624:3024(1400) ack 553 win 7504 13:40:14.746337 IP mailserver.smtp > linux.53132: . ack 1624 win 65535 13:40:14.746378 IP linux.53132 > mailserver.smtp: . 3024:4424(1400) ack 553 win 7504 13:40:14.746414 IP linux.53132 > mailserver.smtp: . 4424:5824(1400) ack 553 win 7504 13:40:14.863352 IP mailserver.smtp > linux.53132: . ack 4424 win 65535 13:40:14.863381 IP linux.53132 > mailserver.smtp: . 5824:7224(1400) ack 553 win 7504 13:40:14.863412 IP linux.53132 > mailserver.smtp: . 7224:8624(1400) ack 553 win 7504 13:40:14.888119 IP linux.53132 > mailserver.smtp: . 8624:10024(1400) ack 553 win 7504 13:40:14.955509 IP mailserver.smtp > linux.53132: . ack 5824 win 65535 13:40:14.955539 IP linux.53132 > mailserver.smtp: . 10024:11424(1400) ack 553 win 7504 13:40:14.955569 IP linux.53132 > mailserver.smtp: P 11424:12512(1088) ack 553 win 7504 13:40:15.048337 IP mailserver.smtp > linux.53132: . ack 8624 win 65535 13:40:15.048365 IP linux.53132 > mailserver.smtp: . 
12512:13912(1400) ack 553 win 7504 13:40:15.048397 IP linux.53132 > mailserver.smtp: . 13912:15312(1400) ack 553 win 7504 13:40:15.073100 IP linux.53132 > mailserver.smtp: . 15312:16712(1400) ack 553 win 7504 13:40:15.165394 IP mailserver.smtp > linux.53132: . ack 10024 win 65535 13:40:15.165422 IP linux.53132 > mailserver.smtp: . 16712:18112(1400) ack 553 win 7504 13:40:15.165452 IP linux.53132 > mailserver.smtp: . 18112:19512(1400) ack 553 win 7504 13:40:15.271312 IP mailserver.smtp > linux.53132: . ack 13912 win 65535 13:40:15.271343 IP linux.53132 > mailserver.smtp: P 19512:20704(1192) ack 553 win 7504 13:40:15.271386 IP linux.53132 > mailserver.smtp: . 20704:22104(1400) ack 553 win 7504 13:40:15.296088 IP linux.53132 > mailserver.smtp: . 22104:23504(1400) ack 553 win 7504 13:40:15.320793 IP linux.53132 > mailserver.smtp: . 23504:24904(1400) ack 553 win 7504 13:40:15.375251 IP mailserver.smtp > linux.53132: . ack 15312 win 65535 13:40:15.375273 IP linux.53132 > mailserver.smtp: . 24904:26304(1400) ack 553 win 7504 13:40:15.375303 IP linux.53132 > mailserver.smtp: . 26304:27704(1400) ack 553 win 7504 13:40:15.447472 IP mailserver.smtp > linux.53132: . ack 18112 win 65535 13:40:15.447524 IP linux.53132 > mailserver.smtp: . 27704:29104(1400) ack 553 win 7504 13:40:15.447559 IP linux.53132 > mailserver.smtp: . 29104:30504(1400) ack 553 win 7504 13:40:15.472265 IP linux.53132 > mailserver.smtp: . 30504:31904(1400) ack 553 win 7504 13:40:15.585446 IP mailserver.smtp > linux.53132: . ack 20704 win 65535 13:40:15.585487 IP linux.53132 > mailserver.smtp: P 31904:32992(1088) ack 553 win 7504 13:40:15.585614 IP linux.53132 > mailserver.smtp: . 32992:34392(1400) ack 553 win 7504 13:40:15.610316 IP linux.53132 > mailserver.smtp: . 34392:35792(1400) ack 553 win 7504 13:40:15.677292 IP mailserver.smtp > linux.53132: . ack 23504 win 65535 13:40:15.677313 IP linux.53132 > mailserver.smtp: . 35792:37192(1400) ack 553 win 7504 13:40:15.677342 IP linux.53132 > mailserver.smtp: . 
37192:38592(1400) ack 553 win 7504 13:40:15.702048 IP linux.53132 > mailserver.smtp: . 38592:39992(1400) ack 553 win 7504 13:40:15.796288 IP mailserver.smtp > linux.53132: . ack 24904 win 65535 13:40:15.796314 IP linux.53132 > mailserver.smtp: . 39992:41392(1400) ack 553 win 7504 13:40:15.796350 IP linux.53132 > mailserver.smtp: . 41392:42792(1400) ack 553 win 7504 13:40:15.856442 IP mailserver.smtp > linux.53132: . ack 27704 win 65535 13:40:15.856470 IP linux.53132 > mailserver.smtp: . 42792:44192(1400) ack 553 win 7504 13:40:15.856515 IP linux.53132 > mailserver.smtp: . 44192:45592(1400) ack 553 win 7504 13:40:15.881218 IP linux.53132 > mailserver.smtp: . 45592:46992(1400) ack 553 win 7504 13:40:15.977365 IP mailserver.smtp > linux.53132: . ack 30504 win 65535 13:40:15.977389 IP linux.53132 > mailserver.smtp: . 46992:48392(1400) ack 553 win 7504 13:40:16.001505 IP linux.53132 > mailserver.smtp: . 48392:49792(1400) ack 553 win 7504 13:40:16.001534 IP linux.53132 > mailserver.smtp: . 49792:51192(1400) ack 553 win 7504 13:40:16.141214 IP mailserver.smtp > linux.53132: . ack 34392 win 65535 13:40:16.141249 IP linux.53132 > mailserver.smtp: . 51192:52592(1400) ack 553 win 7504 13:40:16.141280 IP linux.53132 > mailserver.smtp: . 52592:53992(1400) ack 553 win 7504 13:40:16.165987 IP linux.53132 > mailserver.smtp: . 53992:55392(1400) ack 553 win 7504 13:40:16.190691 IP linux.53132 > mailserver.smtp: . 55392:56792(1400) ack 553 win 7504 13:40:16.215342 IP mailserver.smtp > linux.53132: . ack 35792 win 65535 13:40:16.215393 IP linux.53132 > mailserver.smtp: . 56792:58192(1400) ack 553 win 7504 13:40:16.240096 IP linux.53132 > mailserver.smtp: . 58192:59592(1400) ack 553 win 7504 13:40:16.329180 IP mailserver.smtp > linux.53132: . ack 38592 win 65535 13:40:16.329220 IP linux.53132 > mailserver.smtp: . 59592:60992(1400) ack 553 win 7504 13:40:16.329255 IP linux.53132 > mailserver.smtp: P 60992:61664(672) ack 553 win 7504 13:40:16.341471 IP linux.53132 > mailserver.smtp: . 
61664:63064(1400) ack 553 win 7504 13:40:16.425284 IP mailserver.smtp > linux.53132: . ack 39992 win 65535 13:40:16.425322 IP linux.53132 > mailserver.smtp: . 63064:64464(1400) ack 553 win 7504 13:40:16.425357 IP linux.53132 > mailserver.smtp: . 64464:65864(1400) ack 553 win 7504 13:40:16.505348 IP mailserver.smtp > linux.53132: . ack 42792 win 65535 13:40:16.505387 IP linux.53132 > mailserver.smtp: . 65864:67264(1400) ack 553 win 7504 13:40:16.505420 IP linux.53132 > mailserver.smtp: . 67264:68664(1400) ack 553 win 7504 13:40:16.530126 IP linux.53132 > mailserver.smtp: . 68664:70064(1400) ack 553 win 7504 13:40:16.622359 IP mailserver.smtp > linux.53132: . ack 45592 win 65535 13:40:16.622387 IP linux.53132 > mailserver.smtp: . 70064:71464(1400) ack 553 win 7504 13:40:16.622417 IP linux.53132 > mailserver.smtp: . 71464:72864(1400) ack 553 win 7504 13:40:16.647124 IP linux.53132 > mailserver.smtp: . 72864:74264(1400) ack 553 win 7504 13:40:16.751201 IP mailserver.smtp > linux.53132: . ack 48392 win 65535 13:40:16.751228 IP linux.53132 > mailserver.smtp: . 74264:75664(1400) ack 553 win 7504 13:40:16.751259 IP linux.53132 > mailserver.smtp: . 75664:77064(1400) ack 553 win 7504 13:40:16.775965 IP linux.53132 > mailserver.smtp: . 77064:78464(1400) ack 553 win 7504 13:40:16.840381 IP mailserver.smtp > linux.53132: . ack 49792 win 65535 13:40:16.840419 IP linux.53132 > mailserver.smtp: . 78464:79864(1400) ack 553 win 7504 13:40:16.840450 IP linux.53132 > mailserver.smtp: . 79864:81264(1400) ack 553 win 7504 13:40:16.927375 IP mailserver.smtp > linux.53132: . ack 52592 win 65535 13:40:16.927401 IP linux.53132 > mailserver.smtp: . 81264:82664(1400) ack 553 win 7504 13:40:16.927433 IP linux.53132 > mailserver.smtp: . 82664:84064(1400) ack 553 win 7504 13:40:16.952139 IP linux.53132 > mailserver.smtp: . 84064:85464(1400) ack 553 win 7504 13:40:17.045338 IP mailserver.smtp > linux.53132: . ack 55392 win 65535 13:40:17.045374 IP linux.53132 > mailserver.smtp: . 
85464:86864(1400) ack 553 win 7504 13:40:17.045406 IP linux.53132 > mailserver.smtp: . 86864:88264(1400) ack 553 win 7504 13:40:17.070113 IP linux.53132 > mailserver.smtp: . 88264:89664(1400) ack 553 win 7504 13:40:17.162120 IP mailserver.smtp > linux.53132: . ack 58192 win 65535 13:40:17.162148 IP linux.53132 > mailserver.smtp: . 89664:91064(1400) ack 553 win 7504 13:40:17.162179 IP linux.53132 > mailserver.smtp: . 91064:92464(1400) ack 553 win 7504 13:40:17.186886 IP linux.53132 > mailserver.smtp: . 92464:93864(1400) ack 553 win 7504 13:40:17.255239 IP mailserver.smtp > linux.53132: . ack 59592 win 65535 13:40:17.255268 IP linux.53132 > mailserver.smtp: . 93864:95264(1400) ack 553 win 7504 13:40:17.255298 IP linux.53132 > mailserver.smtp: . 95264:96664(1400) ack 553 win 7504 13:40:17.368334 IP mailserver.smtp > linux.53132: . ack 63064 win 65535 13:40:17.368390 IP linux.53132 > mailserver.smtp: . 96664:98064(1400) ack 553 win 7504 13:40:17.368423 IP linux.53132 > mailserver.smtp: P 98064:98528(464) ack 553 win 7504 13:40:17.377076 IP linux.53132 > mailserver.smtp: . 98528:99928(1400) ack 553 win 7504 13:40:17.401781 IP linux.53132 > mailserver.smtp: . 99928:101328(1400) ack 553 win 7504 13:40:17.465163 IP mailserver.smtp > linux.53132: . ack 64464 win 65535 13:40:17.465230 IP linux.53132 > mailserver.smtp: . 101328:102728(1400) ack 553 win 7504 13:40:17.465265 IP linux.53132 > mailserver.smtp: . 102728:104128(1400) ack 553 win 7504 13:40:17.544242 IP mailserver.smtp > linux.53132: . ack 67264 win 65535 13:40:17.544272 IP linux.53132 > mailserver.smtp: . 104128:105528(1400) ack 553 win 7504 13:40:17.544303 IP linux.53132 > mailserver.smtp: . 105528:106928(1400) ack 553 win 7504 13:40:17.569011 IP linux.53132 > mailserver.smtp: . 106928:108328(1400) ack 553 win 7504 13:40:17.661252 IP mailserver.smtp > linux.53132: . ack 70064 win 65535 13:40:17.661289 IP linux.53132 > mailserver.smtp: . 
108328:109728(1400) ack 553 win 7504 13:40:17.661320 IP linux.53132 > mailserver.smtp: . 109728:111128(1400) ack 553 win 7504 13:40:17.686027 IP linux.53132 > mailserver.smtp: . 111128:112528(1400) ack 553 win 7504 13:40:17.792315 IP mailserver.smtp > linux.53132: . ack 72864 win 65535 13:40:17.792346 IP linux.53132 > mailserver.smtp: . 112528:113928(1400) ack 553 win 7504 13:40:17.792377 IP linux.53132 > mailserver.smtp: . 113928:115328(1400) ack 553 win 7504 13:40:17.817082 IP linux.53132 > mailserver.smtp: . 115328:116728(1400) ack 553 win 7504 13:40:17.875197 IP mailserver.smtp > linux.53132: . ack 74264 win 65535 13:40:17.875215 IP linux.53132 > mailserver.smtp: . 116728:118128(1400) ack 553 win 7504 13:40:17.899923 IP linux.53132 > mailserver.smtp: . 118128:119528(1400) ack 553 win 7504 13:40:17.980287 IP mailserver.smtp > linux.53132: . ack 77064 win 65535 13:40:17.980334 IP linux.53132 > mailserver.smtp: . 119528:120928(1400) ack 553 win 7504 13:40:17.980365 IP linux.53132 > mailserver.smtp: . 120928:122328(1400) ack 553 win 7504 13:40:18.005072 IP linux.53132 > mailserver.smtp: . 122328:123728(1400) ack 553 win 7504 13:40:18.085234 IP mailserver.smtp > linux.53132: . ack 78464 win 65535 13:40:18.085265 IP linux.53132 > mailserver.smtp: . 123728:125128(1400) ack 553 win 7504 13:40:18.085295 IP linux.53132 > mailserver.smtp: . 125128:126528(1400) ack 553 win 7504 13:40:18.156177 IP mailserver.smtp > linux.53132: . ack 81264 win 65535 13:40:18.156206 IP linux.53132 > mailserver.smtp: . 126528:127928(1400) ack 553 win 7504 13:40:18.156237 IP linux.53132 > mailserver.smtp: . 127928:129328(1400) ack 553 win 7504 13:40:18.180942 IP linux.53132 > mailserver.smtp: . 129328:130728(1400) ack 553 win 7504 13:40:18.274172 IP mailserver.smtp > linux.53132: . ack 84064 win 65535 13:40:18.274216 IP linux.53132 > mailserver.smtp: . 130728:132128(1400) ack 553 win 7504 13:40:18.274248 IP linux.53132 > mailserver.smtp: . 
132128:133528(1400) ack 553 win 7504 13:40:18.298950 IP linux.53132 > mailserver.smtp: . 133528:134928(1400) ack 553 win 7504 13:40:18.390240 IP mailserver.smtp > linux.53132: . ack 86864 win 65535 13:40:18.390279 IP linux.53132 > mailserver.smtp: . 134928:136328(1400) ack 553 win 7504 13:40:18.390310 IP linux.53132 > mailserver.smtp: . 136328:137728(1400) ack 553 win 7504 13:40:18.415017 IP linux.53132 > mailserver.smtp: . 137728:139128(1400) ack 553 win 7504 13:40:18.495173 IP mailserver.smtp > linux.53132: . ack 88264 win 65535 13:40:18.495211 IP linux.53132 > mailserver.smtp: . 139128:140528(1400) ack 553 win 7504 13:40:18.495241 IP linux.53132 > mailserver.smtp: . 140528:141928(1400) ack 553 win 7504 13:40:18.684146 IP mailserver.smtp > linux.53132: . ack 93864 win 65535 13:40:18.684178 IP linux.53132 > mailserver.smtp: . 141928:143328(1400) ack 553 win 7504 13:40:18.684209 IP linux.53132 > mailserver.smtp: . 143328:144728(1400) ack 553 win 7504 13:40:18.708919 IP linux.53132 > mailserver.smtp: . 144728:146128(1400) ack 553 win 7504 13:40:18.733633 IP linux.53132 > mailserver.smtp: . 146128:147528(1400) ack 553 win 7504 13:40:18.758340 IP linux.53132 > mailserver.smtp: . 147528:148928(1400) ack 553 win 7504 13:40:18.801152 IP mailserver.smtp > linux.53132: . ack 96664 win 65535 13:40:18.801206 IP linux.53132 > mailserver.smtp: . 148928:150328(1400) ack 553 win 7504 13:40:18.801236 IP linux.53132 > mailserver.smtp: . 150328:151728(1400) ack 553 win 7504 13:40:18.825942 IP linux.53132 > mailserver.smtp: . 151728:153128(1400) ack 553 win 7504 13:40:18.916231 IP mailserver.smtp > linux.53132: . ack 98528 win 65535 13:40:18.916287 IP linux.53132 > mailserver.smtp: . 153128:154528(1400) ack 553 win 7504 13:40:18.916320 IP linux.53132 > mailserver.smtp: . 154528:155928(1400) ack 553 win 7504 13:40:18.941026 IP linux.53132 > mailserver.smtp: . 155928:157328(1400) ack 553 win 7504 13:40:19.000201 IP mailserver.smtp > linux.53132: . 
ack 101328 win 65535 13:40:19.000240 IP linux.53132 > mailserver.smtp: . 157328:158728(1400) ack 553 win 7504 13:40:19.000271 IP linux.53132 > mailserver.smtp: . 158728:160128(1400) ack 553 win 7504 13:40:19.024978 IP linux.53132 > mailserver.smtp: . 160128:161528(1400) ack 553 win 7504 13:40:19.118224 IP mailserver.smtp > linux.53132: . ack 104128 win 65535 13:40:19.118256 IP linux.53132 > mailserver.smtp: . 161528:162928(1400) ack 553 win 7504 13:40:19.118286 IP linux.53132 > mailserver.smtp: . 162928:164328(1400) ack 553 win 7504 13:40:19.142994 IP linux.53132 > mailserver.smtp: . 164328:165728(1400) ack 553 win 7504 13:40:19.235024 IP mailserver.smtp > linux.53132: . ack 106928 win 65535 13:40:19.235100 IP linux.53132 > mailserver.smtp: . 165728:167128(1400) ack 553 win 7504 13:40:19.235135 IP linux.53132 > mailserver.smtp: . 167128:168528(1400) ack 553 win 7504 13:40:19.256950 IP linux.53132 > mailserver.smtp: . 168528:169928(1400) ack 553 win 7504 13:40:19.327125 IP mailserver.smtp > linux.53132: . ack 108328 win 65535 13:40:19.327174 IP linux.53132 > mailserver.smtp: . 169928:171328(1400) ack 553 win 7504 13:40:19.327205 IP linux.53132 > mailserver.smtp: . 171328:172728(1400) ack 553 win 7504 13:40:19.411138 IP mailserver.smtp > linux.53132: . ack 111128 win 65535 13:40:19.411173 IP linux.53132 > mailserver.smtp: . 172728:174128(1400) ack 553 win 7504 13:40:19.411205 IP linux.53132 > mailserver.smtp: . 174128:175528(1400) ack 553 win 7504 13:40:19.435913 IP linux.53132 > mailserver.smtp: P 175528:176352(824) ack 553 win 7504 13:40:19.528188 IP mailserver.smtp > linux.53132: . ack 113928 win 65535 13:40:19.528224 IP linux.53132 > mailserver.smtp: . 176352:177752(1400) ack 553 win 7504 13:40:19.528258 IP linux.53132 > mailserver.smtp: . 177752:179152(1400) ack 553 win 7504 13:40:19.646160 IP mailserver.smtp > linux.53132: . ack 116728 win 65535 13:40:19.646199 IP linux.53132 > mailserver.smtp: . 
179152:180552(1400) ack 553 win 7504 13:40:19.646233 IP linux.53132 > mailserver.smtp: . 180552:181952(1400) ack 553 win 7504 13:40:19.803080 IP mailserver.smtp > linux.53132: . ack 119528 win 65535 13:40:19.803106 IP linux.53132 > mailserver.smtp: . 181952:183352(1400) ack 553 win 7504 13:40:19.803139 IP linux.53132 > mailserver.smtp: . 183352:184752(1400) ack 553 win 7504 13:40:19.920136 IP mailserver.smtp > linux.53132: . ack 122328 win 65535 13:40:19.920185 IP linux.53132 > mailserver.smtp: . 184752:186152(1400) ack 553 win 7504 13:40:19.920218 IP linux.53132 > mailserver.smtp: . 186152:187552(1400) ack 553 win 7504 13:40:20.037145 IP mailserver.smtp > linux.53132: . ack 125128 win 65535 13:40:20.037176 IP linux.53132 > mailserver.smtp: . 187552:188952(1400) ack 553 win 7504 13:40:20.037209 IP linux.53132 > mailserver.smtp: . 188952:190352(1400) ack 553 win 7504 13:40:20.153935 IP mailserver.smtp > linux.53132: . ack 126528 win 65535 13:40:20.153966 IP linux.53132 > mailserver.smtp: . 190352:191752(1400) ack 553 win 7504 13:40:20.213044 IP mailserver.smtp > linux.53132: . ack 129328 win 65535 13:40:20.213063 IP linux.53132 > mailserver.smtp: . 191752:193152(1400) ack 553 win 7504 13:40:20.213093 IP linux.53132 > mailserver.smtp: . 193152:194552(1400) ack 553 win 7504 13:40:20.331045 IP mailserver.smtp > linux.53132: . ack 132128 win 65535 13:40:20.331106 IP linux.53132 > mailserver.smtp: . 194552:195952(1400) ack 553 win 7504 13:40:20.331141 IP linux.53132 > mailserver.smtp: . 195952:197352(1400) ack 553 win 7504 13:40:20.448086 IP mailserver.smtp > linux.53132: . ack 134928 win 65535 13:40:20.448153 IP linux.53132 > mailserver.smtp: . 197352:198752(1400) ack 553 win 7504 13:40:20.448188 IP linux.53132 > mailserver.smtp: . 198752:200152(1400) ack 553 win 7504 13:40:20.565142 IP mailserver.smtp > linux.53132: . ack 136328 win 65535 13:40:20.565178 IP linux.53132 > mailserver.smtp: . 
200152:201552(1400) ack 553 win 7504 13:40:20.627890 IP mailserver.smtp > linux.53132: . ack 139128 win 65535 13:40:20.627916 IP linux.53132 > mailserver.smtp: . 201552:202952(1400) ack 553 win 7504 13:40:20.627949 IP linux.53132 > mailserver.smtp: . 202952:204352(1400) ack 553 win 7504 13:40:23.945532 IP linux.53132 > mailserver.smtp: . 139128:140528(1400) ack 553 win 7504 13:40:24.124744 IP mailserver.smtp > linux.53132: . ack 140528 win 65535 13:40:24.124779 IP linux.53132 > mailserver.smtp: . 204352:205752(1400) ack 553 win 7504 13:40:30.761559 IP linux.53132 > mailserver.smtp: . 140528:141928(1400) ack 553 win 7504 13:40:30.879206 IP mailserver.smtp > linux.53132: . ack 147528 win 65535 13:40:30.879244 IP linux.53132 > mailserver.smtp: . 205752:207152(1400) ack 553 win 7504 13:40:30.879274 IP linux.53132 > mailserver.smtp: . 207152:208552(1400) ack 553 win 7504 13:40:44.157537 IP linux.53132 > mailserver.smtp: . 147528:148928(1400) ack 553 win 7504 13:40:44.277506 IP mailserver.smtp > linux.53132: . ack 150328 win 65535 13:40:44.277546 IP linux.53132 > mailserver.smtp: . 208552:209952(1400) ack 553 win 7504 13:40:44.277579 IP linux.53132 > mailserver.smtp: . 209952:211352(1400) ack 553 win 7504 13:41:10.837536 IP linux.53132 > mailserver.smtp: . 150328:151728(1400) ack 553 win 7504 13:41:10.955575 IP mailserver.smtp > linux.53132: . ack 154528 win 65535 13:41:10.955610 IP linux.53132 > mailserver.smtp: . 211352:212752(1400) ack 553 win 7504 13:41:10.955642 IP linux.53132 > mailserver.smtp: . 212752:214152(1400) ack 553 win 7504 13:42:04.073557 IP linux.53132 > mailserver.smtp: . 154528:155928(1400) ack 553 win 7504 13:42:04.198891 IP mailserver.smtp > linux.53132: . ack 155928 win 65535 13:42:04.198938 IP linux.53132 > mailserver.smtp: . 214152:215552(1400) ack 553 win 7504 13:42:04.198970 IP linux.53132 > mailserver.smtp: . 215552:216952(1400) ack 553 win 7504 13:43:50.437541 IP linux.53132 > mailserver.smtp: . 
155928:157328(1400) ack 553 win 7504
13:43:50.696615 IP mailserver.smtp > linux.53132: . ack 157328 win 65535
13:43:50.696641 IP linux.53132 > mailserver.smtp: . 216952:218352(1400) ack 553 win 7504
13:43:50.696671 IP linux.53132 > mailserver.smtp: . 218352:219752(1400) ack 553 win 7504
13:44:51.681540 IP mailserver.smtp > linux.41085: P 3630759848:3630759915(67) ack 1371960018 win 65535
13:44:51.681568 IP linux.41085 > mailserver.smtp: R 1371960018:1371960018(0) win 0
13:44:51.681583 IP mailserver.smtp > linux.41085: F 67:67(0) ack 1 win 65535
13:44:51.681594 IP linux.41085 > mailserver.smtp: R 1371960018:1371960018(0) win 0
-----------------------------------------------------------

It just looks like some ACKs never made it to the Linux box.

Any idea how I can further troubleshoot the stalling connection?
Please CC me on replies, I'm only subscribed to netfilter-devel.

Thanks in advance,
Thomas

[-- Attachment #2: smtp.tcpdump.bz2 --]
[-- Type: application/x-bzip2, Size: 173068 bytes --]
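For readers who want to pick the retransmissions out of a text dump like the one above, here is a minimal Python sketch. It assumes the old-style tcpdump text format shown in this message, one packet per line; the function name `find_retransmissions` and the `sender_prefix` parameter are made up for illustration and are not part of any tool mentioned in the thread. A segment is treated as a retransmission when it ends at or below the highest sequence number the sender has already transmitted.

```python
import re

# Matches data segments in the old tcpdump text format, for example:
# 13:40:23.945532 IP linux.53132 > mailserver.smtp: . 139128:140528(1400) ack 553 win 7504
SEGMENT = re.compile(
    r"^(?P<ts>\d\d:\d\d:\d\d\.\d+) IP (?P<src>\S+) > (?P<dst>\S+): "
    r"(?P<flags>\S+) (?P<start>\d+):(?P<end>\d+)\((?P<len>\d+)\)"
)


def find_retransmissions(lines, sender_prefix):
    """Return (timestamp, start, end) for every data segment from a host whose
    name starts with sender_prefix (a hypothetical filter) that does not
    advance past the highest sequence number already sent on that flow."""
    highest_end = {}  # (src, dst) -> highest sequence number sent so far
    hits = []
    for line in lines:
        m = SEGMENT.match(line.strip())
        if not m or int(m["len"]) == 0:
            continue  # pure ACKs and zero-length segments carry no data
        if not m["src"].startswith(sender_prefix):
            continue
        flow = (m["src"], m["dst"])
        start, end = int(m["start"]), int(m["end"])
        if end <= highest_end.get(flow, 0):
            hits.append((m["ts"], start, end))
        else:
            highest_end[flow] = end
    return hits


# A few lines copied from the dump above, around the first stall.
SAMPLE = [
    "13:40:20.627916 IP linux.53132 > mailserver.smtp: . 201552:202952(1400) ack 553 win 7504",
    "13:40:20.627949 IP linux.53132 > mailserver.smtp: . 202952:204352(1400) ack 553 win 7504",
    "13:40:23.945532 IP linux.53132 > mailserver.smtp: . 139128:140528(1400) ack 553 win 7504",
    "13:40:24.124744 IP mailserver.smtp > linux.53132: . ack 140528 win 65535",
    "13:40:24.124779 IP linux.53132 > mailserver.smtp: . 204352:205752(1400) ack 553 win 7504",
]

for ts, start, end in find_retransmissions(SAMPLE, "linux."):
    print(ts, start, end)
```

This only looks at sequence numbers, so it cannot tell whether the original segment or its ACK was the one that got lost.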
* Re: TCP connection stalls under 2.6.24.7
From: Jozsef Kadlecsik @ 2008-07-10 13:17 UTC
To: Thomas Jarosch
Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List

On Mon, 7 Jul 2008, Thomas Jarosch wrote:

> On Monday, 7. July 2008 11:18:32 you wrote:
> > I'll upgrade to 2.6.25.10 and see if it helps,
> > there is a TCP connection timeout fix in there:
> > http://kerneltrap.org/mailarchive/linux-kernel/2008/6/14/2122714
>
> After upgrading to 2.6.25.10, the TCP connection still stalls.
>
> I temporarily disabled PMTU discovery, TCP window scaling, TCP SACK
> and manually forced the MTU to 1400 with no noticeable effect.
>
> I also added an "iptables -I INPUT -s IP.OF.MAIL.RELAY -j ACCEPT"
> to make sure it's not related to conntrack on the double.
>
> So here are the current results:
> - 2.6.23.16: Working
> - 2.6.24: Stalling connection
> - 2.6.24.7: Stalling connection
> - 2.6.25.10: Stalling connection
>
> Attached is a tcpdump of a stalling connection with the
> sensitive information replaced by "xxxxx", so please ignore the broken
> checksums at the beginning. The dump was created using 2.6.24.7.
>
> Jozsef Kadlecsik suggested this is not related to netfilter,
> so I'm now asking for help on netdev.
>
> Here's the text output from tcpdump:
> -----------------------------------------------------------
> 13:40:14.140625 IP linux.53132 > mailserver.smtp: S 943411848:943411848(0) win 5808 <mss 1452,sackOK,timestamp 5386646 0,nop,wscale 2>
> 13:40:14.206523 IP mailserver.smtp > linux.53132: S 4213328541:4213328541(0) ack 943411849 win 65535 <mss 1400>
[...]
> 13:42:04.198938 IP linux.53132 > mailserver.smtp: . 214152:215552(1400) ack 553 win 7504
> 13:42:04.198970 IP linux.53132 > mailserver.smtp: . 215552:216952(1400) ack 553 win 7504
> 13:43:50.437541 IP linux.53132 > mailserver.smtp: . 155928:157328(1400) ack 553 win 7504
> 13:43:50.696615 IP mailserver.smtp > linux.53132: . ack 157328 win 65535
> 13:43:50.696641 IP linux.53132 > mailserver.smtp: . 216952:218352(1400) ack 553 win 7504
> 13:43:50.696671 IP linux.53132 > mailserver.smtp: . 218352:219752(1400) ack 553 win 7504

It looks as if the SMTP server receives the packets slowly and is just
behind the client. There are no more packets to/from port 53132 in the
tcpdump.

> 13:44:51.681540 IP mailserver.smtp > linux.41085: P 3630759848:3630759915(67) ack 1371960018 win 65535
> 13:44:51.681568 IP linux.41085 > mailserver.smtp: R 1371960018:1371960018(0) win 0
> 13:44:51.681583 IP mailserver.smtp > linux.41085: F 67:67(0) ack 1 win 65535
> 13:44:51.681594 IP linux.41085 > mailserver.smtp: R 1371960018:1371960018(0) win 0

But the first packet above from the server looks just wrong in the
context: the port of the client "changed". This is why the client sends
the RST packet back as there's no such TCP connection there.

Makes no sense at all...

[Wild guessing: broken virtualized SMTP server migration?]

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
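The observation above, a packet arriving for a client port that has no connection behind it, can be checked mechanically on a text dump. The following is a rough Python sketch under the assumption that the capture contains the SYN of every connection of interest; a capture started mid-connection would produce false positives. The helper name `packets_without_handshake` is hypothetical.

```python
import re

# Pulls source, destination and the flag field out of an old-style tcpdump line.
PACKET = re.compile(r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+) ")


def packets_without_handshake(lines):
    """Return the lines that belong to an endpoint pair for which no SYN was
    seen earlier in the capture (assumption: the capture is complete)."""
    seen_syn = set()
    orphans = []
    for line in lines:
        line = line.strip()
        m = PACKET.match(line)
        if not m:
            continue
        # Direction does not matter for membership, so use an unordered pair.
        flow = frozenset((m["src"], m["dst"]))
        if "S" in m["flags"]:
            seen_syn.add(flow)
        elif flow not in seen_syn:
            orphans.append(line)
    return orphans


# Lines copied from the dump quoted above.
SAMPLE = [
    "13:40:14.140625 IP linux.53132 > mailserver.smtp: S 943411848:943411848(0) win 5808 <mss 1452,sackOK,timestamp 5386646 0,nop,wscale 2>",
    "13:40:14.206548 IP linux.53132 > mailserver.smtp: . ack 1 win 5808",
    "13:44:51.681540 IP mailserver.smtp > linux.41085: P 3630759848:3630759915(67) ack 1371960018 win 65535",
    "13:44:51.681568 IP linux.41085 > mailserver.smtp: R 1371960018:1371960018(0) win 0",
]

for orphan in packets_without_handshake(SAMPLE):
    print(orphan)
```

On the sample, only the two packets involving port 41085 are reported, which matches the RST behaviour described above.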
* Re: TCP connection stalls under 2.6.24.7
From: Thomas Jarosch @ 2008-07-10 14:12 UTC
To: Jozsef Kadlecsik
Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List

On Thursday, 10. July 2008 15:17:53 you wrote:
> It looks as if the SMTP server receives the packets slowly and is just
> behind the client. There are no more packets to/from port 53132 in the
> tcpdump.

Thanks for looking into this, Jozsef. If you take a look at the timing
information, the connection had already been running for ~270 seconds without
real data transfer. The mailserver then aborts the SMTP connection with the
error message: "421 mailbackup.webpage.t-com.de Lost connection to [217.85.147.6]"
Only after that is the port "changed" to the wrong one.

The intervals between the retransmissions seem
to grow longer and longer after the first retransmission.

The Linux box is connected via a mostly idle 2 Mbit/s SDSL line, and the
mailserver is located at the provider. So theoretically this shouldn't be
slow at all. This is supported by the fact that 2.6.23.17 works without trouble.

As noted before, small mails below ~220 kB always seem to get through.

Is there any feature in TCP that could trigger such behaviour?
This smells like some queue getting full. I'll double-check that
there is not some kind of traffic shaping in place.

> > 13:44:51.681540 IP mailserver.smtp > linux.41085: P
> ...
>
> But the first packet above from the server looks just wrong in the
> context: the port of the client "changed". This is why the client sends
> the RST packet back as there's no such TCP connection there.
>
> Makes no sense at all...
>
> [Wild guessing: broken virtualized SMTP server migration?]

Oh, that really looks strange. Maybe the error handling
of the server/load balancer/whatever is broken.

With a Fritz!Box router in between, things worked fine for a day, but now
we just had another mail stuck in the queue. So it seems
to soften the problem a bit, but does not solve it.

Thomas
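The remark about the retransmission intervals can be made concrete with a little arithmetic on the 2.6.24.7 dump posted at the start of this thread. The sketch below is only an illustration: the six timestamps are the retransmitted segments from that dump (the ones that jump back to sequence numbers 139128 through 155928), and the helper name `gaps_seconds` is made up.

```python
from datetime import datetime

# Timestamps of the six retransmitted segments in the 2.6.24.7 dump.
RETRANSMIT_TIMES = [
    "13:40:23.945532",
    "13:40:30.761559",
    "13:40:44.157537",
    "13:41:10.837536",
    "13:42:04.073557",
    "13:43:50.437541",
]


def gaps_seconds(timestamps):
    """Return the time in seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


gaps = gaps_seconds(RETRANSMIT_TIMES)
ratios = [later / earlier for earlier, later in zip(gaps, gaps[1:])]

for gap in gaps:
    print(f"{gap:.3f} s")
print("ratios:", [round(r, 2) for r in ratios])
```

The gaps come out at roughly 6.8, 13.4, 26.7, 53.2 and 106.4 seconds, so each one is about twice the previous. That pattern is what exponential backoff of the TCP retransmission timer looks like; it describes how the sender reacts once segments go unacknowledged, not why they went unacknowledged in the first place.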
* Re: TCP connection stalls under 2.6.24.7
From: Jozsef Kadlecsik @ 2008-07-10 21:21 UTC
To: Thomas Jarosch
Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List

On Thu, 10 Jul 2008, Thomas Jarosch wrote:

> On Thursday, 10. July 2008 15:17:53 you wrote:
> > It looks as if the SMTP server receives the packets slowly and is just
> > behind the client. There are no more packets to/from port 53132 in the
> > tcpdump.
>
> Thanks for looking into this, Jozsef. If you take a look at the timing
> information, the connection had already been running for ~270 seconds without
> real data transfer. The mailserver then aborts the SMTP connection with the
> error message: "421 mailbackup.webpage.t-com.de Lost connection to [217.85.147.6]"
> Only after that is the port "changed" to the wrong one.
>
> The intervals between the retransmissions seem
> to grow longer and longer after the first retransmission.
>
> The Linux box is connected via a mostly idle 2 Mbit/s SDSL line, and the
> mailserver is located at the provider. So theoretically this shouldn't be
> slow at all. This is supported by the fact that 2.6.23.17 works without trouble.

You did not mention which network driver you use. Aren't there some changes
in the driver code between 2.6.23.17 and 2.6.24 which could cause such
stalls?

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
* Re: TCP connection stalls under 2.6.24.7
From: Thomas Jarosch @ 2008-07-11 14:33 UTC
To: Jozsef Kadlecsik
Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List

[-- Attachment #1: Type: text/plain, Size: 18655 bytes --]

On Thursday, 10. July 2008 23:21:37 Jozsef Kadlecsik wrote:
> You did not mention which network driver you use. Aren't there some changes
> in the driver code between 2.6.23.17 and 2.6.24 which could cause such
> stalls?

It's an 8139too, and the "same" hardware works fine at other sites.

I tried the nmap trick mentioned by Dâniel Fraga with no noticeable difference.

Here's another tcpdump created today; maybe it shows some new/different
information. This time I didn't capture the complete packets, to keep it small:

15:43:30.580952 IP linux.52292 > mailserver.smtp: S 1353475948:1353475948(0) win 5808 <mss 1452,sackOK,timestamp 87884244[|tcp]>
15:43:30.646396 IP mailserver.smtp > linux.52292: S 3868230700:3868230700(0) ack 1353475949 win 65535 <mss 1448>
15:43:30.646421 IP linux.52292 > mailserver.smtp: . ack 1 win 5808
15:43:30.711442 IP mailserver.smtp > linux.52292: P 1:79(78) ack 1 win 65535
15:43:30.711457 IP linux.52292 > mailserver.smtp: .
ack 79 win 5808 15:43:30.711525 IP linux.52292 > mailserver.smtp: P 1:26(25) ack 79 win 5808 15:43:30.781440 IP mailserver.smtp > linux.52292: P 79:246(167) ack 26 win 65535 15:43:30.781565 IP linux.52292 > mailserver.smtp: P 26:38(12) ack 246 win 6432 15:43:30.844482 IP mailserver.smtp > linux.52292: P 246:264(18) ack 38 win 65535 15:43:30.844551 IP linux.52292 > mailserver.smtp: P 38:68(30) ack 264 win 6432 15:43:30.908539 IP mailserver.smtp > linux.52292: P 264:282(18) ack 68 win 65535 15:43:30.908580 IP linux.52292 > mailserver.smtp: P 68:82(14) ack 282 win 6432 15:43:30.978507 IP mailserver.smtp > linux.52292: P 282:313(31) ack 82 win 65535 15:43:30.978647 IP linux.52292 > mailserver.smtp: P 82:224(142) ack 313 win 6432 15:43:31.056371 IP mailserver.smtp > linux.52292: P 313:542(229) ack 224 win 65535 15:43:31.056476 IP linux.52292 > mailserver.smtp: . 224:1672(1448) ack 542 win 7504 15:43:31.056510 IP linux.52292 > mailserver.smtp: . 1672:3120(1448) ack 542 win 7504 15:43:31.056541 IP linux.52292 > mailserver.smtp: P 3120:4320(1200) ack 542 win 7504 15:43:31.239389 IP mailserver.smtp > linux.52292: . ack 3120 win 65535 15:43:31.239425 IP linux.52292 > mailserver.smtp: . 4320:5768(1448) ack 542 win 7504 15:43:31.239461 IP linux.52292 > mailserver.smtp: . 5768:7216(1448) ack 542 win 7504 15:43:31.420453 IP mailserver.smtp > linux.52292: . ack 7216 win 65535 15:43:31.420481 IP linux.52292 > mailserver.smtp: . 7216:8664(1448) ack 542 win 7504 15:43:31.420513 IP linux.52292 > mailserver.smtp: . 8664:10112(1448) ack 542 win 7504 15:43:31.420542 IP linux.52292 > mailserver.smtp: . 10112:11560(1448) ack 542 win 7504 15:43:31.601274 IP mailserver.smtp > linux.52292: . ack 10112 win 65535 15:43:31.601300 IP linux.52292 > mailserver.smtp: . 11560:13008(1448) ack 542 win 7504 15:43:31.601331 IP linux.52292 > mailserver.smtp: . 13008:14456(1448) ack 542 win 7504 15:43:31.730384 IP mailserver.smtp > linux.52292: . 
ack 13008 win 65535 15:43:31.730423 IP linux.52292 > mailserver.smtp: . 14456:15904(1448) ack 542 win 7504 15:43:31.730457 IP linux.52292 > mailserver.smtp: . 15904:17352(1448) ack 542 win 7504 15:43:31.867331 IP mailserver.smtp > linux.52292: . ack 15904 win 65535 15:43:31.867348 IP linux.52292 > mailserver.smtp: . 17352:18800(1448) ack 542 win 7504 15:43:31.867378 IP linux.52292 > mailserver.smtp: . 18800:20248(1448) ack 542 win 7504 15:43:31.867407 IP linux.52292 > mailserver.smtp: P 20248:20704(456) ack 542 win 7504 15:43:32.012416 IP mailserver.smtp > linux.52292: . ack 18800 win 65535 15:43:32.012449 IP linux.52292 > mailserver.smtp: . 20704:22152(1448) ack 542 win 7504 15:43:32.012482 IP linux.52292 > mailserver.smtp: . 22152:23600(1448) ack 542 win 7504 15:43:32.079207 IP mailserver.smtp > linux.52292: . ack 20248 win 65535 15:43:32.079251 IP linux.52292 > mailserver.smtp: . 23600:25048(1448) ack 542 win 7504 15:43:32.224314 IP mailserver.smtp > linux.52292: . ack 23600 win 65535 15:43:32.224348 IP linux.52292 > mailserver.smtp: . 25048:26496(1448) ack 542 win 7504 15:43:32.224382 IP linux.52292 > mailserver.smtp: . 26496:27944(1448) ack 542 win 7504 15:43:32.224411 IP linux.52292 > mailserver.smtp: P 27944:28896(952) ack 542 win 7504 15:43:32.289328 IP mailserver.smtp > linux.52292: . ack 25048 win 65535 15:43:32.289348 IP linux.52292 > mailserver.smtp: . 28896:30344(1448) ack 542 win 7504 15:43:32.415220 IP mailserver.smtp > linux.52292: . ack 27944 win 65535 15:43:32.415242 IP linux.52292 > mailserver.smtp: . 30344:31792(1448) ack 542 win 7504 15:43:32.415284 IP linux.52292 > mailserver.smtp: . 31792:33240(1448) ack 542 win 7504 15:43:32.499213 IP mailserver.smtp > linux.52292: . ack 28896 win 65535 15:43:32.499234 IP linux.52292 > mailserver.smtp: . 33240:34688(1448) ack 542 win 7504 15:43:32.590355 IP mailserver.smtp > linux.52292: . ack 31792 win 65535 15:43:32.590376 IP linux.52292 > mailserver.smtp: . 
34688:36136(1448) ack 542 win 7504 15:43:32.590408 IP linux.52292 > mailserver.smtp: P 36136:37088(952) ack 542 win 7504 15:43:32.708128 IP mailserver.smtp > linux.52292: . ack 33240 win 65535 15:43:32.708151 IP linux.52292 > mailserver.smtp: . 37088:38536(1448) ack 542 win 7504 15:43:32.789176 IP mailserver.smtp > linux.52292: . ack 36136 win 65535 15:43:32.789201 IP linux.52292 > mailserver.smtp: . 38536:39984(1448) ack 542 win 7504 15:43:32.789245 IP linux.52292 > mailserver.smtp: . 39984:41432(1448) ack 542 win 7504 15:43:32.915336 IP mailserver.smtp > linux.52292: . ack 38536 win 65535 15:43:32.915364 IP linux.52292 > mailserver.smtp: . 41432:42880(1448) ack 542 win 7504 15:43:32.915395 IP linux.52292 > mailserver.smtp: . 42880:44328(1448) ack 542 win 7504 15:43:33.024198 IP mailserver.smtp > linux.52292: . ack 41432 win 65535 15:43:33.024237 IP linux.52292 > mailserver.smtp: . 44328:45776(1448) ack 542 win 7504 15:43:33.024271 IP linux.52292 > mailserver.smtp: . 45776:47224(1448) ack 542 win 7504 15:43:33.024300 IP linux.52292 > mailserver.smtp: . 47224:48672(1448) ack 542 win 7504 15:43:33.119325 IP mailserver.smtp > linux.52292: . ack 42880 win 65535 15:43:33.119367 IP linux.52292 > mailserver.smtp: P 48672:49376(704) ack 542 win 7504 15:43:33.226202 IP mailserver.smtp > linux.52292: . ack 45776 win 65535 15:43:33.226225 IP linux.52292 > mailserver.smtp: . 49376:50824(1448) ack 542 win 7504 15:43:33.226260 IP linux.52292 > mailserver.smtp: . 50824:52272(1448) ack 542 win 7504 15:43:33.440286 IP mailserver.smtp > linux.52292: . ack 47224 win 65535 15:43:33.440320 IP linux.52292 > mailserver.smtp: . 52272:53720(1448) ack 542 win 7504 15:43:33.440403 IP mailserver.smtp > linux.52292: . ack 50824 win 65535 15:43:33.440420 IP linux.52292 > mailserver.smtp: . 53720:55168(1448) ack 542 win 7504 15:43:33.440453 IP linux.52292 > mailserver.smtp: . 
55168:56616(1448) ack 542 win 7504 15:43:33.440477 IP linux.52292 > mailserver.smtp: P 56616:57568(952) ack 542 win 7504 15:43:33.539092 IP mailserver.smtp > linux.52292: . ack 52272 win 65535 15:43:33.539132 IP linux.52292 > mailserver.smtp: . 57568:59016(1448) ack 542 win 7504 15:43:33.621109 IP mailserver.smtp > linux.52292: . ack 55168 win 65535 15:43:33.621153 IP linux.52292 > mailserver.smtp: . 59016:60464(1448) ack 542 win 7504 15:43:33.621201 IP linux.52292 > mailserver.smtp: . 60464:61912(1448) ack 542 win 7504 15:43:33.749209 IP mailserver.smtp > linux.52292: . ack 57568 win 65535 15:43:33.749232 IP linux.52292 > mailserver.smtp: . 61912:63360(1448) ack 542 win 7504 15:43:33.749262 IP linux.52292 > mailserver.smtp: . 63360:64808(1448) ack 542 win 7504 15:43:33.865258 IP mailserver.smtp > linux.52292: . ack 60464 win 65535 15:43:33.865301 IP linux.52292 > mailserver.smtp: . 64808:66256(1448) ack 542 win 7504 15:43:33.865336 IP linux.52292 > mailserver.smtp: . 66256:67704(1448) ack 542 win 7504 15:43:33.959118 IP mailserver.smtp > linux.52292: . ack 61912 win 65535 15:43:33.959158 IP linux.52292 > mailserver.smtp: . 67704:69152(1448) ack 542 win 7504 15:43:34.608440 IP linux.52292 > mailserver.smtp: . 61912:63360(1448) ack 542 win 7504 15:43:34.729935 IP mailserver.smtp > linux.52292: . ack 64808 win 65535 15:43:34.729964 IP linux.52292 > mailserver.smtp: . 69152:70600(1448) ack 542 win 7504 ... 15:43:45.324086 IP linux.52292 > mailserver.smtp: . 349832:351280(1448) ack 542 win 7504 15:43:45.324117 IP linux.52292 > mailserver.smtp: . 351280:352728(1448) ack 542 win 7504 15:43:45.324146 IP linux.52292 > mailserver.smtp: . 352728:354176(1448) ack 542 win 7504 15:43:45.445990 IP mailserver.smtp > linux.52292: . ack 303824 win 65535 15:43:45.446010 IP linux.52292 > mailserver.smtp: . 354176:355624(1448) ack 542 win 7504 15:43:45.446060 IP linux.52292 > mailserver.smtp: . 355624:357072(1448) ack 542 win 7504 15:43:45.569904 IP mailserver.smtp > linux.52292: . 
ack 306720 win 65535 15:43:45.569928 IP linux.52292 > mailserver.smtp: . 357072:358520(1448) ack 542 win 7504 15:43:45.569960 IP linux.52292 > mailserver.smtp: . 358520:359968(1448) ack 542 win 7504 15:43:45.569989 IP linux.52292 > mailserver.smtp: . 359968:361416(1448) ack 542 win 7504 15:43:45.667941 IP mailserver.smtp > linux.52292: . ack 308168 win 65535 15:43:45.667964 IP linux.52292 > mailserver.smtp: . 361416:362864(1448) ack 542 win 7504 15:43:45.752939 IP mailserver.smtp > linux.52292: . ack 311064 win 65535 15:43:45.752958 IP linux.52292 > mailserver.smtp: . 362864:364312(1448) ack 542 win 7504 15:43:45.752989 IP linux.52292 > mailserver.smtp: . 364312:365760(1448) ack 542 win 7504 15:43:45.753018 IP linux.52292 > mailserver.smtp: . 365760:367208(1448) ack 542 win 7504 15:43:45.872905 IP mailserver.smtp > linux.52292: . ack 313960 win 65535 15:43:45.872930 IP linux.52292 > mailserver.smtp: . 367208:368656(1448) ack 542 win 7504 15:43:45.872962 IP linux.52292 > mailserver.smtp: . 368656:370104(1448) ack 542 win 7504 15:43:45.993879 IP mailserver.smtp > linux.52292: . ack 316856 win 65535 15:43:45.993913 IP linux.52292 > mailserver.smtp: . 370104:371552(1448) ack 542 win 7504 15:43:45.993946 IP linux.52292 > mailserver.smtp: . 371552:373000(1448) ack 542 win 7504 15:43:45.993975 IP linux.52292 > mailserver.smtp: . 373000:374448(1448) ack 542 win 7504 15:43:46.088019 IP mailserver.smtp > linux.52292: . ack 318304 win 65535 15:43:46.088081 IP linux.52292 > mailserver.smtp: . 374448:375896(1448) ack 542 win 7504 15:43:46.176935 IP mailserver.smtp > linux.52292: . ack 321200 win 65535 15:43:46.176972 IP linux.52292 > mailserver.smtp: . 375896:377344(1448) ack 542 win 7504 15:43:46.177007 IP linux.52292 > mailserver.smtp: . 377344:378792(1448) ack 542 win 7504 15:43:46.177035 IP linux.52292 > mailserver.smtp: . 378792:380240(1448) ack 542 win 7504 15:43:46.289744 IP mailserver.smtp > linux.52292: . 
ack 322648 win 65535 15:43:46.289766 IP linux.52292 > mailserver.smtp: . 380240:381688(1448) ack 542 win 7504 15:43:46.357708 IP mailserver.smtp > linux.52292: . ack 325544 win 65535 15:43:46.357725 IP linux.52292 > mailserver.smtp: . 381688:383136(1448) ack 542 win 7504 15:43:46.357756 IP linux.52292 > mailserver.smtp: . 383136:384584(1448) ack 542 win 7504 15:43:46.478932 IP mailserver.smtp > linux.52292: . ack 328440 win 65535 15:43:46.478953 IP linux.52292 > mailserver.smtp: . 384584:386032(1448) ack 542 win 7504 15:43:46.478983 IP linux.52292 > mailserver.smtp: . 386032:387480(1448) ack 542 win 7504 15:43:46.479011 IP linux.52292 > mailserver.smtp: . 387480:388928(1448) ack 542 win 7504 15:43:46.599901 IP mailserver.smtp > linux.52292: . ack 331336 win 65535 15:43:46.599925 IP linux.52292 > mailserver.smtp: . 388928:390376(1448) ack 542 win 7504 15:43:46.599957 IP linux.52292 > mailserver.smtp: . 390376:391824(1448) ack 542 win 7504 15:43:46.706829 IP mailserver.smtp > linux.52292: . ack 332784 win 65535 15:43:46.706849 IP linux.52292 > mailserver.smtp: . 391824:393272(1448) ack 542 win 7504 15:43:46.781694 IP mailserver.smtp > linux.52292: . ack 335680 win 65535 15:43:46.781710 IP linux.52292 > mailserver.smtp: . 393272:394720(1448) ack 542 win 7504 15:43:46.781740 IP linux.52292 > mailserver.smtp: . 394720:396168(1448) ack 542 win 7504 15:43:46.781768 IP linux.52292 > mailserver.smtp: . 396168:397616(1448) ack 542 win 7504 15:43:46.901696 IP mailserver.smtp > linux.52292: . ack 338576 win 65535 15:43:46.901723 IP linux.52292 > mailserver.smtp: . 397616:399064(1448) ack 542 win 7504 15:43:46.901756 IP linux.52292 > mailserver.smtp: . 399064:400512(1448) ack 542 win 7504 15:43:47.023889 IP mailserver.smtp > linux.52292: . ack 341472 win 65535 15:43:47.023914 IP linux.52292 > mailserver.smtp: . 400512:401960(1448) ack 542 win 7504 15:43:47.023946 IP linux.52292 > mailserver.smtp: . 
401960:403408(1448) ack 542 win 7504 15:43:47.126865 IP mailserver.smtp > linux.52292: . ack 342920 win 65535 15:43:47.126899 IP linux.52292 > mailserver.smtp: . 403408:404856(1448) ack 542 win 7504 15:43:47.126933 IP linux.52292 > mailserver.smtp: . 404856:406304(1448) ack 542 win 7504 15:43:47.205693 IP mailserver.smtp > linux.52292: . ack 345816 win 65535 15:43:47.205726 IP linux.52292 > mailserver.smtp: . 406304:407752(1448) ack 542 win 7504 15:43:47.205760 IP linux.52292 > mailserver.smtp: . 407752:409200(1448) ack 542 win 7504 15:43:47.337721 IP mailserver.smtp > linux.52292: . ack 348384 win 65535 15:43:47.337745 IP linux.52292 > mailserver.smtp: . 409200:410648(1448) ack 542 win 7504 15:43:47.337778 IP linux.52292 > mailserver.smtp: . 410648:412096(1448) ack 542 win 7504 15:43:47.434787 IP mailserver.smtp > linux.52292: . ack 351280 win 65535 15:43:47.434808 IP linux.52292 > mailserver.smtp: . 412096:413544(1448) ack 542 win 7504 15:43:47.434840 IP linux.52292 > mailserver.smtp: . 413544:414992(1448) ack 542 win 7504 15:43:47.434868 IP linux.52292 > mailserver.smtp: . 414992:416440(1448) ack 542 win 7504 15:43:47.547611 IP mailserver.smtp > linux.52292: . ack 352728 win 65535 15:43:47.547635 IP linux.52292 > mailserver.smtp: . 416440:417888(1448) ack 542 win 7504 15:43:47.615600 IP mailserver.smtp > linux.52292: . ack 355624 win 65535 15:43:47.615618 IP linux.52292 > mailserver.smtp: . 417888:419336(1448) ack 542 win 7504 15:43:47.615650 IP linux.52292 > mailserver.smtp: . 419336:420784(1448) ack 542 win 7504 15:43:47.743710 IP mailserver.smtp > linux.52292: . ack 358520 win 65535 15:43:47.743727 IP linux.52292 > mailserver.smtp: . 420784:422232(1448) ack 542 win 7504 15:43:47.743759 IP linux.52292 > mailserver.smtp: . 422232:423680(1448) ack 542 win 7504 15:43:47.870591 IP mailserver.smtp > linux.52292: . ack 361416 win 65535 15:43:47.870613 IP linux.52292 > mailserver.smtp: . 
423680:425128(1448) ack 542 win 7504 15:43:47.870644 IP linux.52292 > mailserver.smtp: . 425128:426576(1448) ack 542 win 7504 15:43:47.967659 IP mailserver.smtp > linux.52292: . ack 362864 win 65535 15:43:47.967695 IP linux.52292 > mailserver.smtp: . 426576:428024(1448) ack 542 win 7504 15:43:48.054611 IP mailserver.smtp > linux.52292: . ack 365760 win 65535 15:43:48.054643 IP linux.52292 > mailserver.smtp: . 428024:429472(1448) ack 542 win 7504 15:43:48.054677 IP linux.52292 > mailserver.smtp: . 429472:430920(1448) ack 542 win 7504 15:43:48.177804 IP mailserver.smtp > linux.52292: . ack 368656 win 65535 15:43:48.177845 IP linux.52292 > mailserver.smtp: . 430920:432368(1448) ack 542 win 7504 15:43:48.177879 IP linux.52292 > mailserver.smtp: . 432368:433816(1448) ack 542 win 7504 15:43:48.296768 IP mailserver.smtp > linux.52292: . ack 371552 win 65535 15:43:48.296790 IP linux.52292 > mailserver.smtp: . 433816:435264(1448) ack 542 win 7504 15:43:48.296820 IP linux.52292 > mailserver.smtp: . 435264:436712(1448) ack 542 win 7504 15:43:48.386685 IP mailserver.smtp > linux.52292: . ack 373000 win 65535 15:43:48.386702 IP linux.52292 > mailserver.smtp: . 436712:438160(1448) ack 542 win 7504 15:43:48.478585 IP mailserver.smtp > linux.52292: . ack 375896 win 65535 15:43:48.478605 IP linux.52292 > mailserver.smtp: . 438160:439608(1448) ack 542 win 7504 15:43:48.478636 IP linux.52292 > mailserver.smtp: . 439608:441056(1448) ack 542 win 7504 15:43:48.597559 IP mailserver.smtp > linux.52292: . ack 377344 win 65535 15:43:48.597581 IP linux.52292 > mailserver.smtp: . 441056:442504(1448) ack 542 win 7504 15:43:48.806718 IP mailserver.smtp > linux.52292: . ack 378792 win 65535 15:43:48.806740 IP linux.52292 > mailserver.smtp: . 442504:443952(1448) ack 542 win 7504 15:43:52.032437 IP linux.52292 > mailserver.smtp: . 378792:380240(1448) ack 542 win 7504 15:43:52.153192 IP mailserver.smtp > linux.52292: . ack 383136 win 65535 15:43:52.153228 IP linux.52292 > mailserver.smtp: . 
443952:445400(1448) ack 542 win 7504 15:43:52.153262 IP linux.52292 > mailserver.smtp: . 445400:446848(1448) ack 542 win 7504 15:43:52.153362 IP mailserver.smtp > linux.52292: . ack 387480 win 65535 15:43:52.153381 IP linux.52292 > mailserver.smtp: . 446848:448296(1448) ack 542 win 7504 15:43:59.048438 IP linux.52292 > mailserver.smtp: . 387480:388928(1448) ack 542 win 7504 15:43:59.170481 IP mailserver.smtp > linux.52292: . ack 390376 win 65535 15:43:59.170516 IP linux.52292 > mailserver.smtp: . 448296:449744(1448) ack 542 win 7504 15:43:59.170551 IP linux.52292 > mailserver.smtp: . 449744:451192(1448) ack 542 win 7504 15:43:59.172438 IP mailserver.smtp > linux.52292: . ack 394720 win 65535 15:43:59.172457 IP linux.52292 > mailserver.smtp: . 451192:452640(1448) ack 542 win 7504 15:43:59.295693 IP mailserver.smtp > linux.52292: . ack 396168 win 65535 15:44:22.548439 IP linux.52292 > mailserver.smtp: . 396168:397616(1448) ack 542 win 7504 15:44:22.669245 IP mailserver.smtp > linux.52292: . ack 399064 win 65535 15:44:22.669266 IP linux.52292 > mailserver.smtp: . 452640:454088(1448) ack 542 win 7504 15:44:22.669299 IP linux.52292 > mailserver.smtp: . 454088:455536(1448) ack 542 win 7504 15:44:22.669399 IP mailserver.smtp > linux.52292: . ack 404856 win 65535 15:44:22.669427 IP linux.52292 > mailserver.smtp: . 455536:456984(1448) ack 542 win 7504 15:45:15.916435 IP linux.52292 > mailserver.smtp: . 404856:406304(1448) ack 542 win 7504 15:45:16.127748 IP mailserver.smtp > linux.52292: . ack 406304 win 65535 15:45:16.127773 IP linux.52292 > mailserver.smtp: . 456984:458432(1448) ack 542 win 7504 15:45:16.127804 IP linux.52292 > mailserver.smtp: . 458432:459880(1448) ack 542 win 7504 ... The full packet flow can be found in the attached dump. Another mail seems to be transmitted while the first was stalling, but the issue is the same, the timing information shows no active transfer for over 60s. 
Thomas [-- Attachment #2: smtp.tcpdump.bz2 --] [-- Type: application/x-bzip2, Size: 10663 bytes --] ^ permalink raw reply [flat|nested] 107+ messages in thread
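[Editorial sketch, not part of the original mail.] The stalls in the dump above are easiest to spot as gaps between consecutive timestamps. The following is a minimal sketch, assuming the tcpdump text output has one packet per line and starts each line with an HH:MM:SS.micro timestamp; the function name `find_gaps`, the 3 second threshold, and the `SAMPLE` excerpt (eight lines copied from the dump above) are choices made for illustration only.

```python
from datetime import datetime

# Eight lines copied from the dump above: the packet before each pause
# and the retransmission that follows it.
SAMPLE = """
15:43:48.806740 IP linux.52292 > mailserver.smtp: . 442504:443952(1448) ack 542 win 7504
15:43:52.032437 IP linux.52292 > mailserver.smtp: . 378792:380240(1448) ack 542 win 7504
15:43:52.153381 IP linux.52292 > mailserver.smtp: . 446848:448296(1448) ack 542 win 7504
15:43:59.048438 IP linux.52292 > mailserver.smtp: . 387480:388928(1448) ack 542 win 7504
15:43:59.295693 IP mailserver.smtp > linux.52292: . ack 396168 win 65535
15:44:22.548439 IP linux.52292 > mailserver.smtp: . 396168:397616(1448) ack 542 win 7504
15:44:22.669427 IP linux.52292 > mailserver.smtp: . 455536:456984(1448) ack 542 win 7504
15:45:15.916435 IP linux.52292 > mailserver.smtp: . 404856:406304(1448) ack 542 win 7504
"""


def find_gaps(lines, threshold_s=3.0):
    """Return (previous_stamp, next_stamp, gap_seconds) for every pause
    between consecutive packets that lasts at least threshold_s seconds."""
    gaps = []
    prev_ts = None
    prev_stamp = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            ts = datetime.strptime(stamp, "%H:%M:%S.%f")
        except ValueError:
            continue  # blank lines, "..." elisions, anything without a timestamp
        if prev_ts is not None:
            delta = (ts - prev_ts).total_seconds()
            if delta < 0:
                delta += 86400  # capture crossed midnight
            if delta >= threshold_s:
                gaps.append((prev_stamp, stamp, round(delta, 3)))
        prev_ts, prev_stamp = ts, stamp
    return gaps


for before, after, seconds in find_gaps(SAMPLE.splitlines()):
    print(f"{before} -> {after}: {seconds} s without traffic")
```

On this excerpt the pauses grow from roughly 3 s to roughly 53 s, and each one ends with the sender retransmitting an old segment, which matches the "no active transfer for over 60s" observation once the later part of the dump is included.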
* Re: TCP connection stalls under 2.6.24.7 2008-07-11 14:33 ` Thomas Jarosch @ 2008-07-15 11:47 ` Thomas Jarosch 2008-07-15 16:10 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-15 11:47 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List On Friday, 11. July 2008 16:33:41 Thomas Jarosch wrote: > On Thursday, 10. July 2008 23:21:37 Jozsef Kadlecsik wrote: > > You did not mention the type of your driver. Isn't there some changes in > > the driver code between 2.6.23.17 and 2.6.24 which could cause such > > stallings? > > It's an 8139too and the "same" hardware works fine at other sites. > I tried the nmap trick mentioned by Dâniel Fraga with no noticeable > difference. I swapped the NIC with a "via-rhine" based card which is installed in the remote box, but without much success. Luckily I'm able to reproduce the problem locally using an ADSL line from the same provider, so I'll now bisect the kernel from 2.6.23.17 to 2.6.24. Thomas -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 11:47 ` Thomas Jarosch @ 2008-07-15 16:10 ` Thomas Jarosch 2008-07-15 18:30 ` Dâniel Fraga 2008-07-15 20:17 ` Ilpo Järvinen 0 siblings, 2 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-07-15 16:10 UTC (permalink / raw) To: Jozsef Kadlecsik Cc: netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga > Luckily I'm able to reproduce the problem locally using an ADSL line from > the same provider, so I'll now bisect the kernel from 2.6.23.17 to 2.6.24. After bisecting for hours, I only had ten revisions left to test. There was this commit that caught my eye: ------------------------------ commit c96fd3d461fa495400df24be3b3b66f0e0b152f9 Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Date: Thu Sep 20 11:36:37 2007 -0700 [TCP]: Enable SACK enhanced FRTO (RFC4138) by default ------------------------------ This change sets the value of "tcp_frto" to 2 by default. If I reset it to zero, the connection works immediately. @Dâniel Fraga: Does disabling tcp_frto work for you, too? Disabling tcp_sack makes no difference. To summarize the situation, I had two different cases of stalling TCP connections, both connecting to busy SMTP relay servers which probably drop some packets here and there. I can easily reproduce the problem, so how do we go from here? Cheers, Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
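[Editorial sketch, not part of the original mail.] For readers who want to repeat the "reset tcp_frto to zero" test, the setting is exposed as the sysctl net.ipv4.tcp_frto, readable under /proc/sys/net/ipv4/tcp_frto. Below is a small sketch of inspecting and clearing it. The helper names `describe_frto` and `disable_frto` are invented here, and the meaning given for each value (0 off, 1 basic F-RTO, 2 SACK-enhanced F-RTO) follows the kernel documentation of that era as far as I recall it, so treat the table as an assumption and check the ip-sysctl documentation for your kernel.

```python
from pathlib import Path

# Standard location of the sysctl on Linux.
FRTO_PATH = Path("/proc/sys/net/ipv4/tcp_frto")

# Assumed meanings for 2.6.2x-era kernels; verify against your kernel's docs.
FRTO_MODES = {
    0: "F-RTO disabled",
    1: "basic F-RTO",
    2: "SACK-enhanced F-RTO",
}


def describe_frto(path=FRTO_PATH):
    """Read the current tcp_frto value and return (value, description)."""
    value = int(Path(path).read_text().strip())
    return value, FRTO_MODES.get(value, "unknown value")


def disable_frto(path=FRTO_PATH):
    """Write 0 to tcp_frto. Needs root on a real system and does not
    survive a reboot unless it is also set in the sysctl configuration."""
    Path(path).write_text("0\n")
```

Calling `describe_frto()` before and after `disable_frto()` shows whether the change took effect. The `path` parameter exists only so the helpers can be tried against an ordinary file instead of the live sysctl.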
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 16:10 ` Thomas Jarosch @ 2008-07-15 18:30 ` Dâniel Fraga 2008-07-31 4:47 ` Dâniel Fraga 2008-07-15 20:17 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-07-15 18:30 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List On Tue, 15 Jul 2008 18:10:42 +0200 Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > Disabling tcp_sack makes no difference. To summarize the situation, > I had two different cases of stalling TCP connections, both connecting > to busy SMTP relay servers which probably drop some packets here and there. > > I can easily reproduce the problem, so how do we go from here? I'm using kernel 2.6.26 and the problem has gone. No stalled connections anymore. The problem was with the 2.6.25 kernel only. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 18:30 ` Dâniel Fraga @ 2008-07-31 4:47 ` Dâniel Fraga 2008-07-31 7:39 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-07-31 4:47 UTC (permalink / raw) To: netfilter-devel; +Cc: netdev On Tue, 15 Jul 2008 15:30:45 -0300 Dâniel Fraga <fragabr@gmail.com> wrote: > I'm using kernel 2.6.26 and the problem has gone. No stalled > connections anymore. The problem was with the 2.6.25 kernel only. Sorry, I was wrong. In 2.6.26 the problem remains. Every day my connection to my NNTP server stalls with 2.6.26 too. Before, I just ran "nmap -sS server" and the connection would come back, but now that doesn't work. -- Linux 2.6.26: Rotary Wombat http://u-br.net ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 4:47 ` Dâniel Fraga @ 2008-07-31 7:39 ` Ilpo Järvinen 2008-08-02 12:24 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-31 7:39 UTC (permalink / raw) To: Dâniel Fraga; +Cc: netdev, netfilter-devel [-- Attachment #1: Type: TEXT/PLAIN, Size: 822 bytes --] On Thu, 31 Jul 2008, Dâniel Fraga wrote: > On Tue, 15 Jul 2008 15:30:45 -0300 > Dâniel Fraga <fragabr@gmail.com> wrote: > > > I'm using kernel 2.6.26 and the problem has gone. No stalled > > connections anymore. The problem was with the 2.6.25 kernel only. > > Sorry, I was wrong. In 2.6.26 the problem remains. > > Every day my connection to my NNTP server stalls with 2.6.26 > too. Before, I just ran "nmap -sS server" and the connection would > come back, but now that doesn't work. Tcpdumping it would help some... :-) Can you try the suggested patch to see if it changes anything (though if I had a tcpdump showing the problem, I could probably tell right away whether the patch would help or not, and also come up with something else if necessary :-)): http://marc.info/?l=linux-netdev&m=121699478406378&w=2 -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 7:39 ` Ilpo Järvinen @ 2008-08-02 12:24 ` Dâniel Fraga 0 siblings, 0 replies; 107+ messages in thread From: Dâniel Fraga @ 2008-08-02 12:24 UTC (permalink / raw) To: Ilpo Järvinen; +Cc: netdev, netfilter-devel On Thu, 31 Jul 2008 10:39:56 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Tcpdumping it would help some... :-) Can you try the suggested patch to > see if it changes anything (though if I had a tcpdump showing the problem, > I could probably tell right away whether the patch would help or not, and > also come up with something else if necessary :-)): > > http://marc.info/?l=linux-netdev&m=121699478406378&w=2 Hi, I'm using the patch and I can confirm it solved my problem. No more stalled connections :). Is there any chance this patch can be merged into the kernel? Thank you. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 16:10 ` Thomas Jarosch 2008-07-15 18:30 ` Dâniel Fraga @ 2008-07-15 20:17 ` Ilpo Järvinen 2008-07-16 8:07 ` Thomas Jarosch 2008-07-16 9:03 ` Thomas Jarosch 1 sibling, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-15 20:17 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga [-- Attachment #1: Type: TEXT/PLAIN, Size: 1491 bytes --] On Tue, 15 Jul 2008, Thomas Jarosch wrote: > > Luckily I'm able to reproduce the problem locally using an ADSL line from > > the same provider, so I'll now bisect the kernel from 2.6.23.17 to 2.6.24. > > After bisecting for hours, I only had ten revisions left to test. > There was this commit that caught my eye: > > ------------------------------ > commit c96fd3d461fa495400df24be3b3b66f0e0b152f9 > Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > Date: Thu Sep 20 11:36:37 2007 -0700 > > [TCP]: Enable SACK enhanced FRTO (RFC4138) by default > ------------------------------ > > This change sets the value of "tcp_frto" to 2 by default. > If I reset it to zero, the connection works immediately. > @Dâniel Fraga: Does disabling tcp_frto work for you, too? > > Disabling tcp_sack makes no difference. To summarize the situation, > I had two different cases of stalling TCP connections, both connecting > to busy SMTP relay servers which probably drop some packets here and there. > > I can easily reproduce the problem, so how do we go from here? FRTO in 2.6.24.y is broken. I recently fixed a couple of things in FRTO; late 2.6.25.y or 2.6.26 should be used to have all the fixes. If you can reproduce with either one, please tcpdump it (I just returned after being away for a couple of weeks, so I'm slowly catching up on what has happened in the meantime). ...I guess somebody had dumped at least 2.6.24.y, but that's not interesting due to known (and fixed) bugs with FRTO. -- i.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 20:17 ` Ilpo Järvinen @ 2008-07-16 8:07 ` Thomas Jarosch 2008-07-16 9:03 ` Thomas Jarosch 1 sibling, 0 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-07-16 8:07 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga Terve Ilpo, On Tuesday, 15. July 2008 22:17:47 Ilpo Järvinen wrote: > > I can easily reproduce the problem, so how do we go from here? > > FRTO in 2.6.24.y is broken, I recently fixed a couple of things in FRTO, > late 2.6.25.y or 2.6.26 should be used to have all the fixes. If you can > reproduce with either one, please tcpdump it (I just returned, was a couple of > weeks away, so I'm slowly catching up what has happened in between here). > ...I guess somebody had dumped at least 2.6.24.y but that's not > interesting due to known (and fixed) bugs with FRTO. I tried 2.6.25.10 without luck. I have a git "master" tree from yesterday which also stalls for some seconds at around ~220kb and then recovers. The connection completely stalls at around ~1.3mb. I'll send you a tcpdump in private soon, as it's going to be rather big for the mailing list. Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-15 20:17 ` Ilpo Järvinen 2008-07-16 8:07 ` Thomas Jarosch @ 2008-07-16 9:03 ` Thomas Jarosch 2008-07-17 13:55 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-16 9:03 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga On Tuesday, 15. July 2008 22:17:47 Ilpo Järvinen wrote: > FRTO in 2.6.24.y is broken, I recently fixed a couple of things in FRTO, > late 2.6.25.y or 2.6.26 should be used to have all the fixes. If you can > reproduce with either one, please tcpdump it As the dumps are really big, I uploaded them to a temporary space. Included are two tcpdumps of stalling connections using git "master". The first one stalls around ~1.3mb, the second one around ~4mb. Get it from here: http://www.intra2net.com/de/download/tcpdump/tcp_frto_tcpdumps.tar.bz2 There is another box in front of my test system doing NAT which is running 2.6.24.7. I've tested with and without tcp_frto on that box to make sure it's not FRTO related. I've also included a tcpdump with FRTO disabled, so you can see the connection is actually working. Just by looking at the packet flow while tracing, the connection looks much smoother without FRTO and doesn't stall for seconds here and there. Cheers, Thomas -- Address (better: trap) for people I really don't want to get mail from: jessica.hope@cactusamerica.com ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-16 9:03 ` Thomas Jarosch @ 2008-07-17 13:55 ` Ilpo Järvinen 2008-07-17 15:15 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-17 13:55 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 5125 bytes --] On Wed, 16 Jul 2008, Thomas Jarosch wrote: > On Tuesday, 15. July 2008 22:17:47 Ilpo Järvinen wrote: > > FRTO in 2.6.24.y is broken, I recently fixed a couple of things in FRTO, > > late 2.6.25.y or 2.6.26 should be used to have all the fixes. If you can > > reproduce with either one, please tcpdump it > > As the dumps are really big, I uploaded them to a temporary space. > Included are two tcpdumps of stalling connections using git "master". > The first one stalls around ~1.3mb, the second one around ~4mb. > > Get it from here: > http://www.intra2net.com/de/download/tcpdump/tcp_frto_tcpdumps.tar.bz2 Thanks for the dumps, it's a pretty clear picture now... Also, I read this thread fully today; your note in the initial mail is correct and relevant: "The picture is similar to Sven's issue reported back in March: Some ACK packets are missing (as if the remote side never sent them)." > There is another box in front of my test system doing NAT > which is running 2.6.24.7. I've tested with and without tcp_frto > on that box to make sure it's not FRTO related. Did you accidentally add "not" here? :-) > I've also included a tcpdump with FRTO disabled, so you can see > the connection is actually working. Just by looking at the packet flow > while tracing the connection looks much smoother without FRTO > and doesn't stall for seconds here and there. Yes, but let me explain why it happens... "A TCP receiver SHOULD send an immediate duplicate ACK when an out-of-order segment arrives."
[RFC2581] FRTO is partially built on the assumption that the receiver does the right thing (tm), i.e., sends duplicate ACKs. But in this case the server for some reason has chosen to ignore this SHOULD in the standards, which stands for this: "3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." [RFC2119] It could be that the duplicate ACKs are missing due to a bug, a misconfiguration or a broken middlebox at the provider. This is somewhat similar to the case we worked around recently with the network printers that accept data only in order and just dupACK the rest. ...I actually predicted this dupACK-less receiver problem back then (not sure if I mentioned it in a mail though), but it seemed like a small-box problem rather than a problem for some big box like a mail server. It hardly seems reasonable to interpret "in particular circumstances" as never sending dupACKs (which have other benefits too). Because those duplicate ACKs never arrive for the new data segments FRTO is sending, FRTO never falls back to conventional recovery; instead the RTO expires again for a different segment and the FRTO algorithm is retried with the same results. So TCP is basically in an RTO loop, making slow progress. If there isn't an external timeout, the situation is eventually recovered when all data is ACKed by a big cumulative ACK, or earlier when a temporary dupACK lossage ends (like it should be at worst). It would be quite interesting to know more details about the mail server and why the duplicate ACKs are not generated or don't ever reach the sender, but I guess the details are out of reach? One option would be to disable reentry to FRTO when some progress was made... Please try with the patch below...
It has some non-desirable properties in microbenchmarks but adds robustness, it's not clear to me how often the reentry would benefit in real life scenarios but I'd assume that most RTOs that occur for a later segment are not spurious anyway even when the first was.

-- 
 i.

--
[PATCH] tcp FRTO: workaround dupACK-less receivers

FRTO assumes that dupACKs arrive in-order to fallback into conventional recovery. Some receivers, due to unknown reasons, care not to send duplicate ACKs at all, which seems quite unreasonable because RFC2581 is using SHOULD for ofo segment duplicate ACKs. ...A more likely cause might be some broken middlebox which blocks dupACKs.

If no duplicate ACKs arrive, TCP goes into RTO-loop due to FRTO, because only new data is getting sent after the retransmission of the head segment (and its partial ACK). The situation continues until a big cumulative ACK covers all outstanding data.

This impacts FRTO accuracy as we lose ability to detect more than one spurious segment per window with NewReno. Performance impact might not be visible unless one sets up an microbenchmark... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d6ea970..3f7cce9 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1714,6 +1714,10 @@ int tcp_use_frto(struct sock *sk)
 	if (tcp_is_sackfrto(tp))
 		return 1;
 
+	/* dupACK-less receiver workaround */
+	if (tp->frto_counter > 1)
+		return 0;
+
 	/* Avoid expensive walking of rexmit queue if possible */
 	if (tp->retrans_out > 1)
 		return 0;
-- 
1.5.2.2

^ permalink raw reply related [flat|nested] 107+ messages in thread
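[Editorial sketch, not part of the original mail.] To make the order of the checks in the patch easier to follow, here is a toy Python model of only the lines visible in the hunk. It is not the kernel function: `ConnState`, `use_frto` and the `workaround` switch are invented for illustration, any checks that precede or follow the hunk in the real `tcp_use_frto()` are left out, and the final `return True` is an assumption standing in for whatever the rest of the function decides.

```python
from dataclasses import dataclass


@dataclass
class ConnState:
    sack_frto: bool    # stands in for tcp_is_sackfrto(tp)
    frto_counter: int  # stands in for tp->frto_counter
    retrans_out: int   # stands in for tp->retrans_out


def use_frto(state, workaround=True):
    """Toy version of the decision shown in the diff hunk above."""
    if state.sack_frto:
        return True  # SACK-enhanced F-RTO returns before the new check
    if workaround and state.frto_counter > 1:
        return False  # the added lines: do not re-enter F-RTO
    if state.retrans_out > 1:
        return False  # existing shortcut from the hunk
    return True  # assumption: the remaining checks of the real function pass


# A non-SACK flow whose F-RTO attempt has already progressed (counter above 1).
stalled = ConnState(sack_frto=False, frto_counter=2, retrans_out=0)
print(use_frto(stalled, workaround=False))  # behaviour before the patch
print(use_frto(stalled, workaround=True))   # behaviour with the patch
```

The point of the sketch is the ordering: the added check sits after the SACK-enhanced branch, so in this model it only changes behaviour for flows that are not using SACK-enhanced F-RTO. That appears to be the relevant case here, since the mail server's SYN-ACK in the dumps posted earlier in the thread carries only an mss option and no sackOK.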
* Re: TCP connection stalls under 2.6.24.7 2008-07-17 13:55 ` Ilpo Järvinen @ 2008-07-17 15:15 ` Thomas Jarosch 2008-07-17 15:53 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-17 15:15 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller On Thursday, 17. July 2008 15:55:25 Ilpo Järvinen wrote: > It would quite interesting to know more details about the mail server and > why the duplicate ACKs are not generated or don't ever reach the sender > but I guess the details are out of reach? It will be quite difficult to get more details as it's the SMTP relay sever of Germany's biggest ISP. There's a comment about them in Patrick's blog from 2008-06-23 if you are curious ;-) We see the same issue with a MX server from "United Internet". Normally they are pretty accurate about standards (they run GMX), so I guess this must be a problem of a router in between. This is also supported by the fact that 935 of our boxes already updated to kernel 2.6.24.7, yet the problem occured only at three sites and I guess there are more people out there using that SMTP relay server. Could you somehow "probe" the servers to see if they normally send duplicated ACKs by faking/forcing a retransmission? Though I guess this would invole writing some TCP "test" code. > One option would be to disable reentry to FRTO when some progress was > made... Please try with the patch below... Thanks for the patch. It seemed to help a bit. Here are two more traces: http://www.intra2net.com/de/download/tcpdump/tcp_frto_with_patch.tar.bz2 The first connection somehow made it after 400 seconds, the second one stalled and timed out :-( Hope they dumps are useful to you. 
Cheers,
Thomas

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-17 15:15 ` Thomas Jarosch @ 2008-07-17 15:53 ` Ilpo Järvinen 2008-07-18 9:14 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-17 15:53 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 3969 bytes --] On Thu, 17 Jul 2008, Thomas Jarosch wrote: > On Thursday, 17. July 2008 15:55:25 Ilpo Järvinen wrote: > > It would quite interesting to know more details about the mail server and > > why the duplicate ACKs are not generated or don't ever reach the sender > > but I guess the details are out of reach? > > It will be quite difficult to get more details as it's the SMTP relay sever > of Germany's biggest ISP. There's a comment about them > in Patrick's blog from 2008-06-23 if you are curious ;-) ...I thought so, unless one has some connections they're not that willing, ever :-). > We see the same issue with a MX server from "United Internet". > Normally they are pretty accurate about standards (they run GMX), > so I guess this must be a problem of a router in between. I'd vote for middlebox, e.g., some kind of TCP proxy, split TCP brokeness or misconfigured firewall or such (or perhaps it's just because of some misguided one who have been thought that duplicate ACKs are a serious threat :-))... > Could you somehow "probe" the servers to see if they normally > send duplicated ACKs by faking/forcing a retransmission? > Though I guess this would invole writing some TCP "test" code. Yes, it wouldn't even be that hard to do with hping3. I might actually try to come up with something (but not now). > > One option would be to disable reentry to FRTO when some progress was > > made... Please try with the patch below... > > Thanks for the patch. It seemed to help a bit. 
Here are two more traces: > http://www.intra2net.com/de/download/tcpdump/tcp_frto_with_patch.tar.bz2 > > The first connection somehow made it after 400 seconds, > the second one stalled and timed out :-( > Hope they dumps are useful to you. Ah, I just forgot that the situation might persist... Try with this one instead... -- i. [PATCH] tcp FRTO: workaround dupACK-less receivers FRTO assumes that dupACKs arrive in-order to fallback into conventional recovery. Some receivers, due to unknown reasons, care not to send duplicate ACKs at all, which seems quite unreasonable because RFC2581 is using SHOULD for ofo segment duplicate ACKs. ...A more likely cause might be some broken middlebox which blocks dupACKs. If no duplicate ACKs arrive, TCP goes into RTO-loop due to FRTO, because only new data is getting sent after the retransmission of the head segment (and its partial ACK). The situation continues until a big cumulative ACK covers all outstanding data (or until somebody gives up). The new approach prevents reentry to FRTO when a previous FRTO recovery is underway. This alone was found inadequate solution because the situation may persist with some receivers even after the first fallback has occured. Thus cover anything in CA_Loss state too. This impacts FRTO accuracy as we lose ability to detect more than one spurious segment per window with NewReno. Performance impact in real world is hard to estimate because it's hard to know how often second RTO would be spurious in practice, however, the worst case behavior will still be as without FRTO so it just reduces the benefits of FRTO. This issue was reported by Thomas Jarosch and probably a number of other people (though there was other case which was a real bug with similar symptoms that was fixed in 2.6.25.7). 
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> --- net/ipv4/tcp_input.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d6ea970..764c084 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1714,6 +1714,10 @@ int tcp_use_frto(struct sock *sk) if (tcp_is_sackfrto(tp)) return 1; + /* dupACK-less receiver workaround */ + if (tp->frto_counter > 1 || icsk->icsk_ca_state == TCP_CA_Loss) + return 0; + /* Avoid expensive walking of rexmit queue if possible */ if (tp->retrans_out > 1) return 0; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-17 15:53 ` Ilpo Järvinen @ 2008-07-18 9:14 ` Thomas Jarosch 2008-07-18 13:55 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-18 9:14 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller Moi Ilpo, On Thursday, 17. July 2008 17:53:01 Ilpo Järvinen wrote: > > > One option would be to disable reentry to FRTO when some progress was > > > made... Please try with the patch below... > > Ah, I just forgot that the situation might persist... Try with this > one instead... Good news everyone: Two connections made it to the finish line. The bad part: One transfer took four minutes, the other sixteen minutes. A colleague commented it's still much faster than carrying the message by plane ;-) A session without FRTO takes around 84 seconds. I've added debug printks() to every return path in tcp_use_frto(), so you can see what's going on. 
They look like this:

Jul 18 10:20:40 intratest131 kernel: [ 957.318006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0
Jul 18 10:20:40 intratest131 kernel: [ 957.318011] tcp_use_frto: DEFAULT RETURN 1;
Jul 18 10:21:08 intratest131 kernel: [ 984.446006] tcp_use_frto: ENTER: frto_counter: 3, icsk->icsk_ca_state: 0
Jul 18 10:21:08 intratest131 kernel: [ 984.446011] tcp_use_frto: RETURN in "tp->frto_counter > 1 || icsk->icsk_ca_state == TCP_CA_Loss"
Jul 18 10:21:14 intratest131 kernel: [ 991.058006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0
Jul 18 10:21:14 intratest131 kernel: [ 991.058011] tcp_use_frto: DEFAULT RETURN 1;

Here are two new dumps and the corresponding debug traces:
http://www.intra2net.com/de/download/tcpdump/tcp_frto_second_patch.tar.bz2

Enjoy,
Thomas

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-18 9:14 ` Thomas Jarosch @ 2008-07-18 13:55 ` Ilpo Järvinen 2008-07-18 14:02 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-18 13:55 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 2681 bytes --] On Fri, 18 Jul 2008, Thomas Jarosch wrote: > On Thursday, 17. July 2008 17:53:01 Ilpo Järvinen wrote: > > > > One option would be to disable reentry to FRTO when some progress was > > > > made... Please try with the patch below... > > > > Ah, I just forgot that the situation might persist... Try with this > > one instead... > > Good news everyone: Two connections made it to the finish line. > > The bad part: One transfer took four minutes, the other sixteen minutes. > A colleague commented it's still much faster than carrying the message > by plane ;-) A session without FRTO takes around 84 seconds. ...I guess if you would limit ssthresh to some small value you might beat that value even without FRTO. > I've added debug printks() to every return path in tcp_use_frto(), > so you can see what's going on. 
They look like this: > > Jul 18 10:20:40 intratest131 kernel: [ 957.318006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0 > Jul 18 10:20:40 intratest131 kernel: [ 957.318011] tcp_use_frto: DEFAULT RETURN 1; > Jul 18 10:21:08 intratest131 kernel: [ 984.446006] tcp_use_frto: ENTER: frto_counter: 3, icsk->icsk_ca_state: 0 > Jul 18 10:21:08 intratest131 kernel: [ 984.446011] tcp_use_frto: RETURN in "tp->frto_counter > 1 || icsk->icsk_ca_state == TCP_CA_Loss" > Jul 18 10:21:14 intratest131 kernel: [ 991.058006] tcp_use_frto: ENTER: frto_counter: 0, icsk->icsk_ca_state: 0 > Jul 18 10:21:14 intratest131 kernel: [ 991.058011] tcp_use_frto: DEFAULT RETURN 1; > > Here are two new dumps and the corresponding debug traces: > http://www.intra2net.com/de/download/tcpdump/tcp_frto_second_patch.tar.bz2 It seems that with FRTO the retransmission timeout grows much higher which causes longer delays when things continue by RTO, this might be plainly due to the fact that some timeouts seem indeed spurious, and with FRTO we can take RTT measures out of such. I'll keep digging deeper... The receiver is definately doing something crazy as well, eg.: 6.1.131.56060: . ack 1995587 win 65535 152.31.131.25: . 1998387:1999787(1400) ack 562 win 7504 (DF) 152.31.131.25: . 1999787:2001187(1400) ack 562 win 7504 (DF) 152.31.131.25: . 2001187:2002587(1400) ack 562 win 7504 (DF) 6.1.131.56060: . ack 1995587 win 8192 (DF) 6.1.131.56060: . ack 1996987 win 8192 (DF) 6.1.131.56060: . ack 1996987 win 8192 (DF) 6.1.131.56060: . ack 1996987 win 8192 (DF) ...The receiver shrunk the window here (it's not the only example) :-), though on the bright side, those are duplicate ACKs... :-D Btw, on which kernel you ran these things (I hope it wasn't 2.6.24.7, which has FRTO related bugs anyway that the patches I've sent now won't fix)? -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-18 13:55 ` Ilpo Järvinen @ 2008-07-18 14:02 ` Thomas Jarosch 2008-07-19 7:35 ` Ilpo Järvinen 2008-07-25 10:00 ` Ilpo Järvinen 0 siblings, 2 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-07-18 14:02 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller

On Friday, 18. July 2008 15:55:22 Ilpo Järvinen wrote:
> Btw, on which kernel you ran these things (I hope it wasn't 2.6.24.7,
> which has FRTO related bugs anyway that the patches I've sent now won't
> fix)?

It's the git "master" tree from two days ago, so it should be 2.6.27-pre.
Like I wrote before, there's another box doing NAT in front of it running
2.6.24.7. FRTO is disabled on that box. Hope that helps a bit.

Thomas

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-18 14:02 ` Thomas Jarosch @ 2008-07-19 7:35 ` Ilpo Järvinen 2008-07-25 10:00 ` Ilpo Järvinen 1 sibling, 0 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-19 7:35 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 1434 bytes --] On Fri, 18 Jul 2008, Thomas Jarosch wrote: > On Friday, 18. July 2008 15:55:22 Ilpo Järvinen wrote: > > Btw, on which kernel you ran these things (I hope it wasn't 2.6.24.7, > > which has FRTO related bugs anyway that the patches I've sent now won't > > fix)? > > It's the git "master" tree from two days ago, so it should be 2.6.27-pre. > Like I wrote before, there's another box doing NAT in front of it running > 2.6.24.7. FRTO is disabled on that box. Hope that helps a bit. Hmm, those were spurious RTOs indeed or a sign of perverted TCP "proxy" (or whatever they call them), longest delay spike I've found so far is this: 11:27:28.454827 172.16.1.131.56060 > 80.152.31.131.25: . 3989187:3990587(1400) ... 11:28:00.188835 80.152.31.131.25 > 172.16.1.131.56060: . ack 3990587 win 65535 That's 32 seconds? :-D What should TCP do with that :-) ...disregard that measurement because some other TCP variant would not be able to use the same measurement due to ambiguity problem(?), I don't think so... It seems that non-FRTO TCP just misses those signs and acts _too_ aggressively ;-), which is well known to happen when spurious RTO occurs. ...Also, those duplicate ACKs I pointed out earlier are a sign of unnecessary retransmissions (they occur both with and without FRTO). I actually doubt you have any real losses there, I'll probably next calculate RTTs based on that assumption in the non-FRTO dump too... -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
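The "32 seconds" figure in the message above can be checked from the two tcpdump timestamps it quotes. The following is only a minimal sketch of that arithmetic; the helper names `ts_to_usec` and `ack_delay_usec` are made up for this illustration and do not come from tcpdump or the kernel. It assumes both timestamps fall on the same day and carry exactly six fractional digits.

```c
#include <stdio.h>

/* Hypothetical helper: convert a tcpdump-style "HH:MM:SS.uuuuuu"
 * timestamp into microseconds since midnight.
 * Returns -1 if the string does not have that shape. */
long long ts_to_usec(const char *ts)
{
    int h, m, s;
    long us;

    if (sscanf(ts, "%d:%d:%d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return ((h * 60LL + m) * 60LL + s) * 1000000LL + us;
}

/* Hypothetical helper: delay between sending a segment and seeing
 * the cumulative ACK that covers it, in microseconds.
 * Returns -1 if either timestamp cannot be parsed. */
long long ack_delay_usec(const char *sent, const char *acked)
{
    long long a = ts_to_usec(sent);
    long long b = ts_to_usec(acked);

    if (a < 0 || b < 0)
        return -1;
    return b - a;
}
```

Feeding it "11:27:28.454827" (segment 3989187:3990587 sent) and "11:28:00.188835" (ack 3990587 seen) gives 31734008 microseconds, i.e. about 31.7 seconds.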
* Re: TCP connection stalls under 2.6.24.7 2008-07-18 14:02 ` Thomas Jarosch 2008-07-19 7:35 ` Ilpo Järvinen @ 2008-07-25 10:00 ` Ilpo Järvinen 2008-07-25 13:00 ` Thomas Jarosch 1 sibling, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-25 10:00 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 6382 bytes --] On Fri, 18 Jul 2008, Thomas Jarosch wrote: > On Friday, 18. July 2008 15:55:22 Ilpo Järvinen wrote: > > Btw, on which kernel you ran these things (I hope it wasn't 2.6.24.7, > > which has FRTO related bugs anyway that the patches I've sent now won't > > fix)? > > It's the git "master" tree from two days ago, so it should be 2.6.27-pre. > Like I wrote before, there's another box doing NAT in front of it running > 2.6.24.7. FRTO is disabled on that box. Hope that helps a bit. Ok. I looked more into it, there indeed is a large number of spurious RTOs with extremely large round-trip times, though I suspect they occur due to some broken hw/cfg or whatever rather than due to a real wire+queueing delays, and that some external event is required to get things going again with it/them... but that's purely speculation since we don't know about the isp's stuff... 
:-) Here are some example time-seqno graphs, the second was includes the first one in the lower left corner: http://www.cs.helsinki.fi/u/ijjarvin/tcp/bigrto1.jpg http://www.cs.helsinki.fi/u/ijjarvin/tcp/bigrto2.jpg Larger boxes - data packets Smaller boxes - ACKs (& receiver's advertized window) ...both are connected with lines in time order for easier tracking RTOs occur when the data transfer line falls down, if there is more than one cumulative (advancing ACK) with FRTO sending pattern (ie., when there are two new datas following the retransmission) following the retransmission, it basically means that the original data segments made it through, and in the extreme cases it was sent much earlier!!! The longest round-trips are around 50 seconds in there. These increasing RTT measurements cause tp->rttvar to grow exponentially per each spurious RTO, which is very good to avoid spurious RTOs in future but obviously breaks down if future progress is also bound to actually triggering those RTOs ...I bet we could measure any desired value for RTT with those servers... except there's the application level timeout on the way... :-) Could you try if the patch below helps any... -- i. [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Hmm, it wasn't non-dup ACKing receiver, there were dupACKs when an unnecessary retransmission was made (though those ACKs revoke a part of the advertized window, which is strange enough in itself :-)). 2nd try: This is probably due to some broken middlebox but that's purely speculation since the details of the not named ISP's (you can find some hint in Patrick's blog though ;-)) equipment are not available to us. It seems that we will have to consciously attempt to violate packet conservation principle and do a spammy go-back-n in case there's a middlebox using split TCPish approach by waiting an arrival of TCP layer retransmission and then doing an in-order delivery (basically violates end-to-end semantics of a TCP connection). 
I.e., the proxy intentionally reorders segment by _any_ amount (well, there's some upper limit based on the advertized window I guess), it's ridiculously fragile approach... Such middleboxes basically mean two things: First, any measured RTT value when a loss occurred is entirely bogus, yet all indication of the existance of that loss is hidden intentionally, so the correct operation basically depends on ambiguity problem and the inability to measure RTTs during it. Secondly, a timely feedback from network is non-existing, ie., no fast recovery & friends... This goodbye for RFC2581 clearly signifies that such way of behavior is contradicting some very fundamental assumptions a standard TCP is allowed to make about the network, would the RFC2581 stuff work, also FRTO would work. ...Finally I see something which resembles something as pre-historic as TCP Tahoe (in the real world) :-). FRTO assumes reordering is relatively rare thing, but this middlebox has decided to _always_ reorder the key segments FRTO depends on... Thus FRTO makes "wrong" decision and declares the RTO spurious, which is not in fact wrong at all because the receiver probably received the segments in that order (or at least its TCP layer did) and clearly indicates it by the cumulative ACK pattern. A cumulative ACK for a not retransmitted range basically means that one of those segments just arrived, in this case it's after ridiculous RTT, even 50 seconds were measured in practice!! As a result, tp->rttvar flies to outer space when exponentially increasing RTTs get sampled. But this increase is much desired, in general, to avoid future RTOs would the real RTT really grow that fast. The workaround prevents reentry to FRTO when a previous FRTO recovery occurred within the last window (though multiple RTOs for a single segment are still allowed to go into FRTO each time). This workaround impacts FRTO accuracy as we lose ability to detect more than one spurious segment per window. 
We just consciously violate packet conservation principle by retransmitting unnecessarily to make rest of the high RTT readings ambiguous and that's it... :-) Though even go-back-N as fallback this won't guarantee anything if we're just unlucky because RTTs we measure can still grow if losses occur too frequently so that period in between is not enough to lower RTT estimation :-). In contrast, non-FRTO TCP can always happily ignore high RTT readings because of the ambiguity problem, ie., by violating packet conservation principle by design :-). I'm not that sure if this is worthwhile modification to the kernel due to the reasons that are explained above. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> --- net/ipv4/tcp_input.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1f5e604..2a7528c 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1721,6 +1721,13 @@ int tcp_use_frto(struct sock *sk) if (tcp_is_sackfrto(tp)) return 1; + /* in-order-only "TCP proxy" fragility workaround, spam by go-back-n, + * ie., consciously attempt to violate packet conservation principle + * to cover every loss in the outstanding window on a single RTT + */ + if (!tp->frto_counter && tp->frto_highmark) + return 0; + /* Avoid expensive walking of rexmit queue if possible */ if (tp->retrans_out > 1) return 0; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 107+ messages in thread
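To illustrate why the retransmission timeout climbs so quickly once such inflated RTT samples are accepted, here is a minimal sketch of the textbook estimator from RFC 6298 (SRTT/RTTVAR with gains 1/8 and 1/4, RTO = SRTT + 4*RTTVAR, 1 second floor). This is an assumption-laden model, not the kernel's code: the kernel uses scaled integer arithmetic and extra handling around its deviation estimate, and the names `struct rtt_est`, `rtt_sample` and `rtt_rto` are invented for this example.

```c
/* Hypothetical model of an RFC 6298 style RTT estimator, in seconds.
 * Not the kernel's fixed-point implementation. */
struct rtt_est {
    double srtt;      /* smoothed round-trip time */
    double rttvar;    /* round-trip time variation */
    int have_sample;  /* 0 until the first measurement arrives */
};

/* Feed one RTT measurement r (seconds) into the estimator. */
void rtt_sample(struct rtt_est *e, double r)
{
    double err;

    if (!e->have_sample) {
        /* First measurement: SRTT = R, RTTVAR = R/2 */
        e->srtt = r;
        e->rttvar = r / 2.0;
        e->have_sample = 1;
        return;
    }
    /* RTTVAR is updated with the old SRTT, then SRTT itself */
    err = e->srtt > r ? e->srtt - r : r - e->srtt;
    e->rttvar = 0.75 * e->rttvar + 0.25 * err;
    e->srtt = 0.875 * e->srtt + 0.125 * r;
}

/* Current retransmission timeout: SRTT + 4*RTTVAR, at least 1 s. */
double rtt_rto(const struct rtt_est *e)
{
    double rto = e->srtt + 4.0 * e->rttvar;

    return rto < 1.0 ? 1.0 : rto;
}
```

Starting from a 70 ms sample (roughly the handshake RTT seen in the first trace) and then feeding made-up samples of 2, 4, 8, 16 and 32 seconds, the RTO in this model rises with every sample and ends above 50 seconds, which matches the qualitative behaviour described above: each spurious RTO that yields a valid but huge RTT sample pushes the next timeout further out.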
* Re: TCP connection stalls under 2.6.24.7 2008-07-25 10:00 ` Ilpo Järvinen @ 2008-07-25 13:00 ` Thomas Jarosch 2008-07-25 14:06 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-25 13:00 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller

Ilpo,

On Friday, 25. July 2008 12:00:29 Ilpo Järvinen wrote:
> [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

The latest patch works quite good. I accidentally had your
previous patch applied, too, which gave even better results.
Though I don't know enough about the gory details of FRTO
if this effectively disables it...

Here are two fresh tcpdumps, one with the last patch only
and one which also includes your previous patch:
http://www.intra2net.com/de/download/tcpdump/tcp_frto_highmark_patch.tar.bz2

> ...
> This is probably due to some broken middlebox but that's purely
> speculation since the details of the not named ISP's (you can
> find some hint in Patrick's blog though ;-)) equipment are not
> available to us.

LOL, this reminds me about the post on kernel.org from 2007-03-01:

"Kudos ... to Hewlett Packard for building a machine that can take
the beating of an unnamed shipping company and keep on ticking".

Just think of "unnamed" while looking at the images of the broken server ;-)

Have a nice weekend,
Thomas

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-25 13:00 ` Thomas Jarosch @ 2008-07-25 14:06 ` Ilpo Järvinen 2008-07-25 15:34 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-07-25 14:06 UTC (permalink / raw) To: Thomas Jarosch Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 6992 bytes --] On Fri, 25 Jul 2008, Thomas Jarosch wrote: > On Friday, 25. July 2008 12:00:29 Ilpo Järvinen wrote: > > [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround > > The latest patch works quite good. I accidentally had your > previous patch applied, too, which gave even better results. > Though I don't know enough about the gory details of FRTO > if this effectivly disables it... Indeed, it seems that with the earlier patch (or at least part of it) one can achieve even better performance, though limiting sending window would probably be the most efficient way to communicate through the middlebox to avoid capacity waste that is going on whole the time due to it. This patch alone could occassionally leave TCP hanging until a new RTO occurs when it has already gotten the first ACK after RTO (but the second is not coming until we kick the middlebox again by retransmitting the missing segment). But other than that, it worked as expected and solved many of the situations... I guess the patch below would be enough in itself to create the desired effect (though "desired" is hardly a negative enough word to describe a workaround of this kind). Currently the workaround is only for SACKless TCP, though I guess there could be some "engineers" around who could without a doubt design a system which allows negotiating SACK, yet, doing all delivery in-order... :-) I think SACKless is enough though this same problem could occur with SACK too but that's not as likely as without SACK. 
Funny, the violation of packet conservation principle leads to another queue overflow (as often expected) in more than half of the cases and therefore another RTO is needed... :-)

There are new things in the logs too (I didn't study all details of the earlier ones so I might have missed them in there), probably signs about link-layer retransmissions... and that "notch" in advertized window is hilarious... :-)

Some statistics; unnecessary retransmissions (%, n), packets, filename:

  %        n    packets  filename
  0.0000     0    3026   stalling2
  0.0000     0     698   stalling1
  2.2693   137    6037   smtp_slooow
  3.4316   221    6440   smtp_sixteen_minutes
  4.3833   284    6479   smtp_worked_but_stalling_here_and_there
  4.8030    50    1041   smtp_stalled
  5.2868   340    6431   smtp_highmark_and_TCP_CA_Loss
  6.0382   392    6492   smtp_highmark_only
  6.8752   435    6327   working_no_frto

Ie., in the worst case 6.8% of your link's capacity was wasted during the transfer due to inefficiency caused by that middlebox, not counting the under-utilization that occurs both because of a small window or a wait for RTOs, not bad result at all... :-D

Try the patch below (alone) which should be close to the behavior of the both patches put together.

-- 
 i.

[PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

Hmm, it wasn't non-dup ACKing receiver, there were dupACKs when an unnecessary retransmission was made (though those ACKs revoke a part of the advertized window, which is strange enough in itself :-)).

2nd try:

This is probably due to some broken middlebox but that's purely speculation since the details of the not named ISP's (you can find some hint in Patrick's blog though ;-)) equipment are not available to us.

It seems that we will have to consciously attempt to violate packet conservation principle and do a spammy go-back-n in case there's a middlebox using split TCPish approach by waiting an arrival of TCP layer retransmission and then doing an in-order delivery (basically violates end-to-end semantics of a TCP connection).
I.e., the proxy intentionally reorders segment by _any_ amount (well, there's some upper limit based on the advertized window I guess), it's ridiculously fragile approach... Such middleboxes basically mean two things: First, any measured RTT value when a loss occurred is entirely bogus, yet all indication of the existance of that loss is hidden intentionally, so the correct operation basically depends on ambiguity problem and the inability to measure RTTs during it. Secondly, a timely feedback from network is non-existing, ie., no fast recovery & friends... This goodbye for RFC2581 clearly signifies that such way of behavior is contradicting some very fundamental assumptions a standard TCP is allowed to make about the network, would the RFC2581 stuff work, also FRTO would work. ...Finally I see something which resembles something as pre-historic as TCP Tahoe (in the real world) :-). FRTO assumes reordering is relatively rare thing, but this middlebox has decided to _always_ reorder the key segments FRTO depends on... Thus FRTO makes "wrong" decision and declares the RTO spurious, which is not in fact wrong at all because the receiver probably received the segments in that order (or at least its TCP layer did) and clearly indicates it by the cumulative ACK pattern. A cumulative ACK for a not retransmitted range basically means that one of those segments just arrived, in this case it's after ridiculous RTT, even 50 seconds were measured in practice!! As a result, tp->rttvar flies to outer space when exponentially increasing RTTs get sampled. But this increase is much desired, in general, to avoid future RTOs would the real RTT really grow that fast. The workaround prevents reentry to FRTO when a previous FRTO recovery occurred within the last window (though multiple RTOs for a single segment are still allowed to go into FRTO each time). This workaround impacts FRTO accuracy as we lose ability to detect more than one spurious segment per window. 
We just consciously violate packet conservation principle by retransmitting unnecessarily to make rest of the high RTT readings ambiguous and that's it... :-) Though even go-back-N as fallback this won't guarantee anything if we're just unlucky because RTTs we measure can still grow if losses occur too frequently so that period in between is not enough to lower RTT estimation :-). In contrast, non-FRTO TCP can always happily ignore high RTT readings because of the ambiguity problem, ie., by violating packet conservation principle by design :-). I'm not that sure if this is worthwhile modification to the kernel due to the reasons that are explained above. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> --- net/ipv4/tcp_input.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 75efd24..314bd55 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1721,6 +1721,13 @@ int tcp_use_frto(struct sock *sk) if (tcp_is_sackfrto(tp)) return 1; + /* in-order-only "TCP proxy" fragility workaround, spam by go-back-n, + * ie., consciously attempt to violate packet conservation principle + * to cover every loss in the outstanding window on a single RTT + */ + if (tp->frto_counter != 1 && tp->frto_highmark) + return 0; + /* Avoid expensive walking of rexmit queue if possible */ if (tp->retrans_out > 1) return 0; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 107+ messages in thread
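For readers comparing the four workaround conditions posted in this thread, the sketch below puts them side by side as plain predicates. It is only an illustration under stated assumptions: `struct frto_state`, `enum ca_state` and the `blocks_v*` functions are hypothetical stand-ins that model nothing but the added `if` in each patch, not the rest of tcp_use_frto() (the SACK check before it and the retransmit-queue checks after it are left out). The enum order is chosen so that the open state is 0, matching the `icsk_ca_state: 0` values in the debug output earlier in the thread.

```c
/* Hypothetical mirror of the congestion states; only OPEN and LOSS
 * matter for this sketch. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical stand-in for the fields the patches look at. */
struct frto_state {
    unsigned int frto_counter;   /* FRTO step, 0 when idle */
    unsigned int frto_highmark;  /* nonzero once a FRTO recovery was started */
    enum ca_state ca_state;
};

/* Each function returns 1 when that patch would refuse FRTO
 * (the added "return 0" in tcp_use_frto), else 0. */

/* 1st patch: no reentry while a FRTO recovery is past its first step */
int blocks_v1(const struct frto_state *s)
{
    return s->frto_counter > 1;
}

/* 2nd patch: additionally refuse anything in the Loss state */
int blocks_v2(const struct frto_state *s)
{
    return s->frto_counter > 1 || s->ca_state == CA_LOSS;
}

/* 3rd patch: refuse when idle but a highmark is still set */
int blocks_v3(const struct frto_state *s)
{
    return !s->frto_counter && s->frto_highmark;
}

/* 4th patch: as the 3rd, but also refuse for counter values above 1 */
int blocks_v4(const struct frto_state *s)
{
    return s->frto_counter != 1 && s->frto_highmark;
}
```

In this model the only difference between the last two variants shows up when `frto_counter` is 2 or more while a highmark is set: the third condition lets FRTO run, the fourth refuses it. With `frto_counter` equal to 1, neither refuses, which corresponds to the note in the patch description that repeated RTOs for a single segment may still go into FRTO each time.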
* Re: TCP connection stalls under 2.6.24.7 2008-07-25 14:06 ` Ilpo Järvinen @ 2008-07-25 15:34 ` Thomas Jarosch 2008-07-31 7:39 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-25 15:34 UTC (permalink / raw) To: Ilpo Järvinen Cc: Jozsef Kadlecsik, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Dâniel Fraga, David Miller Ilpo, On Friday, 25. July 2008 16:06:04 Ilpo Järvinen wrote: > I guess the patch below would be enough in itself to create the desired > effect (though "desired" is hardly a negative enough word to describe a > workaround of this kind). Yeah, the result feels good enough for me. Here's the latest tcpdump before I run out of good filenames for the dumps: http://www.intra2net.com/de/download/tcpdump/tcp_frto_combined_patch.tar.bz2 > Ie., in the worst case 6.8% of your link's capacity was wasted during the > transfer due to inefficiency cause by that middlebox, not counting the > under-utilization that occurs both because of a small window or a wait for > RTOs, not bad result at all... :-D IIRC our outbound box does traffic shaping, so some percents are to be accounted to packets being dropped to slow down the connection a bit if they come (in) too fast. Anyway, we now just have to flip a coin if this gets into the kernel or not :-) I really hope this could save someone from doing the same debug session all over again... Thanks for the hard work you put into debugging this. Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-25 15:34 ` Thomas Jarosch @ 2008-07-31 7:39 ` Thomas Jarosch 2008-07-31 12:44 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Thomas Jarosch @ 2008-07-31 7:39 UTC (permalink / raw) To: Dâniel Fraga Cc: Ilpo Järvinen, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller Hi Dâniel, On Thursday, 31. July 2008 06:47:38 you wrote: > On Tue, 15 Jul 2008 15:30:45 -0300 > > Dâniel Fraga <fragabr@gmail.com> wrote: > > I'm using kernel 2.6.26 and the problem has gone. No stalled > > connections anymore. The problem was with 2.6.25 kernel only. > > Sorry, I was wrong. In 2.6.26 the problem remains. > > Everyday my connection stalls to my NNTP server using 2.6.26 > too. Before I just used a "nmap -sS server" and the connection would > come back, but now it doesn't work. Ok. Please try the latest patch Ilpo CC:ed you. Here's a link to the post: http://marc.info/?l=linux-netdev&m=121699478406378&w=2 Where is the NNTP server located? At your provider? Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 7:39 ` Thomas Jarosch @ 2008-07-31 12:44 ` Dâniel Fraga 2008-07-31 13:47 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-07-31 12:44 UTC (permalink / raw) To: Thomas Jarosch Cc: Ilpo Järvinen, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thu, 31 Jul 2008 09:39:40 +0200 Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > Ok. Please try the latest patch Ilpo CC:ed you. Here's a link to the post: > http://marc.info/?l=linux-netdev&m=121699478406378&w=2 Before I try could this issue be related to some of these kernel parameters? echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts echo 0 > /proc/sys/net/ipv4/conf/all/accept_source_route echo 0 > /proc/sys/net/ipv4/conf/all/accept_redirects echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses I ask it because I decided to comment these lines (on my NATted desktop and on the server) and until now I don't have the problem anymore. But I'll keep testing all day and if the problem comes back I'll try the patch ok? > Where is the NNTP server located? At your provider? It's my nntp server: nntp://news.abusar.org You can post test messages on grupo "u-br.teste". But there's an issue. My connection was stalled mainly when I ran some application with sudo (for example fetchnews etc). Then I'd do an nmap -sS and the connection would come back alive. Sometimes it would be necessary a nmap on my desktop (local machine) and sometimes on the server (news.abusar.org). -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 12:44 ` Dâniel Fraga @ 2008-07-31 13:47 ` Thomas Jarosch 2008-07-31 14:11 ` Dâniel Fraga 2008-08-06 18:53 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-07-31 13:47 UTC (permalink / raw) To: Dâniel Fraga Cc: Ilpo Järvinen, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thursday, 31. July 2008 14:44:36 Dâniel Fraga wrote: > > Ok. Please try the latest patch Ilpo CC:ed you. Here's a link to the > > post: http://marc.info/?l=linux-netdev&m=121699478406378&w=2 > > Before I try could this issue be related to some of these > kernel parameters? If your problem is really FRTO related (that what the patch is for), you could try to disable FRTO temporarily: echo 0 > /proc/sys/net/ipv4/tcp_frto > > Where is the NNTP server located? At your provider? > > It's my nntp server: > > nntp://news.abusar.org > > You can post test messages on grupo "u-br.teste". Nice, so Ilpo can "test" (=bombard) it with big messages ;-) Thomas -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 13:47 ` Thomas Jarosch @ 2008-07-31 14:11 ` Dâniel Fraga 2008-08-06 18:53 ` Dâniel Fraga 1 sibling, 0 replies; 107+ messages in thread From: Dâniel Fraga @ 2008-07-31 14:11 UTC (permalink / raw) To: Thomas Jarosch Cc: Ilpo Järvinen, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thu, 31 Jul 2008 15:47:55 +0200 Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > echo 0 > /proc/sys/net/ipv4/tcp_frto Ok, i'm testing here. If I have any conclusions I'll return. > Nice, so Ilpo can "test" (=bombard) it with big messages ;-) ehehe no problem. As long as you post on u-br.teste, you can do whatever tests you want ;) But I'm not completely sure my problem is tcp_frto related, since sometimes it just happened when I "sudo" some program... I'll keep investigating it. Thanks. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-07-31 13:47 ` Thomas Jarosch 2008-07-31 14:11 ` Dâniel Fraga @ 2008-08-06 18:53 ` Dâniel Fraga 2008-08-07 6:54 ` Ilpo Järvinen 2008-08-07 11:33 ` [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Ilpo Järvinen 1 sibling, 2 replies; 107+ messages in thread From: Dâniel Fraga @ 2008-08-06 18:53 UTC (permalink / raw) To: Thomas Jarosch Cc: Ilpo Järvinen, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thu, 31 Jul 2008 15:47:55 +0200 Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > If your problem is really FRTO related (that what the patch is for), > you could try to disable FRTO temporarily: Hi, the patch helped, but what's the conclusion? Is the problem "solved"? Will this patch be merged in the next kernel? This thread seems to be forgotten. Thank you. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-06 18:53 ` Dâniel Fraga @ 2008-08-07 6:54 ` Ilpo Järvinen 2008-08-07 11:50 ` Denys Fedoryshchenko 2008-08-07 11:33 ` [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-07 6:54 UTC (permalink / raw) To: Dâniel Fraga Cc: Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 712 bytes --] On Wed, 6 Aug 2008, Dâniel Fraga wrote: > On Thu, 31 Jul 2008 15:47:55 +0200 > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > If your problem is really FRTO related (that what the patch is for), > > you could try to disable FRTO temporarily: > > Hi, the patch helped, but what's the conclusion? Is the problem > "solved"? Will this patch be merged in the next kernel? This thread > seems to be forgotten. I was yesterday preparing the patch description by adding some more thoughts to it (as if there weren't enough already) but didn't yet send it with new cover (to sort of notify davem). I give no guarantees about the _next_ kernel but some 2.6.26.y and 2.6.27 is more likely bet. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-07 6:54 ` Ilpo Järvinen @ 2008-08-07 11:50 ` Denys Fedoryshchenko 2008-08-07 12:11 ` Thomas Jarosch 2008-08-07 12:14 ` Ilpo Järvinen 0 siblings, 2 replies; 107+ messages in thread From: Denys Fedoryshchenko @ 2008-08-07 11:50 UTC (permalink / raw) To: Ilpo Järvinen Cc: Dâniel Fraga, Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thursday 07 August 2008, Ilpo Järvinen wrote: > On Wed, 6 Aug 2008, Dâniel Fraga wrote: > > On Thu, 31 Jul 2008 15:47:55 +0200 > > > > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > If your problem is really FRTO related (that what the patch is for), > > > you could try to disable FRTO temporarily: > > > > Hi, the patch helped, but what's the conclusion? Is the problem > > "solved"? Will this patch be merged in the next kernel? This thread > > seems to be forgotten. > > I was yesterday preparing the patch description by adding some more > thoughts to it (as if there weren't enough already) but didn't yet send it > with new cover (to sort of notify davem). > > I give no guarantees about the _next_ kernel but some 2.6.26.y and 2.6.27 > is more likely bet. By the way, I also had a problem with FRTO on local connections, and it was trivial to reproduce. But because it involves a proprietary userspace application (though I have the sources) and a specific way of using it, I didn't report it to the mailing list. Once the patch is ready, please add me to Cc and I will test it too. For me, disabling FRTO solves the problem. With FRTO enabled, connections "stall" when large chunks of data are transferred over loopback. The way it all works is complicated, but I can try to explain it if required.
^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-07 11:50 ` Denys Fedoryshchenko @ 2008-08-07 12:11 ` Thomas Jarosch 2008-08-07 12:14 ` Ilpo Järvinen 1 sibling, 0 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-08-07 12:11 UTC (permalink / raw) To: Denys Fedoryshchenko Cc: Ilpo Järvinen, Dâniel Fraga, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thursday, 7. August 2008 13:50:42 Denys Fedoryshchenko wrote: > By the way, i had also problem with frto with local connections, and it was > trivial to reproduce. But because of proprioetary(but i have sources) > userspace application and specific way of using it - i didn't report to > maillist. But after patch is ready, add me please in cc, i will test it > with me too. > > For me disabling frto helps to solve problem. With frto i have > connections "stalling", if there is trasferred large chunks of data over > loopback. It is complicated way how it works all - but i can try to explain > how everything works, if required. What kernel version are you using? IMHO this could only happen on the loopback interface if you are a) using an "old" kernel version like 2.6.24/2.6.25 or b) there might be a bug hidden somewhere else. Ilpo has sent the "final" patch to linux-netdev some minutes ago, give it a try. Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-07 11:50 ` Denys Fedoryshchenko 2008-08-07 12:11 ` Thomas Jarosch @ 2008-08-07 12:14 ` Ilpo Järvinen 2008-08-07 12:23 ` Denys Fedoryshchenko 1 sibling, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-07 12:14 UTC (permalink / raw) To: Denys Fedoryshchenko Cc: Dâniel Fraga, Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 2530 bytes --] On Thu, 7 Aug 2008, Denys Fedoryshchenko wrote: > On Thursday 07 August 2008, Ilpo Järvinen wrote: > > On Wed, 6 Aug 2008, Dâniel Fraga wrote: > > > On Thu, 31 Jul 2008 15:47:55 +0200 > > > > > > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > > If your problem is really FRTO related (that what the patch is for), > > > > you could try to disable FRTO temporarily: > > > > > > Hi, the patch helped, but what's the conclusion? Is the problem > > > "solved"? Will this patch be merged in the next kernel? This thread > > > seems to be forgotten. > > > > I was yesterday preparing the patch description by adding some more > > thoughts to it (as if there weren't enough already) but didn't yet send it > > with new cover (to sort of notify davem). > > > > I give no guarantees about the _next_ kernel but some 2.6.26.y and 2.6.27 > > is more likely bet. > > By the way, i had also problem with frto with local connections, and it was > trivial to reproduce. But because of proprioetary(but i have sources) > userspace application and specific way of using it - i didn't report to > maillist. 
I could still have looked into it :-). I can mostly decide anything related to TCP congestion control based solely on a tcpdump, and I can even read tcpdump -n -r logfile output if you want to fully hide any payloads (as long as the lines are not split into a mess in an email :-)), though plotting them is then not as easy for me (I could hack my tool someday to handle that as well). But if a pre-2.6.25.7/2.6.26 kernel is involved, it is obsolete and requires an upgrade or the relevant fixes from 2.6.25.7. > But after patch is ready, add me please in cc, i will test it with > me too. I already sent it, though vger was in some sort of distress, so it might take some time to arrive... > For me disabling frto helps to solve problem. With frto i have > connections "stalling", if there is trasferred large chunks of data over > loopback. It is complicated way how it works all - but i can try to explain > how everything works, if required. Could you just tcpdump over (at least) one stall? ...That would be useful even if you find that the patch I sent works, because it's always possible that something has been overlooked in the FRTO spec or so, and I would like to understand the problem rather than just use a workaround which was intended to fix a (possibly) different problem... If there's something I cannot figure out from the dump, I'll then consult you about the userspace details. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-07 12:14 ` Ilpo Järvinen @ 2008-08-07 12:23 ` Denys Fedoryshchenko 2008-08-08 9:56 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Denys Fedoryshchenko @ 2008-08-07 12:23 UTC (permalink / raw) To: Ilpo Järvinen Cc: Dâniel Fraga, Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Thursday 07 August 2008, Ilpo Järvinen wrote: > > I could have still looked to it :-), I can mostly decide anything TCP > congestion control related based on solely a tcpdump, and I can even read > tcpdump -n -r logfile output if you want to fully hide any payloads (as > long as the lines are not split to a mess in an email :-)) though then > plotting them is not as easy for me (I could hack my tool someday though > to handle that as well). I will try my best to reproduce it and report (sure on latest stable kernel). On pre and git a bit more difficult but i will try also. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-07 12:23 ` Denys Fedoryshchenko @ 2008-08-08 9:56 ` Ilpo Järvinen 2008-08-08 10:32 ` Denys Fedoryshchenko 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-08 9:56 UTC (permalink / raw) To: Denys Fedoryshchenko Cc: Dâniel Fraga, Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller [-- Attachment #1: Type: TEXT/PLAIN, Size: 769 bytes --] On Thu, 7 Aug 2008, Denys Fedoryshchenko wrote: > On Thursday 07 August 2008, Ilpo Järvinen wrote: > > > > I could have still looked to it :-), I can mostly decide anything TCP > > congestion control related based on solely a tcpdump, and I can even read > > tcpdump -n -r logfile output if you want to fully hide any payloads (as > > long as the lines are not split to a mess in an email :-)) though then > > plotting them is not as easy for me (I could hack my tool someday though > > to handle that as well). > > I will try my best to reproduce it and report (sure on latest stable kernel). > On pre and git a bit more difficult but i will try also. I thought it was "trivial to reproduce"... :-) Perhaps it was then just related to pre-2.6.25.7 kernels? -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: TCP connection stalls under 2.6.24.7 2008-08-08 9:56 ` Ilpo Järvinen @ 2008-08-08 10:32 ` Denys Fedoryshchenko 0 siblings, 0 replies; 107+ messages in thread From: Denys Fedoryshchenko @ 2008-08-08 10:32 UTC (permalink / raw) To: Ilpo Järvinen Cc: Dâniel Fraga, Thomas Jarosch, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik, David Miller On Friday 08 August 2008, Ilpo Järvinen wrote: > I thought it was "trivial to reproduce"... :-) Perhaps it was then just > related to pre-2.6.25.7 kernels? It is trivial to reproduce, but these are production systems handling 600-700 req/s, so I cannot take much risk with them, and they run a semi-embedded distro from USB flash. I am just now trying to use them; if I see failures, I will buy a PC and build something on my desk. ^ permalink raw reply [flat|nested] 107+ messages in thread
* [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-06 18:53 ` Dâniel Fraga 2008-08-07 6:54 ` Ilpo Järvinen @ 2008-08-07 11:33 ` Ilpo Järvinen 2008-08-08 4:42 ` Bill Fink 1 sibling, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-07 11:33 UTC (permalink / raw) To: Dâniel Fraga, Thomas Jarosch, David Miller Cc: Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik [-- Attachment #1: Type: TEXT/PLAIN, Size: 7137 bytes --] On Wed, 6 Aug 2008, Dâniel Fraga wrote: > On Thu, 31 Jul 2008 15:47:55 +0200 > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > If your problem is really FRTO related (that what the patch is for), > > you could try to disable FRTO temporarily: > > Hi, the patch helped, but what's the conclusion? Is the problem > "solved"? Will this patch be merged in the next kernel? This thread > seems to be forgotten. ...Dave, I think we should probably put this FRTO work-around to net-2.6 and -stable to remain somewhat robust (it's currently worked around only for newreno anyway). ...But I leave the final decision up to you. -- i. [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Hmm, it wasn't non-dup ACKing receiver, there were dupACKs when an unnecessary retransmission was made (though those ACKs revoke a part of the advertized window, which is strange enough in itself :-)). 2nd try: This is probably due to some broken middlebox but that's purely speculation since the details of the not named ISP's (you can find some hint in Patrick's blog though ;-)) equipment are not available to us. It seems that we will have to consciously attempt to violate packet conservation principle and do a spammy go-back-n in case there's a middlebox using split TCPish approach by waiting an arrival of TCP layer retransmission and then doing an in-order delivery (basically violates end-to-end semantics of a TCP connection). 
I.e., the proxy intentionally reorders segments by _any_ amount (well, there's some upper limit based on the advertized window I guess), it's a ridiculously fragile approach... Such middleboxes basically mean two things: First, any measured RTT value when a loss occurred is entirely bogus, yet all indication of the existence of that loss is hidden intentionally, so the correct operation basically depends on the ambiguity problem and the inability to measure RTTs during it. Secondly, timely feedback from the network is non-existent, ie., no fast recovery & friends... This goodbye to RFC2581 clearly signifies that such behavior contradicts some very fundamental assumptions a standard TCP is allowed to make about the network; if the RFC2581 stuff worked, FRTO would work too. ...Finally I see something which resembles something as pre-historic as TCP Tahoe (I mean in the real world) :-). FRTO assumes reordering is a relatively rare thing, but this middlebox has decided to _always_ reorder the key segments FRTO depends on... Thus FRTO makes the "wrong" decision and declares the RTO spurious, which is not in fact wrong at all because the receiver probably received the segments in that order (or at least its TCP layer did) and clearly indicates it by the cumulative ACK pattern. A cumulative ACK for a not retransmitted range basically means that one of those segments just arrived when an ACK got sent, in this case after a ridiculous RTT, even 50 seconds were measured in practice!! As a result, tp->rttvar flies to outer space when exponentially increasing RTTs get sampled. But this increase is much desired, in general, to avoid future RTOs should the real RTT really grow that fast. It just leads to a disaster here because the RTT measurements are sender driven. The workaround prevents reentry to FRTO when a previous FRTO recovery occurred within the last window (though multiple RTOs for a single segment are still allowed to go into FRTO each time).
This workaround impacts FRTO accuracy as we lose ability to detect more than one spurious segment per window. We just consciously violate packet conservation principle by retransmitting unnecessarily to make rest of the high RTT readings ambiguous and that's it... :-) Though even go-back-N as fallback this won't guarantee anything if we're just unlucky because RTTs we measure can still grow if losses occur too frequently so that period in between is not enough to lower RTT estimation :-). In contrast, non-FRTO TCP can always happily ignore high RTT readings because of the ambiguity problem, ie., by violating packet conservation principle by design :-). I currently implemented the workaround for newreno only though SACK TCP could be subject to similar middlebox but lets hope that there won't be that many of middleboxes that allow negotiating SACK through them while forcing SACK blocks to extinction. I find this workaround quite controversial, it seems that without FRTO (at all), amusing 6.8% of the transmitted segments were unnecessarily retransmitted, which do cause buffer overflow that often leads to another RTO (in ~50% of cases), which is sort of expected when packet conservation principle gets violated like here. With FRTO, even if its final decision (ie., RTO=spurious) here is probably "flawed" because of the carefully selected reordering, _all_ unnecessary retransmissions are avoided (those duplicate ACKs that indicated old segment arrivals vanished) and with the default response the congestion window gets shrunk anyway so it's not more aggressive than what non-FRTO TCP would be. Sadly enough the RTT times will grow making FRTO approach unbearable without some changes. Still, that kind of middleboxes do no good for any TCP flow and should be fixed. A better workaround would have to consider two things to keep performance on a semi-acceptable level: prevent exponential RTT back-off while avoiding over-aggressive cwnd calculation. 
The latter seems easy to deal with because either the RTO is genuine spurious RTO within the original window or there's this crazy middlebox which only received the retransmission while the original got lost, both events fall to the same RTT where cwnd was already reduced and therefore it is possible to show that there's no further need for congestion window reduction. But the RTT back-off prevention would be more controversial because as said before, it is a desirable property in case of a genuine spurious RTO. However, it might be possible to argue that this situation where two spurious RTOs hit the same window won't occur that often in practice (for different segments, we already adjusted the RTO value anyway on the first of them). ...I leave that into future consideration. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Tested-by: Dâniel Fraga <fragabr@gmail.com> --- net/ipv4/tcp_input.c | 7 +++++++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 67ccce2..e137578 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1721,6 +1721,13 @@ int tcp_use_frto(struct sock *sk) if (tcp_is_sackfrto(tp)) return 1; + /* in-order-only "TCP proxy" fragility workaround, spam by go-back-n, + * ie., consciously attempt to violate packet conservation principle + * to cover every loss in the outstanding window on a single RTT + */ + if (tp->frto_counter != 1 && tp->frto_highmark) + return 0; + /* Avoid expensive walking of rexmit queue if possible */ if (tp->retrans_out > 1) return 0; -- 1.5.2.2 ^ permalink raw reply related [flat|nested] 107+ messages in thread
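The RTT inflation described in the patch description above ("tp->rttvar flies to outer space") can be illustrated with a small sketch. This is the textbook estimator from RFC 6298 in floating point, not the kernel's fixed-point code; the struct and function names (rtt_est, rtt_est_update, rtt_est_rto) are made up for illustration, and the 1 second lower bound is the RFC's value, which is an assumption here since Linux uses a smaller minimum.

```c
#include <assert.h>

/* Hypothetical, simplified RTT estimator state (seconds, floating point). */
struct rtt_est {
	double srtt;     /* smoothed RTT */
	double rttvar;   /* RTT variation */
	int has_sample;  /* 0 until the first measurement arrives */
};

/* Feed one RTT measurement r (seconds) into the estimator, RFC 6298 style. */
static void rtt_est_update(struct rtt_est *e, double r)
{
	double err;

	assert(r >= 0.0);
	if (!e->has_sample) {
		/* First sample: SRTT = R, RTTVAR = R/2 */
		e->srtt = r;
		e->rttvar = r / 2.0;
		e->has_sample = 1;
		return;
	}
	/* |SRTT - R'| computed by hand to avoid depending on libm */
	err = e->srtt > r ? e->srtt - r : r - e->srtt;
	/* RTTVAR is updated with the old SRTT, then SRTT itself (beta=1/4, alpha=1/8) */
	e->rttvar = 0.75 * e->rttvar + 0.25 * err;
	e->srtt = 0.875 * e->srtt + 0.125 * r;
}

/* RTO = SRTT + 4 * RTTVAR, clamped to an assumed 1 second minimum. */
static double rtt_est_rto(const struct rtt_est *e)
{
	double rto = e->srtt + 4.0 * e->rttvar;

	return rto < 1.0 ? 1.0 : rto;
}
```

Feeding this a normal 0.1 s sample followed by a single 50 s sample (the magnitude reported in the thread) moves the computed RTO from the 1 s floor to about 56 s, which shows why a few unambiguous but bogus samples are enough to make the next timeout very slow to fire.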
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-07 11:33 ` [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Ilpo Järvinen @ 2008-08-08 4:42 ` Bill Fink 2008-08-08 10:32 ` Ilpo Järvinen 2008-08-11 21:41 ` David Miller 0 siblings, 2 replies; 107+ messages in thread From: Bill Fink @ 2008-08-08 4:42 UTC (permalink / raw) To: Ilpo Järvinen Cc: Dâniel Fraga , Thomas Jarosch, David Miller, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik On Thu, 7 Aug 2008, Ilpo Järvinen wrote: > On Wed, 6 Aug 2008, Dâniel Fraga wrote: > > > On Thu, 31 Jul 2008 15:47:55 +0200 > > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > > > If your problem is really FRTO related (that what the patch is for), > > > you could try to disable FRTO temporarily: > > > > Hi, the patch helped, but what's the conclusion? Is the problem > > "solved"? Will this patch be merged in the next kernel? This thread > > seems to be forgotten. > > ...Dave, I think we should probably put this FRTO work-around to net-2.6 > and -stable to remain somewhat robust (it's currently worked around only > for newreno anyway). ...But I leave the final decision up to you. Since you suspect the problem is being caused by a broken middlebox, would it perhaps be a better approach to add a per-route option to allow disabling of FRTO for the given destination. This would be similar to Stephen Hemminger's fix for broken middleboxes that don't handle window scaling properly. It seems this would be better than modifying FRTO behavior for everyone else that is being compliant. A question then arises is if the bogus scenario has a TCP signature that could be used to print a warning message for the unsuspecting user so they could then take necessary corrective action. 
-Bill ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-08 4:42 ` Bill Fink @ 2008-08-08 10:32 ` Ilpo Järvinen 2008-08-11 21:44 ` David Miller 2008-08-11 21:41 ` David Miller 1 sibling, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-08 10:32 UTC (permalink / raw) To: Bill Fink Cc: Dâniel Fraga, Thomas Jarosch, David Miller, Netdev, Patrick McHardy, Sven Riedel, Netfilter Developer Mailing List, Jozsef Kadlecsik [-- Attachment #1: Type: TEXT/PLAIN, Size: 3706 bytes --] On Fri, 8 Aug 2008, Bill Fink wrote: > On Thu, 7 Aug 2008, Ilpo Järvinen wrote: > > > On Wed, 6 Aug 2008, Dâniel Fraga wrote: > > > > > On Thu, 31 Jul 2008 15:47:55 +0200 > > > Thomas Jarosch <thomas.jarosch@intra2net.com> wrote: > > > > > > > If your problem is really FRTO related (that what the patch is for), > > > > you could try to disable FRTO temporarily: > > > > > > Hi, the patch helped, but what's the conclusion? Is the problem > > > "solved"? Will this patch be merged in the next kernel? This thread > > > seems to be forgotten. > > > > ...Dave, I think we should probably put this FRTO work-around to net-2.6 > > and -stable to remain somewhat robust (it's currently worked around only > > for newreno anyway). ...But I leave the final decision up to you. > > Since you suspect the problem is being caused by a broken middlebox, It seems very likely. Any split-TCPish approach that tries to hide some of the losses that would happen on access links could cause this, though it's very stupid to put such a box there when there's a physical wire rather than wireless. And even with wireless the given configuration is not going to help but will make things worse :-), the box is plain stupid as is (I guess it's deployed because some marketing guy has convinced some clueless whoever that they need the box :-)). In theory it could be at the receiver below the TCP layer too, but it's quite unlikely that an smtp server would run on such a stack.
And even then it's a kind of middlebox, as TCP works end-to-end (not end host to end host) while the rest remains a black box to it, even if something is performed on the very same host below the TCP layer. An even less likely thing is that the TCP receiver would do this, and it doesn't explain the pacing of ACKs at all. ...It would be at least a kind of twisting of the specs if not out-of-spec somewhere. > would it perhaps be a better approach to add a per-route option to > allow disabling of FRTO for the given destination. This would be > similar to Stephen Hemminger's fix for broken middleboxes that don't > handle window scaling properly. It seems this would be better than > modifying FRTO behavior for everyone else that is being compliant. Sure, but that still requires some thought. I'll try after the weekend so that I can think about it a bit more, because there are plenty of states we can end up in after the detection of the first RTO as spurious. It might even be interesting to run CA_Recovery on RTOs when we detect this kind of middlebox, because the RTOs basically happen because there's a lack of duplicate ACKs, and then we could efficiently use partial ACKs to send just the lost segments rather than everything, which causes problems after the recovery has finished because we sent at too high a rate while recovering. Then fall back to CA_Loss if an RTO is triggered again in CA_Recovery. But I'm not sure if it's worth the effort. > A question then arises is if the bogus scenario has a TCP signature > that could be used to print a warning message for the unsuspecting > user so they could then take necessary corrective action. Probably yes, but I need to add some state. I could probably also make it switch per flow to a more robust approach on demand when enough evidence is gathered. ...I think I'll add a 1-bit history counter per flow so that it's possible to print the warning and switch when there's a third RTO in a single window (while the first two were found spurious).
IMHO it's unlikely enough that there will be three latency spikes (each longer than the previous) within a single window to make the decision, I wouldn't trust two enough because hand-overs can take time and have non-trivial effects. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
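The detection idea in the message above (warn and switch on the third RTO in a single window after the first two were found spurious) could be sketched roughly as follows. This is only an illustration under stated assumptions, not kernel code: the struct frto_sig and both functions are hypothetical, a small counter stands in for the proposed 1-bit history, and "a single window" is assumed to mean "until the sequence number that was snd_nxt at the first spurious RTO has been cumulatively ACKed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-flow state for spotting the in-order-only middlebox. */
struct frto_sig {
	uint32_t window_end;          /* snd_nxt at the first spurious RTO */
	unsigned spurious_in_window;  /* spurious RTOs seen in that window (max 2) */
	bool window_valid;            /* true while that window is outstanding */
};

/* Wrap-safe "a is before b" for 32-bit TCP sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Call when FRTO has just declared an RTO spurious. */
static void frto_sig_on_spurious(struct frto_sig *s, uint32_t snd_nxt)
{
	assert(s != NULL);
	if (!s->window_valid) {
		s->window_valid = true;
		s->window_end = snd_nxt;
		s->spurious_in_window = 0;
	}
	if (s->spurious_in_window < 2)
		s->spurious_in_window++;
}

/*
 * Call on every RTO with the current snd_una. Returns true when this RTO
 * falls in a window that already saw two spurious RTOs, i.e. the point
 * where a warning could be printed and a more robust recovery selected.
 */
static bool frto_sig_on_rto(struct frto_sig *s, uint32_t snd_una)
{
	assert(s != NULL);
	if (s->window_valid && !seq_before(snd_una, s->window_end)) {
		/* The tracked window has been fully ACKed: forget the history. */
		s->window_valid = false;
		s->spurious_in_window = 0;
	}
	return s->window_valid && s->spurious_in_window >= 2;
}
```

The sketch does not check that each latency spike is longer than the previous one, which the message mentions as part of the reasoning, so a real signature would likely need more state than shown here.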
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-08 10:32 ` Ilpo Järvinen @ 2008-08-11 21:44 ` David Miller 2008-08-12 7:46 ` Thomas Jarosch 0 siblings, 1 reply; 107+ messages in thread From: David Miller @ 2008-08-11 21:44 UTC (permalink / raw) To: ilpo.jarvinen Cc: billfink, fragabr, thomas.jarosch, netdev, kaber, sr, netfilter-devel, kadlec From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi> Date: Fri, 8 Aug 2008 13:32:14 +0300 (EEST) > On Fri, 8 Aug 2008, Bill Fink wrote: > > A question then arises is if the bogus scenario has a TCP signature > > that could be used to print a warning message for the unsuspecting > > user so they could then take necessary corrective action. > > Probably yes, but I need to add some state. I could probably also make it > to switch per flow to more robust approach on-demand when enough evidence > is gathered. ...I think I'll add 1-bit history counter per flow so that > it's possible to do print the warning and switch when there's third RTO in > a single window (while two first were found spurious). IMHO it's unlikely > enough that there will be three latency spikes (each longer than the > previous) within a single window to make the decision, I wouldn't trust > two enough because hand-overs can take time and have non-trivial effects. Trying to come up with a signature for this bogus stuff is both time consuming and having a risk of false positives. And I really question whether this thing is worth it. The sane thing to do in this case is to declare the box inoperative and that it needs to be fixed to avoid this behavior. Any reasonable congestion control scheme is going to run into problems trying to react to the packet patterns this thing creates. It is therefore not really limited to FRTO so it really shouldn't be treated like an FRTO problem even though it shows up more pronounced when FRTO is enabled. 
-- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-11 21:44 ` David Miller @ 2008-08-12 7:46 ` Thomas Jarosch 2008-08-12 8:18 ` David Miller 2008-08-22 21:18 ` Ilpo Järvinen 0 siblings, 2 replies; 107+ messages in thread From: Thomas Jarosch @ 2008-08-12 7:46 UTC (permalink / raw) To: David Miller Cc: ilpo.jarvinen, billfink, fragabr, netdev, kaber, sr, netfilter-devel, kadlec On Monday, 11. August 2008 23:44:21 David Miller wrote: > Trying to come up with a signature for this bogus stuff is both time > consuming and having a risk of false positives. And I really question > whether this thing is worth it. > > The sane thing to do in this case is to declare the box inoperative > and that it needs to be fixed to avoid this behavior. > > Any reasonable congestion control scheme is going to run into problems > trying to react to the packet patterns this thing creates. It is > therefore not really limited to FRTO so it really shouldn't be treated > like an FRTO problem even though it shows up more pronounced when > FRTO is enabled. David, I agree with you, though I'm not sure about the end user experience: The kernel is an early adopter of FRTO and will be bitten by bugs of other TCP implementations like we've experienced. I guess most affected users just see stalled or slow connections and won't have the time or knowledge to debug this. A proper warning could help them and the kernel developers to get this issue solved as quickly as possible. We called the hotline of the ISP several times and they always claimed sending big mails with Outlook/Windows works, so it must be linux's fault. That view of things is totally biased, but it's something I want to make sure people can't get away with easily :-) So, if it's possible to detect broken middleware boxes without spending too much time on it, that would really be nice. Thomas ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-12 7:46 ` Thomas Jarosch @ 2008-08-12 8:18 ` David Miller 2008-08-12 17:43 ` Dâniel Fraga 2008-08-13 8:00 ` Thomas Jarosch 2008-08-22 21:18 ` Ilpo Järvinen 1 sibling, 2 replies; 107+ messages in thread From: David Miller @ 2008-08-12 8:18 UTC (permalink / raw) To: thomas.jarosch Cc: ilpo.jarvinen, billfink, fragabr, netdev, kaber, sr, netfilter-devel, kadlec From: Thomas Jarosch <thomas.jarosch@intra2net.com> Date: Tue, 12 Aug 2008 09:46:17 +0200 > David, I agree with you, though I'm not sure about the end user experience: We had the same situation with ECN and window scaling, and my proposal is the same as how we handled those situations involving broken middleware boxes. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-12 8:18 ` David Miller @ 2008-08-12 17:43 ` Dâniel Fraga 2008-08-12 17:52 ` Ilpo Järvinen 2008-08-13 8:00 ` Thomas Jarosch 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-12 17:43 UTC (permalink / raw) To: David Miller Cc: thomas.jarosch, ilpo.jarvinen, billfink, netdev, kaber, sr, netfilter-devel, kadlec On Tue, 12 Aug 2008 01:18:22 -0700 (PDT) David Miller <davem@davemloft.net> wrote: > We had the same situation with ECN and window scaling, and my proposal > is the same as how we handled those situations involving broken > middleware boxes. Sorry for my ignorance (I'm just an user), but if the problem is not with Linux, why this problem appeared just on 2.6.25 kernel? I mean, with 2.6.24 and before I never had stalled connections. Just a coincidence? Or something has changed in 2.6.25 which caused this? Thank you! -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-12 17:43 ` Dâniel Fraga @ 2008-08-12 17:52 ` Ilpo Järvinen 2008-08-13 17:53 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-12 17:52 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 967 bytes --] On Tue, 12 Aug 2008, Dâniel Fraga wrote: > On Tue, 12 Aug 2008 01:18:22 -0700 (PDT) > David Miller <davem@davemloft.net> wrote: > > > We had the same situation with ECN and window scaling, and my proposal > > is the same as how we handled those situations involving broken > > middleware boxes. > > Sorry for my ignorance (I'm just an user), but if the problem > is not with Linux, why this problem appeared just on 2.6.25 kernel? I > mean, with 2.6.24 and before I never had stalled connections. Just a > coincidence? Or something has changed in 2.6.25 which caused this? I still propose that you tcpdump it, then I can tell you (I know enough about Thomas' case but yours has a large number of unknowns)... :-) I don't know why 2.6.24 didn't suffer from the problem as FRTO was enabled already in it. The command you need to create dump.log file: # tcpdump -w dump.log -i <iface> host <peerip> ...you need root rights (or sudo) to do the capturing. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-12 17:52 ` Ilpo Järvinen @ 2008-08-13 17:53 ` Dâniel Fraga 2008-08-13 18:34 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-13 17:53 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Tue, 12 Aug 2008 20:52:37 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > I still propose that you tcpdump it, then I can tell you (I know > enough about Thomas' case but yours has a large number of > unknowns)... :-) I don't know why 2.6.24 didn't suffer from the > problem as FRTO was enabled already in it. The command you need > to create dump.log file: > > # tcpdump -w dump.log -i <iface> host <peerip> > > ...you need root rights (or sudo) to do the capturing. Ok, but the problem is that the bug doesn't happen frequently... yesterday I waited for it to happen and nothing happened :). I'll keep watching it... if I can get the dump, I'll send it. Thanks. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-13 17:53 ` Dâniel Fraga @ 2008-08-13 18:34 ` Ilpo Järvinen 2008-08-15 4:34 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-13 18:34 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 987 bytes --] On Wed, 13 Aug 2008, Dâniel Fraga wrote: > On Tue, 12 Aug 2008 20:52:37 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > I still propose that you tcpdump it, then I can tell you (I know > > enough about Thomas' case but yours has a large number of > > unknowns)... :-) I don't know why 2.6.24 didn't suffer from the > > problem as FRTO was enabled already in it. The command you need > > to create dump.log file: > > > > # tcpdump -w dump.log -i <iface> host <peerip> > > > > ...you need root rights (or sudo) to do the capturing. > > Ok, but the problem is that the bug doesn't happen > frequently... yesterday I waited for it to happen and nothing > happened :). I'll keep watching it... if I can get the dump, I send it. Ok, thanks for your efforts... These are often hard to reproduce because some rather unlikely pattern needs to happen, and such things are often not easily controllable (if it is at all possible to influence the likelihoods). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-13 18:34 ` Ilpo Järvinen @ 2008-08-15 4:34 ` Dâniel Fraga 2008-08-15 7:06 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-15 4:34 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Wed, 13 Aug 2008 21:34:10 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Ok, thanks for your efforts... These are often hard to reproduce because > some not that likely pattern needs to happen and such things is often not > easily controllable (if it is at all possible to influence the > likelyhoods). Hi Ilpo, I don't know if the dumps are correct, but I did when the connection was stalled. The problem is, when I dumped "eth0", the connection suddenly come back alive again... so, I don't know if it's useless or not: For tun1 interface (which I use for my vpn): http://www.abusar.org/dump-tun1.log local loopback interface: http://www.abusar.org/dump-lo.log eth0, if it matters: http://www.abusar.org/dump-eth0.log Thanks. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-15 4:34 ` Dâniel Fraga @ 2008-08-15 7:06 ` Ilpo Järvinen 2008-08-15 21:35 ` Dâniel Fraga 2008-08-15 21:59 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-15 7:06 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1756 bytes --] On Fri, 15 Aug 2008, Dâniel Fraga wrote: > On Wed, 13 Aug 2008 21:34:10 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > Ok, thanks for your efforts... These are often hard to reproduce because > > some not that likely pattern needs to happen and such things is often not > > easily controllable (if it is at all possible to influence the > > likelyhoods). > > Hi Ilpo, I don't know if the dumps are correct, but I did when > the connection was stalled. It would be better to have tcpdump running at least a bit back (2-3 windows back is long enough for me), but obviously that might not be a possible option because it occurs so rarely. ...It should be possible to have tcpdump restarted once in a while to avoid one huge log if you'd just keep running tcpdump from the beginning. > The problem is, when I dumped "eth0", the connection suddenly come back > alive again... The situations (or at least some of those I debugged with other people) are such that they may indeed resolve themselves, though I'm also interested in why the slow part occurred. > so, I don't know if it's useless or not: What do you mean by "come back alive"...? ...In eth0 log I found this connection 189.38.18.122.995 > 192.168.0.2.35477, the ip matches with abusar's. 
But I'm not sure if the connection in the tunnel is the interesting one, since it's going to/from port 119 but the ip addresses (10.195.195.2 and 10.195.195.1) don't tell anything to me, I guess you know their meaning (ie., if 10.195.195.2 is the one with which the connection stalls)? ...You're probably right that this wasn't very useful log, the longest "stall" I find is only 1.111328 seconds long (and it might be due to some processing that is made by 10.195.195.2). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-15 7:06 ` Ilpo Järvinen @ 2008-08-15 21:35 ` Dâniel Fraga 2008-08-15 22:06 ` Ilpo Järvinen 2008-08-15 21:59 ` Dâniel Fraga 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-15 21:35 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Fri, 15 Aug 2008 10:06:39 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > I would be better to have tcpdump running at least a bit back (2-3 windows > back is long enough for me), but obviously that might not be possible > option because it occurs so rarely. ...It should be possible to have > tcpdump restarted once in a while to avoid a one huge log if you'd just > keep running tcpdump from beginning. Ok. > What do you mean by "come back alive"...? ...In eth0 log I found this I mean, it isn't stalled anymore. When it stalls, fetchnews stops and stay stalled forever. When it come back alive, it resumes (but it will only do that if I do something to restore the connection). > connection 189.38.18.122.995 > 192.168.0.2.35477, the ip matches with > abusar's. But I'm not sure if the connection in the tunnel is the > interesting one, since it's going to/from port 119 but the ip addresses > (10.195.195.2 and 10.195.195.1) don't tell anything to me, I guess you > know their meaning (ie., if 10.195.195.2 is the one with which the > connection stalls)? ...You're probably right that this wasn't very useful > log, the longest "stall" I find is only 1.111328 seconds long (and it > might be due to some processing that is made by 10.195.195.2). Ok: 10.195.195.1 is my local VPN IP (tun1) 10.195.195.2 is the remote VPN IP (on the server) 192.168.0.2 is my local IP (eth0) 189.38.18.122 is the server's IP Should I use tcpdump on the server too or is it sufficient to use on my client machine? Thank you very much again. 
-- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-15 21:35 ` Dâniel Fraga @ 2008-08-15 22:06 ` Ilpo Järvinen 2008-08-15 23:57 ` Dâniel Fraga 2008-08-16 2:15 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-15 22:06 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 2865 bytes --] On Fri, 15 Aug 2008, Dâniel Fraga wrote: > On Fri, 15 Aug 2008 10:06:39 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > I would be better to have tcpdump running at least a bit back (2-3 windows > > back is long enough for me), but obviously that might not be possible > > option because it occurs so rarely. ...It should be possible to have > > tcpdump restarted once in a while to avoid a one huge log if you'd just > > keep running tcpdump from beginning. > > Ok. > > > What do you mean by "come back alive"...? ...In eth0 log I found this > > I mean, it isn't stalled anymore. When it stalls, fetchnews > stops and stay stalled forever. When it come back alive, it resumes > (but it will only do that if I do something to restore the connection). Ok. I hope it will still reproduce with tcpdump running... Btw, doing cat /proc/net/tcp during the stall wouldn't be a bad idea (in addition to tcpdumping it). Also please let the tcpdumps run long enough if the stall persists, something like 15mins doesn't hurt because there are large timer values possibly involved. You might have mentioned it but I would like you to confirm which kernel version the server is running (at least 2.6.25.7 or 2.6.26 is new enough to have all bug fixes)? > > connection 189.38.18.122.995 > 192.168.0.2.35477, the ip matches with > > abusar's. 
But I'm not sure if the connection in the tunnel is the > > interesting one, since it's going to/from port 119 but the ip addresses > > (10.195.195.2 and 10.195.195.1) don't tell anything to me, I guess you > > know their meaning (ie., if 10.195.195.2 is the one with which the > > connection stalls)? ...You're probably right that this wasn't very useful > > log, the longest "stall" I find is only 1.111328 seconds long (and it > > might be due to some processing that is made by 10.195.195.2). > > Ok: > > 10.195.195.1 is my local VPN IP (tun1) > > 10.195.195.2 is the remote VPN IP (on the server) I sort of assumed so, thanks for the confirmation. > 192.168.0.2 is my local IP (eth0) > > 189.38.18.122 is the server's IP > > Should I use tcpdump on the server too or is it sufficient to > use on my client machine? It definitely wouldn't hurt (though I usually can figure out what happens at the other end) and I guess it's quite easy for you to arrange. In case there's some other use than your testing traffic with the server, it's probably polite to filter there aggressively enough to not get that much unrelated traffic (tcpdump ... host <ip> and host <clientip> and port <portnum>, or so, I guess the ip address pair should be the vpn endpoints since the nntp traffic seems to go through it and the portnum is 119, if unsure you can verify with sudo netstat -p which tcp connections are associated to fetchnews if that's not immediately obvious). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
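The capture filter suggested above (both VPN endpoints plus the NNTP port) can be sketched as a small helper that assembles the tcpdump command line. This is only an illustration, not something posted in the thread: the function name `build_tcpdump_argv` and the default output file name are hypothetical, while `-w`, `-i`, `-s` and the `host ... and host ... and port ...` filter form are ordinary tcpdump usage. The `-s 0` setting is an added assumption, chosen so that TCP options are not truncated in the capture.

```python
def build_tcpdump_argv(iface, server_ip, client_ip, port, outfile="dump.log"):
    """Return an argv list for a tcpdump capture limited to one address pair and port.

    Hypothetical helper for illustration; it only builds the command line,
    it does not start tcpdump.
    """
    # Filter primitives are joined with the "and" operator, as in the
    # example given in the message above.
    capture_filter = f"host {server_ip} and host {client_ip} and port {port}"
    return [
        "tcpdump",
        "-w", outfile,  # write raw packets to a file for later analysis
        "-i", iface,    # interface to capture on
        "-s", "0",      # full-size snapshots so TCP options are not cut off
        capture_filter,
    ]


if __name__ == "__main__":
    # Addresses and port are the VPN endpoints and NNTP port named in the thread.
    argv = build_tcpdump_argv("tun1", "10.195.195.2", "10.195.195.1", 119)
    print(" ".join(argv))
```

The resulting list can be passed to `subprocess.run` under sudo or root, or simply printed and pasted into a shell.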
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-15 22:06 ` Ilpo Järvinen @ 2008-08-15 23:57 ` Dâniel Fraga 2008-08-16 2:15 ` Dâniel Fraga 1 sibling, 0 replies; 107+ messages in thread From: Dâniel Fraga @ 2008-08-15 23:57 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Sat, 16 Aug 2008 01:06:55 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > You might have mentioned it but I would like you to confirm which kernel > version the server is running (at least 2.6.25.7 or 2.6.26 is new enough > to have all bug fixes)? Yes, 2.6.26. Thank you very much for your excellent explanation. You helped a lot. I'll try to do my "home work" and as soon as I have the data, I'll return. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-15 22:06 ` Ilpo Järvinen 2008-08-15 23:57 ` Dâniel Fraga @ 2008-08-16 2:15 ` Dâniel Fraga 2008-08-16 7:10 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-16 2:15 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Sat, 16 Aug 2008 01:06:55 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Ok. I hope it will still reproduce with tcpdump running... Btw, doing cat > /proc/net/tcp during the stall wouldn't be a bad idea (in addition to > tcpdumping it). Also please let the tcpdumps run long enough if the stall > persists, something like 15mins doesn't hurt because there are large > timer values possibly involved. Hi, I did the following: fraga@tux ~/src$ cat /proc/net/tcp sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode 0: 00000000:0DA5 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2912 1 ffff81007ea28000 299 0 0 2 -1 1: 00000000:23AA 00000000:0000 0A 00000000:00000000 00:00000000 00000000 501 0 6586 1 ffff8100614e1e00 299 0 0 2 -1 2: 00000000:1F4A 00000000:0000 0A 00000000:00000000 00:00000000 00000000 501 0 6164 1 ffff8100614e2400 299 0 0 2 -1 3: 00000000:0CEA 00000000:0000 0A 00000000:00000000 00:00000000 00000000 12347 0 3205 1 ffff81007ea29800 299 0 0 2 -1 4: 00000000:008B 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 3191 1 ffff81007e921800 299 0 0 2 -1 5: 00000000:1770 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 3454 1 ffff81007ea29e00 299 0 0 2 -1 6: 00000000:0015 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2860 1 ffff81007e920600 299 0 0 2 -1 7: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2507 1 ffff81007e920000 299 0 0 2 -1 8: 00000000:0077 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 2861 1 
ffff81007e920c00 299 0 0 2 -1 9: 00000000:0019 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 3029 1 ffff81007ea29200 299 0 0 2 -1 10: 00000000:01BD 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0 0 3190 1 ffff81007e921200 299 0 0 2 -1 11: 0200A8C0:D8B5 4009BCCD:1446 01 00000000:00000000 00:00000000 00000000 501 0 6187 1 ffff8100614e4200 36 3 20 4 3 12: 0200A8C0:C77C D4E133C9:C5CF 01 00000000:00000000 00:00000000 00000000 501 0 7593 1 ffff810049008c00 39 3 24 4 -1 13: 0200A8C0:DD6A 21250440:0747 01 00000000:00000000 00:00000000 00000000 501 0 17613 1 ffff81007e927200 71 3 0 4 -1 14: 0200A8C0:9D7D B9B5A342:13BA 01 00000000:00000000 00:00000000 00000000 501 0 6183 1 ffff8100614e2a00 49 3 0 4 2 15: 0200A8C0:807C 7A1226BD:03E3 01 0000007C:00000000 01:00000089 00000003 501 0 24919 2 ffff81007ea2b600 183 0 0 2 2 16: 0200A8C0:8C07 7DA355D1:1467 01 00000000:00000000 00:00000000 00000000 501 0 6186 1 ffff8100614e3c00 41 3 26 4 -1 17: 0100007F:0077 0100007F:BED7 01 00000000:00000000 00:00000000 00000000 0 0 21883 1 ffff8100614e1800 21 3 1 6 -1 18: 0200A8C0:852D 7A1226BD:0016 01 00000000:00000000 02:0000E189 00000000 501 0 5975 2 ffff81007ea2ce00 23 3 10 2 2 19: 0100007F:B4C6 0100007F:0DA5 01 00000000:00000000 00:00000000 00000000 501 0 3815 1 ffff81007ea2a400 20 3 18 5 -1 20: 0200A8C0:EBFC B81B2ECF:0747 01 00000000:00000000 00:00000000 00000000 501 0 11878 1 ffff81004900c800 74 3 0 4 3 21: 0100007F:0DA5 0100007F:B4C5 01 00000000:00000000 00:00000000 00000000 0 0 2930 1 ffff81007ea28c00 20 3 31 3 -1 22: 0100007F:BED7 0100007F:0077 01 00000000:00000000 00:00000000 00000000 501 0 21881 1 ffff8100614e3000 20 3 0 5 -1 23: 0200A8C0:839B 141B2ECF:0747 01 00000000:00000000 00:00000000 00000000 501 0 6331 1 ffff8100614e0c00 83 3 0 4 3 24: 0100007F:0DA5 0100007F:B4C6 01 00000000:00000000 00:00000000 00000000 0 0 3816 1 ffff81007ea2aa00 21 3 23 5 -1 25: 0100007F:B4C5 0100007F:0DA5 01 00000000:00000000 00:00000000 00000000 65534 0 2929 1 ffff81007ea28600 20 3 30 3 
-1 26: 0200A8C0:DBE6 63C155D1:0050 01 00000000:00000000 00:00000000 00000000 501 0 23988 1 ffff81007e922400 21 3 8 4 -1 27: 0200A8C0:8CDA 26E43641:0747 01 00000000:00000000 00:00000000 00000000 501 0 21604 1 ffff8100614e6000 60 3 0 4 -1 28: 0200A8C0:C2F5 8D1A2ECF:0747 01 00000000:00000000 00:00000000 00000000 501 0 6328 1 ffff81007e924200 110 3 0 4 3 29: 0200A8C0:8806 2D6C2ECF:0747 01 00000000:00000000 00:00000000 00000000 501 0 17566 1 ffff81004900ce00 46 3 30 4 -1 30: 0200A8C0:96FF 27F9F03F:0050 01 00000000:00000000 00:00000000 00000000 501 0 23953 1 ffff81007ea2c800 55 3 8 3 -1 31: 0200A8C0:8078 7A1226BD:03E3 04 0000007D:00000000 01:000009BD 00000006 0 0 0 2 ffff81007ea2c200 6718 3 6 2 2 And I can't use host and port at the same time with tcpdump (or I did something wrong) so I used (I need to update this, can't find the manpage... I tried to download a newer version but the link from the site seems broken): sudo tcpdump -w dump-mail.log -i eth0 port 995 to capture mail traffic that was stuck (usually it only happens with mail or nntp, interesting no?). All the other services (http, ssh, ftp) always work fine. http://www.abusar.org/dump-mail.log But the file is small. I don't know if it will help. If not, no problem, just tell me and I'll try harder next time. Thanks. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
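For readers trying to follow the /proc/net/tcp listing above, here is a minimal sketch of how the hex fields can be decoded. It is not part of the original thread; the helper names `decode_addr` and `decode_row` are hypothetical, and the decoding assumes the listing came from a little-endian machine (which the addresses above are consistent with, since 0200A8C0 then reads as 192.168.0.2). The two sample rows are copied from the listing.

```python
import socket
import struct

# TCP state codes as printed in the "st" column of /proc/net/tcp.
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def decode_addr(field):
    """Turn 'HEXIP:HEXPORT' into (dotted_quad, port).

    Assumption: the kernel printed the address on a little-endian host,
    so the hex digits appear byte-reversed relative to network order.
    """
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def decode_row(row):
    """Decode the leading columns of one /proc/net/tcp row into a dict."""
    f = row.split()
    local_ip, local_port = decode_addr(f[1])
    remote_ip, remote_port = decode_addr(f[2])
    tx_queue, rx_queue = (int(x, 16) for x in f[4].split(":"))
    timer_active, _timer_when = (int(x, 16) for x in f[5].split(":"))
    return {
        "local": f"{local_ip}:{local_port}",
        "remote": f"{remote_ip}:{remote_port}",
        "state": TCP_STATES.get(int(f[3], 16), "UNKNOWN"),
        "tx_queue": tx_queue,          # bytes waiting in the send queue
        "rx_queue": rx_queue,          # bytes waiting in the receive queue
        "timer_active": timer_active,  # 1 means a retransmit timer is pending
        "retransmits": int(f[6], 16),
    }


# Rows 15 and 31 from the listing above (both towards port 995 on the server).
ROW_15 = ("15: 0200A8C0:807C 7A1226BD:03E3 01 0000007C:00000000 01:00000089 "
          "00000003 501 0 24919 2 ffff81007ea2b600 183 0 0 2 2")
ROW_31 = ("31: 0200A8C0:8078 7A1226BD:03E3 04 0000007D:00000000 01:000009BD "
          "00000006 0 0 0 2 ffff81007ea2c200 6718 3 6 2 2")

if __name__ == "__main__":
    for row in (ROW_15, ROW_31):
        print(decode_row(row))
```

Decoded this way, both rows are connections from 192.168.0.2 to 189.38.18.122 port 995 with unsent data queued and a retransmit timer pending, which matches the mail connection that was reported as stuck.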
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-16 2:15 ` Dâniel Fraga @ 2008-08-16 7:10 ` Ilpo Järvinen 2008-08-16 19:18 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-16 7:10 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1630 bytes --] On Fri, 15 Aug 2008, Dâniel Fraga wrote: > On Sat, 16 Aug 2008 01:06:55 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > Ok. I hope it will still reproduce with tcpdump running... Btw, doing cat > > /proc/net/tcp during the stall wouldn't be a bad idea (in addition to > > tcpdumping it). Also please let the tcpdumps run long enough if the stall > > persists, something like 15mins doesn't hurt because there are large > > timer values possibly involved. > > Hi, I did the following: > > fraga@tux ~/src$ cat /proc/net/tcp ...snip... > And I can't use host and port at the same time with tcpdump (or > I did something wrong) so I used (I need to update this, can't find > the manpage... I tried to download a newer version but the link form > the site seems broken): > > sudo tcpdump -w dump-mail.log -i eth0 port 995 Hmm, sudo /usr/sbin/tcpdump -i eth1 host 192.168.1.1 and port 22 works for me, perhaps you forgot the and-operator in between them? Anyway, it seems quite fine. > to capture mail traffic that was stuck (usually it only happens > with mail or nntp, interesting no?). All the other services (http, > ssh, ftp always work fine). > > http://www.abusar.org/dump-mail.log > > But the file is small. I don't know if it will help. > > If not, no problem, just tell me and I'll try harder next time. > Thanks. This seems to be a valid sample, thanks. 
I'll return once I have figured something out (it might be that our state machine is somehow broken since there's traffic in both ways (rexmitted), yet neither party seems to be very willing to make progress). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-16 7:10 ` Ilpo Järvinen @ 2008-08-16 19:18 ` Ilpo Järvinen 2008-08-17 0:36 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-16 19:18 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1354 bytes --] On Sat, 16 Aug 2008, Ilpo Järvinen wrote: > On Fri, 15 Aug 2008, Dâniel Fraga wrote: > > > with mail or nntp, interesting no?). All the other services (http, > > ssh, ftp always work fine). > > > > But the file is small. I don't know if it will help. > > > > If not, no problem, just tell me and I'll try harder next time. > > This seems to be a valid sample, thanks. I'll return once I have figured > something out (it might be that our state machine is somehow broken since > there's traffic in both ways (rexmitted), yet neither party seems to be > very willing to make progress). Some thoughts, nothing very earth-shattering yet... It seems that the server (port 995) never leaves SYN-RECV state because it keeps retransmitting SYNACKs. Meanwhile the other end (the client) is doing its best to ACK them (correctly); it also tries to send some data, which never gets through, and retransmissions are attempted for it (those packets also contain an ACK seqno that should be enough to end the SYN-RECV but for some reason that never happens). Eventually the connection is RSTed. I'll look through 2.6.24..25 history once I have some time to see if there are some clues about the cause. I'm also having a problem figuring out why the frto patch you tested would solve this issue (unless there are two issues in the picture). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-16 19:18 ` Ilpo Järvinen @ 2008-08-17 0:36 ` Dâniel Fraga 2008-08-19 10:38 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-17 0:36 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec On Sat, 16 Aug 2008 22:18:50 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > I'll look through 2.6.24..25 history once I have some time to see if > there are some clues about the cause. I'm also having a problem in > figurin out why would the frto patch you tested solve this issue (unless > there are two issues in the picture). Ok, surely some patch between .24 and .25 caused this. Or it's some bug that only "appeared" in .25 :) In fact, the frto patch helped, but did not prevent the problem. I mean, it seems that with the frto patch, the problem doesn't happen frequently. And if I disable frto, the problem doesn't occur either. But, maybe, we could be talking about another bug, completely unrelated to frto... I don't know. I'm just guessing ;). Anyway, we are talking about stalled connections ;) What I know is: 1) what you wrote is right: 2.6.24 is fine, 2.6.25 and 2.6.26 are not 2) nmap -sS <server> seems to reset the connection (it's my workaround until now ;). Maybe the ping probe helps in some way? I don't know. I want to help you as much as I can. So, ask anything you need. Thanks! -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-17 0:36 ` Dâniel Fraga @ 2008-08-19 10:38 ` Ilpo Järvinen 2008-08-20 0:34 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-19 10:38 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 2334 bytes --] On Sat, 16 Aug 2008, Dâniel Fraga wrote: > On Sat, 16 Aug 2008 22:18:50 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > I'll look through 2.6.24..25 history once I have some time to see if > > there are some clues about the cause. I'm also having a problem in > > figurin out why would the frto patch you tested solve this issue (unless > > there are two issues in the picture). > > Ok, surely some patch between .24 and .25 caused this. Or it's > some bug that only "appeared" in .25 :) > > In fact, the frto patch helped, but not prevented the problem. > I mean, it seems that with the frto patch, the problem doesn't happen > frequently. And if I disable frto, the problem doesn't occur either. > > But, maybe, we could be talking about another bug, completely > unrelated to frto... I don't know. i'm just guessing ;). Anyway, we > talk about stalled connections ;) > > What I know is: > > 1) what you wrote is right: 2.6.24 is fine, 2.6.25 and 2.6.26 not > > 2) nmap -sS <server> seems to reset the connection (it's my workaround > until now ;). Maybe the ping probe help in some way? I don't know. Perhaps, though it's not at all clear how it could do that... > I want to help you as much as I can. So, ask anything you need. I went through TCP related and inet_connection_sock related things, nothing obvious I could notice in there... Do you have net namespaces enabled CONFIG_NET_NS in .config? Any netfilter (iptables) rules on server which could cause those packets to not reach TCP layer? 
MIBs might give some clue why those segments didn't get accepted. Most interesting ones are PAWSEstab, TCPAbortOnSyn and InErrs. One can use /bin/cut to read those from the one-line files if one wants to (however, I attached a script which transposes them to get them somewhat human-readable). Also having the /proc/net/tcp output from the server while stalling would be (have been) useful to reveal state info (but I should have remembered to ask you to run it on both of them :-)). Also, I wonder what that [|tcp] hides, e.g., "<nop,nop,timestamp 15980976 70381399,nop,nop,[|tcp]>" in tcpdump (and that was for an ACK which doesn't make too much sense to me there). It occurs because snaplen which was given for tcpdump is small enough to make TCP header partial. -- i. [-- Attachment #2: Type: APPLICATION/X-SH, Size: 793 bytes --] ^ permalink raw reply [flat|nested] 107+ messages in thread
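The attached shell script is not reproduced in this archive. As a rough stand-in, the transposition it is described as doing can be sketched as follows. This is an assumption about the approach, not the original script: the function name `transpose_mib` is hypothetical and the counter values in `SAMPLE` are made up for illustration. It relies only on the layout of /proc/net/snmp and /proc/net/netstat, where each group is a header line of names followed by a line of values with the same prefix.

```python
def transpose_mib(text):
    """Map 'Prefix.Name' -> int from /proc/net/snmp or /proc/net/netstat text.

    Hypothetical helper; lines are expected in pairs (names, then values),
    both starting with the same 'Prefix:' token.
    """
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched line pair: {prefix!r} vs {value_prefix!r}")
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


# Made-up sample in the same layout as the real files (values are not real data).
SAMPLE = """\
TcpExt: SyncookiesSent PAWSEstab TCPAbortOnSyn
TcpExt: 0 3 1
Tcp: ActiveOpens InErrs
Tcp: 120 7
"""

if __name__ == "__main__":
    mib = transpose_mib(SAMPLE)
    # The counters named in the message above.
    for key in ("TcpExt.PAWSEstab", "TcpExt.TCPAbortOnSyn", "Tcp.InErrs"):
        print(key, mib.get(key))
```

On a live system the same function could be fed the contents of /proc/net/netstat (for the TcpExt counters) and /proc/net/snmp (for Tcp.InErrs), read before and after a stall so that the differences can be compared.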
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-19 10:38 ` Ilpo Järvinen
@ 2008-08-20 0:34 ` Dâniel Fraga
  2008-08-20 7:57 ` Ilpo Järvinen
  2008-08-20 12:37 ` Ilpo Järvinen
  0 siblings, 2 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-08-20 0:34 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec

On Tue, 19 Aug 2008 13:38:35 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Perhaps, though it's not at all clear how it could do that...

I was thinking here of some specific configuration I use.
For example, I always used the wonder shaper htb script:

http://lartc.org/howto/lartc.cookbook.ultimate-tc.html#AEN2241

Could HTB mess with frto or cause this problem? Would it be
useful to disable completely HTB and use just the default scheduler?

> Do you have net namespaces enabled CONFIG_NET_NS in .config?

I couldn't find this specific option:

fraga@tux /usr/src/linux$ grep CONFIG_NET_NS .config
fraga@tux /usr/src/linux$

But I have those:

fraga@tux /usr/src/linux$ grep CONFIG_NET_ .config
# CONFIG_NET_KEY is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
CONFIG_NET_SCHED=y
# CONFIG_NET_SCH_CBQ is not set
CONFIG_NET_SCH_HTB=m
# CONFIG_NET_SCH_HFSC is not set
CONFIG_NET_SCH_PRIO=m
CONFIG_NET_SCH_RED=m
CONFIG_NET_SCH_SFQ=m
# CONFIG_NET_SCH_TEQL is not set
CONFIG_NET_SCH_TBF=m
CONFIG_NET_SCH_GRED=m
CONFIG_NET_SCH_DSMARK=m
# CONFIG_NET_SCH_NETEM is not set
CONFIG_NET_SCH_INGRESS=m
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
CONFIG_NET_CLS_RSVP=m
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
# CONFIG_NET_ACT_GACT is not set
# CONFIG_NET_ACT_MIRRED is not set
# CONFIG_NET_ACT_IPT is not set
# CONFIG_NET_ACT_NAT is not set
# CONFIG_NET_ACT_PEDIT is not set
# CONFIG_NET_ACT_SIMP is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_9P is not set
# CONFIG_NET_SB1000 is not set
CONFIG_NET_ETHERNET=y
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_TULIP is not set
CONFIG_NET_PCI=y
# CONFIG_NET_POCKET is not set
# CONFIG_NET_FC is not set
# CONFIG_NET_POLL_CONTROLLER is not set

And that:

fraga@tux /usr/src/linux$ grep NAMESPACE .config
CONFIG_NAMESPACES=y

but this one, I think, isn't related to what you asked me.

> Any netfilter (iptables) rules on server which could cause those packets
> to not reach TCP layer?

Here are the complete rules:

# Generated by iptables-save v1.3.8 on Tue Aug 19 21:28:12 2008
*filter
:INPUT DROP [627:34387]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [58771289:83128359870]
:DROP_INPUT - [0:0]
:FLDR - [0:0]
:LDR - [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -j DROP_INPUT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m multiport --dports 80,21,25,53,119,443,873,993,995
-A INPUT -s 192.168.102.1 -p tcp -m tcp --dport 3493 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A INPUT -p udp -m udp --dport 53 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 113 -j REJECT --reject-with tcp-reset
-A INPUT -p udp -m udp --dport 1194:1196 -j ACCEPT
-A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
-A INPUT -j LDR
-A FORWARD -j FLDR
-A DROP_INPUT -s 216.201.112.111 -m comment --comment "deborahsafe Spam" -j DROP
-A DROP_INPUT -s 200.49.247.241 -p tcp -m tcp --dport 22 -j DROP
-A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
-A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
-A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
-A FLDR -j LOG --log-prefix "DROP [FORWARD]: " --log-level 6 --log-ip-options
-A FLDR -j DROP
-A LDR -j LOG --log-prefix "DROP [INPUT]: " --log-level 6 --log-ip-options
-A LDR -j DROP
COMMIT
# Completed on Tue Aug 19 21:28:13 2008

As you can see, it's a pretty simple set of rules, nothing exotic here.

> MIBs might give some clue why those segments didn't get accepted. Most
> interesting ones are PAWSEstab, TCPAbortOnSyn and InErrs. One can use
> /bin/cut to read those from the one-line files if one wants to (however,
> I attached a script which transposes them to get them somewhat
> human-readable). Also having the /proc/net/tcp output from the server
> while stalling would be (have been) useful to reveal state info (but I
> should have remembered to ask you to run it on both of them :-)).

Ok ;) No problem, when I get the problem, I'll provide you the
requested information.

> Also, I wonder what that [|tcp] hides, e.g., "<nop,nop,timestamp
> 15980976 70381399,nop,nop,[|tcp]>" in tcpdump (and that was for an ACK
> which doesn't make too much sense to me there). It occurs because
> snaplen which was given for tcpdump is small enough to make TCP header
> partial.

Hmmm, I don't know. This is complex to me, but I'll apply your script.

Thank you!

--
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-20 0:34 ` Dâniel Fraga
@ 2008-08-20 7:57 ` Ilpo Järvinen
  2008-08-20 12:37 ` Ilpo Järvinen
  1 sibling, 0 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-20 7:57 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5537 bytes --]

On Tue, 19 Aug 2008, Dâniel Fraga wrote:

> On Tue, 19 Aug 2008 13:38:35 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Perhaps, though it's not at all clear how it could do that...
>
> I was thinking here of some specific configuration I use.
> For example, I always used the wonder shaper htb script:
>
> http://lartc.org/howto/lartc.cookbook.ultimate-tc.html#AEN2241
>
> Could HTB mess with frto or cause this problem? Would it be
> useful to disable completely HTB and use just the default scheduler?
>
> > Do you have net namespaces enabled CONFIG_NET_NS in .config?
>
> I couldn't find this specific option:
>
> fraga@tux /usr/src/linux$ grep CONFIG_NET_NS .config
> fraga@tux /usr/src/linux$
>
> But I have those:
>
> fraga@tux /usr/src/linux$ grep CONFIG_NET_ .config
> # CONFIG_NET_KEY is not set
> # CONFIG_NET_IPIP is not set
> # CONFIG_NET_IPGRE is not set
> CONFIG_NET_SCHED=y
> # CONFIG_NET_SCH_CBQ is not set
> CONFIG_NET_SCH_HTB=m
> # CONFIG_NET_SCH_HFSC is not set
> CONFIG_NET_SCH_PRIO=m
> CONFIG_NET_SCH_RED=m
> CONFIG_NET_SCH_SFQ=m
> # CONFIG_NET_SCH_TEQL is not set
> CONFIG_NET_SCH_TBF=m
> CONFIG_NET_SCH_GRED=m
> CONFIG_NET_SCH_DSMARK=m
> # CONFIG_NET_SCH_NETEM is not set
> CONFIG_NET_SCH_INGRESS=m
> CONFIG_NET_CLS=y
> # CONFIG_NET_CLS_BASIC is not set
> CONFIG_NET_CLS_TCINDEX=m
> CONFIG_NET_CLS_ROUTE4=m
> CONFIG_NET_CLS_ROUTE=y
> CONFIG_NET_CLS_FW=m
> CONFIG_NET_CLS_U32=m
> CONFIG_NET_CLS_RSVP=m
> # CONFIG_NET_CLS_RSVP6 is not set
> # CONFIG_NET_CLS_FLOW is not set
> # CONFIG_NET_EMATCH is not set
> CONFIG_NET_CLS_ACT=y
> CONFIG_NET_ACT_POLICE=y
> # CONFIG_NET_ACT_GACT is not set
> # CONFIG_NET_ACT_MIRRED is not set
> # CONFIG_NET_ACT_IPT is not set
> # CONFIG_NET_ACT_NAT is not set
> # CONFIG_NET_ACT_PEDIT is not set
> # CONFIG_NET_ACT_SIMP is not set
> # CONFIG_NET_CLS_IND is not set
> CONFIG_NET_SCH_FIFO=y
> # CONFIG_NET_PKTGEN is not set
> # CONFIG_NET_9P is not set
> # CONFIG_NET_SB1000 is not set
> CONFIG_NET_ETHERNET=y
> # CONFIG_NET_VENDOR_3COM is not set
> # CONFIG_NET_TULIP is not set
> CONFIG_NET_PCI=y
> # CONFIG_NET_POCKET is not set
> # CONFIG_NET_FC is not set
> # CONFIG_NET_POLL_CONTROLLER is not set
>
> And that:
>
> fraga@tux /usr/src/linux$ grep NAMESPACE .config
> CONFIG_NAMESPACES=y
>
> but this one, I think, isn't related to what you asked me.
>
> > Any netfilter (iptables) rules on server which could cause those packets
> > to not reach TCP layer?
>
> Here are the complete rules:
>
> # Generated by iptables-save v1.3.8 on Tue Aug 19 21:28:12 2008
> *filter
> :INPUT DROP [627:34387]
> :FORWARD DROP [0:0]
> :OUTPUT ACCEPT [58771289:83128359870]
> :DROP_INPUT - [0:0]
> :FLDR - [0:0]
> :LDR - [0:0]
> -A INPUT -i lo -j ACCEPT
> -A INPUT -j DROP_INPUT
> -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
> -A INPUT -p tcp -m multiport --dports 80,21,25,53,119,443,873,993,995
> -A INPUT -s 192.168.102.1 -p tcp -m tcp --dport 3493 -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
> -A INPUT -p udp -m udp --dport 53 -j ACCEPT
> -A INPUT -p tcp -m tcp --dport 113 -j REJECT --reject-with tcp-reset
> -A INPUT -p udp -m udp --dport 1194:1196 -j ACCEPT
> -A INPUT -p icmp -m icmp --icmp-type 8 -j ACCEPT
> -A INPUT -j LDR
> -A FORWARD -j FLDR
> -A DROP_INPUT -s 216.201.112.111 -m comment --comment "deborahsafe Spam" -j DROP
> -A DROP_INPUT -s 200.49.247.241 -p tcp -m tcp --dport 22 -j DROP
> -A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
> -A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
> -A DROP_INPUT -s 189.70.204.3 -p tcp -m tcp --dport 21 -j DROP
> -A FLDR -j LOG --log-prefix "DROP [FORWARD]: " --log-level 6 --log-ip-options
> -A FLDR -j DROP
> -A LDR -j LOG --log-prefix "DROP [INPUT]: " --log-level 6 --log-ip-options
> -A LDR -j DROP
> COMMIT
> # Completed on Tue Aug 19 21:28:13 2008
>
> As you can see, it's a pretty simple set of rules, nothing exotic here.
>
> > MIBs might give some clue why those segments didn't get accepted. Most
> > interesting ones are PAWSEstab, TCPAbortOnSyn and InErrs. One can use
> > /bin/cut to read those from the one-line files if one wants to (however,
> > I attached a script which transposes them to get them somewhat
> > human-readable). Also having the /proc/net/tcp output from the server
> > while stalling would be (have been) useful to reveal state info (but I
> > should have remembered to ask you to run it on both of them :-)).
>
> Ok ;) No problem, when I get the problem, I'll provide you the
> requested information.

It would be nice to "watch" them for a while (take snapshots with
timestamps) during the event, so that it's easy to see increments.

> > Also, I wonder what that [|tcp] hides, e.g., "<nop,nop,timestamp
> > 15980976 70381399,nop,nop,[|tcp]>" in tcpdump (and that was for an ACK
> > which doesn't make too much sense to me there). It occurs because
> > snaplen which was given for tcpdump is small enough to make TCP header
> > partial.
>
> Hmmm, I don't know. This is complex to me, but I'll apply your script.

Try passing -s<number> among the tcpdump parameters, where number is at
least 100 or so.

Also, it is very useful to have a full set of logs to see what
corresponds to what, so please include the tcpdump and /proc/net/tcp
from both ends (a capture started during the problem is better than
nothing, but if you can get one from an earlier point too, it would be
quite nice).

I'll comment on the rest of this mail later on...

--
 i.

^ permalink raw reply [flat|nested] 107+ messages in thread
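Editorial note: the "watch them for a while" suggestion above (snapshots
with timestamps, so that increments are easy to see) can be sketched as
follows. This is an illustration under stated assumptions, not a tool
from the thread: read_snapshot stands for any caller-supplied function
that returns a {counter_name: value} dict, and the names counter_deltas
and watch are hypothetical.

```python
import time


def counter_deltas(before, after):
    """Return {name: increase} for every counter that changed."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def watch(read_snapshot, interval=1.0, rounds=10,
          clock=time.time, sleep=time.sleep):
    """Yield (timestamp, deltas) once per interval.

    read_snapshot is an assumed callable returning a dict of counters;
    clock and sleep are injectable so the loop can be tested offline.
    """
    previous = read_snapshot()
    for _ in range(rounds):
        sleep(interval)
        current = read_snapshot()
        yield clock(), counter_deltas(previous, current)
        previous = current
```

Printing only the changed counters next to a timestamp makes it
possible to line the increments up against a tcpdump taken over the
same period.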
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-20 0:34 ` Dâniel Fraga
  2008-08-20 7:57 ` Ilpo Järvinen
@ 2008-08-20 12:37 ` Ilpo Järvinen
  2008-08-22 21:32 ` Dâniel Fraga
  1 sibling, 1 reply; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-20 12:37 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4190 bytes --]

On Tue, 19 Aug 2008, Dâniel Fraga wrote:

> On Tue, 19 Aug 2008 13:38:35 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Perhaps, though it's not at all clear how it could do that...
>
> I was thinking here of some specific configuration I use.
> For example, I always used the wonder shaper htb script:
>
> http://lartc.org/howto/lartc.cookbook.ultimate-tc.html#AEN2241
>
> Could HTB mess with frto or cause this problem? Would it be
> useful to disable completely HTB and use just the default scheduler?

Based on irc discussion with davem, there is a htb bug which can cause
corruption of the retransmitted TCP packets (and then a discard due to
checksum mismatch). That would also explain the strange headers I noticed
earlier. There's a patch below (should apply to 2.6.26), please put it at
least on the host(s) which use htb (I don't know if both server and the
client do use wondershaper script or just the client). Different
failure symptoms (one could be somehow frto related as FRTO is used while
retransmitting) are also quite well explainable.

But FRTO is mostly not a suspect based on the tcpdump you provided (no
FRTO workaround would help in that).

If you tcpdump with -s0 at receiver, you get full payload and therefore
it is possible to verify checksum correctness.

> > Do you have net namespaces enabled CONFIG_NET_NS in .config?
>
> I couldn't find this specific option:
>
> fraga@tux /usr/src/linux$ grep CONFIG_NET_NS .config
> fraga@tux /usr/src/linux$
>
> But I have those:

I wasn't in error :-), it took some time also for me to figure out the
right one, it's quite expected to be off (and even that missing) since
it currently depends on !SYSFS.

> > Any netfilter (iptables) rules on server which could cause those packets
> > to not reach TCP layer?
>
> Here are the complete rules:

...snip...

> As you can see, it's a pretty simple set of rules, nothing exotic here.

...agreed.

--
 i.

---
From: David Miller <davem@davemloft.net>

pkt_sched: Fix return value corruption in HTB and TBF.

Packet schedulers should only return NET_XMIT_DROP iff
the packet really was dropped. If the packet does reach
the device after we return NET_XMIT_DROP then TCP can
crash because it depends upon the enqueue path return
values being accurate.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 3fb58f4..51c3f68 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -595,11 +595,13 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
 		kfree_skb(skb);
 		return ret;
 #endif
-	} else if (cl->un.leaf.q->enqueue(skb, cl->un.leaf.q) !=
+	} else if ((ret = cl->un.leaf.q->enqueue(skb, cl->un.leaf.q)) !=
 		   NET_XMIT_SUCCESS) {
-		sch->qstats.drops++;
-		cl->qstats.drops++;
-		return NET_XMIT_DROP;
+		if (ret == NET_XMIT_DROP) {
+			sch->qstats.drops++;
+			cl->qstats.drops++;
+		}
+		return ret;
 	} else {
 		cl->bstats.packets +=
 			skb_is_gso(skb)?skb_shinfo(skb)->gso_segs:1;
@@ -639,11 +641,13 @@ static int htb_requeue(struct sk_buff *skb, struct Qdisc *sch)
 		kfree_skb(skb);
 		return ret;
 #endif
-	} else if (cl->un.leaf.q->ops->requeue(skb, cl->un.leaf.q) !=
+	} else if ((ret = cl->un.leaf.q->ops->requeue(skb, cl->un.leaf.q)) !=
 		   NET_XMIT_SUCCESS) {
-		sch->qstats.drops++;
-		cl->qstats.drops++;
-		return NET_XMIT_DROP;
+		if (ret == NET_XMIT_DROP) {
+			sch->qstats.drops++;
+			cl->qstats.drops++;
+		}
+		return ret;
 	} else
 		htb_activate(q, cl);
 
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 0b7d78f..fc6f8f3 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -123,15 +123,8 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 	struct tbf_sched_data *q = qdisc_priv(sch);
 	int ret;
 
-	if (skb->len > q->max_size) {
-		sch->qstats.drops++;
-#ifdef CONFIG_NET_CLS_ACT
-		if (sch->reshape_fail == NULL || sch->reshape_fail(skb, sch))
-#endif
-			kfree_skb(skb);
-
-		return NET_XMIT_DROP;
-	}
+	if (skb->len > q->max_size)
+		return qdisc_reshape_fail(skb, sch);
 
 	if ((ret = q->qdisc->enqueue(skb, q->qdisc)) != 0) {
 		sch->qstats.drops++;

^ permalink raw reply related [flat|nested] 107+ messages in thread
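Editorial note: the message above points out that a full-payload capture
(tcpdump -s0) at the receiver makes it possible to verify TCP checksums.
As a rough illustration of what that check involves, here is a minimal
sketch of the standard Internet checksum over the IPv4 pseudo-header
plus the TCP segment. It assumes IPv4 and that the raw segment bytes
(TCP header and payload) have already been extracted from the capture;
the function names are made up for this example and nothing here comes
from the thread's own tooling.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit one's-complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def _pseudo_header(src_ip, dst_ip, segment_len):
    # IPv4 pseudo-header: source, destination, zero, protocol, TCP length.
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_TCP, segment_len))


def compute_tcp_checksum(src_ip, dst_ip, segment):
    """Checksum for a segment whose checksum field (bytes 16-17) is zero."""
    total = ones_complement_sum(
        _pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return ~total & 0xFFFF


def tcp_checksum_ok(src_ip, dst_ip, segment):
    """True if the segment, checksum field included, sums to 0xFFFF."""
    total = ones_complement_sum(
        _pseudo_header(src_ip, dst_ip, len(segment)) + segment)
    return total == 0xFFFF
```

Note that checksums seen on the sending host can look wrong when the
NIC computes them (checksum offload), so a check like this is mainly
meaningful on the receiving side of the path.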
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-20 12:37 ` Ilpo Järvinen
@ 2008-08-22 21:32 ` Dâniel Fraga
  2008-08-22 21:37 ` David Miller
  0 siblings, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-08-22 21:32 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Wed, 20 Aug 2008 15:37:13 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Based on irc discussion with davem, there is a htb bug which can cause
> corruption of the retransmitted TCP packets (and then a discard due to
> checksum mismatch). That would also explain the strange headers I noticed
> earlier. There's a patch below (should apply to 2.6.26), please put it at
> least on the host(s) which use htb (I don't know if both server and the
> client do use wondershaper script or just the client). Different
> failure symptoms (one could be somehow frto related as FRTO is used while
> retransmitting) are also quite well explainable.
>
> But FRTO is mostly not a suspect based on the tcpdump you provided (no
> FRTO workaround would help in that).

Ilpo, I have good news. I decided to disable completely HTB and
the problem seems to have gone. And frto is enabled, of course. So the
problem was with HTB, not frto as I thought.

The HTB patches you provided are going to be included in the
next 2.6.27 kernel, right?

Thank you.

--

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-22 21:32 ` Dâniel Fraga
@ 2008-08-22 21:37 ` David Miller
  2008-08-23 14:14 ` Dâniel Fraga
  0 siblings, 1 reply; 107+ messages in thread
From: David Miller @ 2008-08-22 21:37 UTC (permalink / raw)
  To: fragabr
  Cc: ilpo.jarvinen, thomas.jarosch, billfink, netdev, kaber, netfilter-devel, kadlec

From: Dâniel Fraga <fragabr@gmail.com>
Date: Fri, 22 Aug 2008 18:32:24 -0300

> On Wed, 20 Aug 2008 15:37:13 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Based on irc discussion with davem, there is a htb bug which can cause
> > corruption of the retransmitted TCP packets (and then a discard due to
> > checksum mismatch). That would also explain the strange headers I noticed
> > earlier. There's a patch below (should apply to 2.6.26), please put it at
> > least on the host(s) which use htb (I don't know if both server and the
> > client do use wondershaper script or just the client). Different
> > failure symptoms (one could be somehow frto related as FRTO is used while
> > retransmitting) are also quite well explainable.
> >
> > But FRTO is mostly not a suspect based on the tcpdump you provided (no
> > FRTO workaround would help in that).
>
> Ilpo, I have good news. I decided to disable completely HTB and
> the problem seems to have gone. And frto is enabled, of course. So the
> problem was with HTB, not frto as I thought.
>
> The HTB patches you provided are going to be included in the
> next 2.6.27 kernel, right?

Yes, but it's important that you verify that the patch makes the
problem go away when HTB is enabled. Please make this test if you
can.

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-22 21:37 ` David Miller
@ 2008-08-23 14:14 ` Dâniel Fraga
  2008-08-23 14:38 ` Ilpo Järvinen
  0 siblings, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-08-23 14:14 UTC (permalink / raw)
  To: David Miller
  Cc: ilpo.jarvinen, thomas.jarosch, billfink, netdev, kaber, netfilter-devel, kadlec

On Fri, 22 Aug 2008 14:37:09 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> Yes, but it's important that you verify that the patch makes the
> problem go away when HTB is enabled. Please make this test if you
> can.

Correct! I tested with the HTB patches and the problem was
solved. ;) Thank you very much.

--

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-23 14:14 ` Dâniel Fraga
@ 2008-08-23 14:38 ` Ilpo Järvinen
  2008-08-24 19:38 ` Dâniel Fraga
  0 siblings, 1 reply; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-23 14:38 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 414 bytes --]

On Sat, 23 Aug 2008, Dâniel Fraga wrote:

> On Fri, 22 Aug 2008 14:37:09 -0700 (PDT)
> David Miller <davem@davemloft.net> wrote:
>
> > Yes, but it's important that you verify that the patch makes the
> > problem go away when HTB is enabled. Please make this test if you
> > can.
>
> Correct! I tested with the HTB patches and the problem was
> solved. ;) Thank you very much.

Thanks for verifying it!

--
 i.

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-23 14:38 ` Ilpo Järvinen
@ 2008-08-24 19:38 ` Dâniel Fraga
  2008-08-26 14:10 ` Ilpo Järvinen
  0 siblings, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-08-24 19:38 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Sat, 23 Aug 2008 17:38:32 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Thanks for verifying it!

Oops! I replied too fast! I just got a stalled connection again!

Important: these files were generated with the HTB patches applied.

Here are both tcpdump files:

http://www.abusar.org/htb/dump-mail-server.log
http://www.abusar.org/htb/dump-mail-client.log

Both readmibs:

http://www.abusar.org/htb/readmibs-server.txt
http://www.abusar.org/htb/readmibs-client.txt

Here are both cat /proc/net/tcp:

http://www.abusar.org/htb/tcp-server.txt
http://www.abusar.org/htb/tcp-client.txt

I used the following to generate those dumps:

1) on the server:

tcpdump -s 0 -w dump-mail-server.log -i eth0 host 201.52.214.230

2) on the client:

tcpdump -s 0 -w dump-mail-client.log -i eth0 host teleporto.abusar.org and port 995

What happened?

1) the connection was stalled

2) these tcpdumps are the *best ones* I got because although I started
them with the connection already stalled, the connection suddenly was not
stalled anymore, and a few minutes later was stalled again...

3) I kept tcpdump running for some more time

PS: anyway, I noticed that the only services that
remain stalled are nntp, ftp, pop3 and smtp... http is never stalled,
and neither is ssh. It seems to affect only "old" protocols :)

PS2: anyway, the htb patch seems to help, because the problem
took much longer to happen. With the htb patches the problem happens
once a day. Without the htb patches the problem happens more than
once a day.

PS3: I really don't understand why "nmap -sS server"
"solves" the stalled connection issue.

PS4: sorry for my hasty feedback before. I thought the problem had
gone. Anyway, I hope this time I provided the best data for you. Thanks.

--

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-24 19:38 ` Dâniel Fraga
@ 2008-08-26 14:10 ` Ilpo Järvinen
  2008-08-26 14:32 ` Ilpo Järvinen
  2008-08-26 17:18 ` Dâniel Fraga
  0 siblings, 2 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-26 14:10 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5085 bytes --]

On Sun, 24 Aug 2008, Dâniel Fraga wrote:

> On Sat, 23 Aug 2008 17:38:32 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Thanks for verifying it!
>
> Oops! I replied too fast! I just got a stalled connection again!
>
> Important: these files were generated with the HTB patches applied.

snip

> What happened?
>
> 1) the connection was stalled
>
> 2) these tcpdumps are the *best ones* I got

Easy to read indeed :-).

> because although I started
> them with the connection already stalled, the connection suddenly was not
> stalled anymore, and a few minutes later was stalled again...

There is more than one TCP flow in your workload btw (so using
"connection" is a bit more blurry from my/TCP's pov). Some stall and never
finish, some get immediately through without any stalling and proceed ok.
So far I've not seen any cases with mixed behavior.

The client seems to be working as expected. It even responds with DSACKs
to SYNACK retransmissions, indicating that it has processed them on TCP
level. It might break some foreign systems btw (I don't remember if it
was specified, so some TCP implementers may miss that possibility and
their stack might give up on seeing that happen :-)), I hope that nobody
demands it to be disabled someday (just a sidenote and has no relation to
the actual problem).

> 3) I kept tcpdump running for some more time
>
> PS: anyway, I noticed that the only services that
> remain stalled are nntp, ftp, pop3 and smtp... http is never stalled,
> and neither is ssh. It seems to affect only "old" protocols :)

It could be a userspace related thing.

> PS2: anyway, the htb patch seems to help, because the problem
> took much longer to happen. With the htb patches the problem happens
> once a day. Without the htb patches the problem happens more than
> once a day.

It seems that there could well be more than one problem, with symptoms
similar enough that they're hard to distinguish without a packet trace.

> PS3: I really don't understand why "nmap -sS server"
> "solves" the stalled connection issue.

Did it solve in this particular case? At least for 995 nothing
earth-shattering happened. I find it hardly related here. Ie., I clearly
see the problematic flows, and non-problematic ones. Neither seems to
have any relation to the nmap generated traffic / timing. There's one
non-problematic 995 flow where server generates some traffic during nmap
(5 mins since the previous packet was seen for that connection) but
likely the NAT in between has timed out that connection because no
tear-down resets (or anything else) show up in any tcpdump.

> PS4: sorry for my hasty feedback before. I thought the problem had
> gone. Anyway, I hope this time I provided the best data for you. Thanks.

No problem. It's well possible to have lucky periods every now and
then...

A number of packets have bad tcp cksum for the sender but that's probably
due to some offloading or so... Receiver-side has correct timestamps
however, so it shouldn't be a problem after all.

On the bright side, -s 0 allows all timestamps to be visible, this makes
me really perplexed:

S 3102907969:3102907969(0) win 5840 <mss 1460,sackOK,timestamp 37188459 0,nop,wscale 7> (DF)
S 3069527876:3069527876(0) ack 3102907970 win 5792 <mss 1460,sackOK,timestamp 258711279 37188459,nop,wscale 6> (DF)
. ack 1 win 46 <nop,nop,timestamp 37188477 258711279> (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37188481 258711279> (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37188699 258711279> (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37189135 258711279> (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37190007 258711279> (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37191751 258711279> (DF)
S 3069527876:3069527876(0) ack 3102907970 win 5792 <mss 1460,sackOK,timestamp 258712395 37191751,nop,wscale 6> (DF)
. ack 1 win 46 <nop,nop,timestamp 37192938 258712395,nop,nop,sack sack 1 {0:1} > (DF)
P 1:125(124) ack 1 win 46 <nop,nop,timestamp 37195239 258712395> (DF)

...In the latest SYN-ACK, ts_recent had been updated by the last packet
with data (the echoed value, 37191751, is that packet's timestamp), so it
was definitely processed by (some parts of) TCP at the server, so at
least that wasn't dropped anywhere in between. In order for that to
happen, I think req->ts_recent = tmp_opt.rcv_tsval in tcp_check_req must
be reached. It seems that there's likely an early abort there because
SYN-ACKs keep being retransmitted. Had a valid socket been created, the
request would have been removed from the list.

ListenOverflows might explain this (it can't be ListenDrops since it's
equal to ListenOverflows and both get incremented on overflow). Are you
perhaps short on workers at the userspace server? It would be nice to
capture those mibs often enough (eg., once per 1s with timestamps) during
the stall to see what actually gets incremented during the event because
there's currently so much haystack that finding the needle gets impossible
(ListenOverflows 47410) :-). Also, the corresponding tcpdump would be
needed to match the events.

--
 i.

^ permalink raw reply [flat|nested] 107+ messages in thread
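Editorial note: the analysis above suggests that connection requests at
the server stay in the handshake while SYN-ACKs are retransmitted. One
way to look for that in a saved /proc/net/tcp snapshot is to count
sockets per local port and state. The sketch below assumes the usual
/proc/net/tcp column layout (one header line, then "sl local_address
rem_address st ..." with hexadecimal fields) and the standard Linux TCP
state numbering; whether pending requests are listed there as SYN_RECV
on a given kernel is also an assumption. The function name is
hypothetical.

```python
from collections import Counter

# Standard Linux TCP state codes as shown in the "st" column (hex).
TCP_STATES = {
    0x01: "ESTABLISHED", 0x02: "SYN_SENT", 0x03: "SYN_RECV",
    0x04: "FIN_WAIT1", 0x05: "FIN_WAIT2", 0x06: "TIME_WAIT",
    0x07: "CLOSE", 0x08: "CLOSE_WAIT", 0x09: "LAST_ACK",
    0x0A: "LISTEN", 0x0B: "CLOSING",
}


def states_by_local_port(text):
    """Count sockets per (local port, state name) in /proc/net/tcp text."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the column header line
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:03E3" (hex address:hex port)
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        code = int(fields[3], 16)
        counts[(local_port, TCP_STATES.get(code, fields[3]))] += 1
    return counts
```

A growing SYN_RECV count on the POP3S port (995) while the listener
stays in LISTEN would be consistent with the accept queue not being
drained, which is what the ListenOverflows counter hints at.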
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-26 14:10 ` Ilpo Järvinen
@ 2008-08-26 14:32 ` Ilpo Järvinen
  2008-08-26 17:18 ` Dâniel Fraga
  1 sibling, 0 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-26 14:32 UTC (permalink / raw)
  To: Dâniel Fraga
  Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 706 bytes --]

On Tue, 26 Aug 2008, Ilpo Järvinen wrote:

> ListenOverflows might explain this (it can't be ListenDrops since it's
> equal to ListenOverflows and both get incremented on overflow). Are you
> perhaps short on workers at the userspace server? It would be nice to
> capture those mibs often enough (eg., once per 1s with timestamps) during
> the stall to see what actually gets incremented during the event because
> there's currently so much haystack that finding the needle gets impossible
> (ListenOverflows 47410) :-). Also, the corresponding tcpdump would be
> needed to match the events.

Alternatively, you could strace the userspace server to see whether it
keeps accept()'ing the connections.

--
 i.

^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-26 14:10 ` Ilpo Järvinen 2008-08-26 14:32 ` Ilpo Järvinen @ 2008-08-26 17:18 ` Dâniel Fraga 2008-08-26 20:40 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-26 17:18 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Tue, 26 Aug 2008 17:10:46 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > There is more than one TCP flow in your workload btw (so using > "connection" is a bit more blurry from my/TCP's pov). Some stall and never > finish, some get immediately through without any stalling and proceed ok. > So far I've not seen any cases with mixed behavior. Interesting. > It could be userspace related thing. Hmmm. I'll try to report this to the dovecot and inn lists. > It seems that there could well be more than one problem, with symptoms > similar enough that they're hard to distinguish without a packet trace. Yes, exactly! I think the same. > Did it solve in this particular case? At least for 995 nothing Yes. nmap -sS always solves the problem. Very strange. nmap -sS for me is kind of brute force attempt to restablish the normal behaviour of the server... Anyway, I disabled htb and frto and everything is fine for now. I'll keep investigating this. > ListenOverflows might explain this (it can't be ListenDrops since it's > equal to ListenOverflows and both get incremented on overflow). Are you > perhaps short on workers at the userspace server? It would be nice to I use dovecot por mail. I'll post on the dovecot list. If it's an userspace issue, better. > capture those mibs often enough (eg., once per 1s with timestamps) during > the stall to see what actually gets incremented during the event because > there's currently so much haystack that finding the needle gets impossible > (ListenOverflows 47410) :-). 
Also, the corresponding tcpdump would be > needed to match the events. Ok. If I get more useful information, I'll reply. Thank you very much! -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-26 17:18 ` Dâniel Fraga @ 2008-08-26 20:40 ` Ilpo Järvinen 2008-08-26 21:17 ` Dâniel Fraga 2008-08-28 21:49 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-26 20:40 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 4326 bytes --] On Tue, 26 Aug 2008, Dâniel Fraga wrote: > On Tue, 26 Aug 2008 17:10:46 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > There is more than one TCP flow in your workload btw (so using > > "connection" is a bit more blurry from my/TCP's pov). Some stall and never > > finish, some get immediately through without any stalling and proceed ok. > > So far I've not seen any cases with mixed behavior. > > Interesting. If you want to, a tcpdump from normal, working case wouldn't hurt either to show the "normal pattern" on network level and that is trivial to produce in no time now that you know the commands etc. I guess... :-) > > It could be userspace related thing. > > Hmmm. I'll try to report this to the dovecot and inn lists. They might not be that interested until we have something more concrete than what we know currently... :-) > > It seems that there could well be more than one problem, with symptoms > > similar enough that they're hard to distinguish without a packet trace. > > Yes, exactly! I think the same. > > > Did it solve in this particular case? At least for 995 nothing > > Yes. nmap -sS always solves the problem. Very strange. nmap -sS > for me is kind of brute force attempt to restablish the normal > behaviour of the server... Can you explain a bit more. Does it resolve during it or some time after it? And more importantly how do you know that it resolves? Ie., what is the normal behavior (be more specific than "it works" :-), how do know that it's working). 
It seems that either we lack some traffic between the parties or simply need to find out what the userspace is doing, and in the latter case what happens in the network might not be relevant at all. Is there possibility that we miss an alternative route by using the host rule for tcpdump (at the server)? Nmap starts at 22:26:26.613098, the last packet in the client log is at 22:26:01.452842. Alternatively, the port 995 was not the right one to track (though there's clearly this on network level visible problem with it too)... :-( > Anyway, I disabled htb and frto and everything is fine for now. > I'll keep investigating this. Two points: HTB shaping could cause drops that are related but considering what it visible in the server end's tcpdump, the userspace's behavior is quite relevant. You might jump into conclusions too quickly every now and then, more time might be needed to really ensure something is working. Obviously if any non-workingness is noticed, it's always a counter-proof even if long working periods occur in between. > > ListenOverflows might explain this (it can't be ListenDrops since it's > > equal to ListenOverflows and both get incremented on overflow). Are you > > perhaps short on workers at the userspace server? It would be nice to > > I use dovecot por mail. I'll post on the dovecot list. If it's > an userspace issue, better. It's not guaranteed that it's _only_ userspace, there could be some kernel aspect in the problem too (e.g., related to wakeups or so). In syscall terms this ListenOverflow means that int listen(int sockfd, int backlog); (see man -S 2 listen) is given some size as backlog for those connections that are not yet accept()'ed, and that is exhausted when the ListenOverflow gets incremented (ie., if I'm not completely wrong :-)). 
You might want to look at how to make dovecot accept more concurrent connections; perhaps login_max_processes_count is the right setting (I quickly glanced at http://wiki.dovecot.org/LoginProcess), though according to that page this depends somewhat on the site configuration. > > capture those mibs often enough (eg., once per 1s with timestamps) during > > the stall to see what actually gets incremented during the event because > > there's currently so much haystack that finding the needle gets impossible > > (ListenOverflows 47410) :-). Also, the corresponding tcpdump would be > > needed to match the events. > > Ok. If I had more useful information, I'll reply. > > Thank you very much! You could try setting up a script which does something along these lines and redirect its output to a file during the event (+ tcpdumping the thing, obviously): while [ : ]; do date "+%s.%N"; cat /proc/net/{netstat,snmp}; sleep 1; done -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
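The sampling loop suggested in this message can be narrowed so that only the listen-queue counters are logged, which makes a jump during a stall easier to spot. What follows is only a sketch and was not posted in the thread: the helper names get_counter and sample_counters are hypothetical, and it assumes the usual layout of /proc/net/netstat, where each header line of counter names is followed by a line of values.

```shell
#!/bin/sh
# Sketch only: get_counter and sample_counters are hypothetical helper names.

# Print the value of one named counter from a /proc/net/netstat-style file.
# Such files come in pairs of lines: a header line of counter names and a
# line of values, both starting with the same prefix (e.g. "TcpExt:").
get_counter() {
    # $1 = counter name, $2 = file to read (default: /proc/net/netstat)
    awk -v name="$1" '
        NR % 2 == 1 { for (i = 2; i <= NF; i++) col[i] = $i; next }
        { for (i = 2; i <= NF; i++) if (col[i] == name) print $i }
    ' "${2:-/proc/net/netstat}"
}

# Log a timestamp plus the two listen-queue counters once per second.
# Runs until interrupted.
sample_counters() {
    while :; do
        printf '%s ListenOverflows=%s ListenDrops=%s\n' \
            "$(date '+%s.%N')" \
            "$(get_counter ListenOverflows)" \
            "$(get_counter ListenDrops)"
        sleep 1
    done
}
```

Running "sample_counters > counters.log" on the server while tcpdump is capturing would give one timestamped line per second that can be matched against the packet trace.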
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-26 20:40 ` Ilpo Järvinen @ 2008-08-26 21:17 ` Dâniel Fraga 2008-08-27 10:22 ` Ilpo Järvinen 2008-08-28 21:49 ` Dâniel Fraga 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-26 21:17 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Tue, 26 Aug 2008 23:40:58 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > If you want to, a tcpdump from normal, working case wouldn't hurt either > to show the "normal pattern" on network level and that is trivial to > produce in no time now that you know the commands etc. I guess... :-) Ok, there it is: http://www.abusar.org/htb/dump-normal.log Just the port 995... I checked email, then received a message, checked again, just the normal behaviour. > They might not be that interested until we have something more concrete > than what we know currently... :-) Ok :) And you're right, because if I disable frto and htb *and* the problem has gone, there's a huge chance to be something related to kernel. Or a mix of kernel and user space problem which happens just when frto and/or htb are used. > Can you explain a bit more. Does it resolve during it or some time after > it? And more importantly how do you know that it resolves? Ie., what is > the normal behavior (be more specific than "it works" :-), how do know > that it's working). Ok. For example: 1) the connection is normal, then suddenly it stalls. I cannot receive mail, nor download nntp messages, nor access ftp etc. 2) I do on my client machine a "nmap -sS server" and... 3) ...imediatelly the connection is not stalled anymore. Now I remembered one thing and I'd like to make a question (I hope it isn't a stupid question): dynticks (tickless) were implemented for x86-64 in 2.6.24 kernel and I started to use dynticks in 2.6.24. Could it be affecting the server behaviour? 
I use dynticks (enabled) on all my machines, but does it make sense to use in a server environment? Could the dynticks cause this? Until now, I don't think so, but... who knows? http://kernelnewbies.org/Linux_2_6_24#head-4edc562fa1b9fa8e5da5adaf1beab057237c325d > It seems that either we lack some traffic between the parties or simply > need to find out what the userspace is doing, and in the latter case what > happens in the network might not be relevant at all. Is there possibility > that we miss an alternative route by using the host rule for tcpdump (at > the server)? Nmap starts at 22:26:26.613098, the last packet in the client > log is at 22:26:01.452842. Alternatively, the port 995 was not the right > one to track (though there's clearly this on network level visible problem > with it too)... :-( I tracked the 995 port, because I have problems reading email pro pop3s (995). Should I do it different with tcpdump? > You might jump into conclusions too quickly every now and then, more > time might be needed to really ensure something is working. Obviously > if any non-workingness is noticed, it's always a counter-proof even if > long working periods occur in between. Ok. It seems a complex issue. You're right. I need more patience ;) > In syscall terms this ListenOverflow means that int listen(int sockfd, int > backlog); (see man -S 2 listen) is given some size as backlog for those > connections that are not yet accept()'ed, and that is exhausted when the > ListenOverflow gets incremented (ie., if I'm not completely wrong :-)). Hmm interesting. > You might want to look on dovecot how to make it accept more concurrent > connections, perhaps the login_max_processes_count might the right one > (I quickly glanced http://wiki.dovecot.org/LoginProcess) though this is > somewhat site configuration dependant according to that page. Yes, I have login_max_processes_count = 128 (the default) and I have just a few users (just 10 users), so I think it's not the problem. 
> You could try setting up some script which does something along these > lines and then redirect its during the event to some file (+ tcpdumping > the thing obviously): > > while [ : ]; do > date "+%s.%N" > cat /proc/net/{netstat,snmp} > sleep 1 > done Ok. You're helping a lot. Thanks Ilpo ;) -- ^ permalink raw reply [flat|nested] 107+ messages in thread
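On the listen() backlog mentioned above: on Linux the backlog an application requests is silently capped at the value of net.core.somaxconn, so the queue of not-yet-accept()'ed connections can be shorter than the application asked for. The sketch below only illustrates that cap. The helper name effective_backlog is hypothetical, and the backlog dovecot actually passes to listen() is not stated in this thread, so any number fed to it is an assumption.

```shell
#!/bin/sh
# Sketch only: effective_backlog is a hypothetical helper name.

# Print the accept-queue limit that results from a requested listen() backlog.
# $1 = backlog the application asks for
# $2 = somaxconn value (default: read from /proc/sys/net/core/somaxconn)
effective_backlog() {
    requested=$1
    cap=${2:-$(cat /proc/sys/net/core/somaxconn)}
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"        # the kernel truncates the request to somaxconn
    else
        echo "$requested"  # the request is already within the cap
    fi
}
```

For example, "effective_backlog 512" shows what a request for 512 would be reduced to on the current machine.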
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-26 21:17 ` Dâniel Fraga @ 2008-08-27 10:22 ` Ilpo Järvinen 2008-08-27 19:51 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-27 10:22 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 6560 bytes --] On Tue, 26 Aug 2008, Dâniel Fraga wrote: > On Tue, 26 Aug 2008 23:40:58 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > If you want to, a tcpdump from normal, working case wouldn't hurt either > > to show the "normal pattern" on network level and that is trivial to > > produce in no time now that you know the commands etc. I guess... :-) > > Ok, there it is: > > http://www.abusar.org/htb/dump-normal.log > > Just the port 995... I checked email, then received a message, > checked again, just the normal behaviour. Thanks, those flows (there were again some) looks exactly what also the working connections in the earlier log do. > > They might not be that interested until we have something more concrete > > than what we know currently... :-) > > Ok :) And you're right, because if I disable frto and htb *and* > the problem has gone, there's a huge chance to be something related to > kernel. Or a mix of kernel and user space problem which happens just > when frto and/or htb are used. > > > Can you explain a bit more. Does it resolve during it or some time after > > it? And more importantly how do you know that it resolves? Ie., what is > > the normal behavior (be more specific than "it works" :-), how do know > > that it's working). > > Ok. For example: > > 1) the connection is normal, then suddenly it stalls. I cannot receive > mail, nor download nntp messages, nor access ftp etc. ...thus there could be other ports that are related as well, do you remember what exactly started working in that particular case :-)? 
> 2) I do on my client machine a "nmap -sS server" and... > > 3) ...imediatelly the connection is not stalled anymore. Which of the connections? Mail, nntp, ftp, or the etc.? :-) To the host ip which was given for the tcpdump filter, definately nothing was resumed. > Now I remembered one thing and I'd like to make a question (I > hope it isn't a stupid question): dynticks (tickless) were implemented > for x86-64 in 2.6.24 kernel and I started to use dynticks in 2.6.24. Could > it be affecting the server behaviour? I use dynticks (enabled) on all > my machines, but does it make sense to use in a server environment? > Could the dynticks cause this? Until now, I don't think so, but... who > knows? > > http://kernelnewbies.org/Linux_2_6_24#head-4edc562fa1b9fa8e5da5adaf1beab057237c325d I was think that at a time (even thought of enquiring you about this part of the config), but the tcpdump log shows a problem that is unlikely to depend on timers in any way (and at least some timer expires because the SYNACKs are retransmitted, so it's not in some infinite wait bug). I'd like to know what causes that and try to solve it. Once we know the reasons, we can probably easily determinate whether there's need to experiment with the timers. Trying to conquer all problems at once, when not even knowing how many problems one is going to find is not that easy either. Besides, I'd be more concerned about the timers on the client after seeing that nothing goes in the network while the nmap trick resolves the thing. > > It seems that either we lack some traffic between the parties or simply > > need to find out what the userspace is doing, and in the latter case what > > happens in the network might not be relevant at all. Is there possibility > > that we miss an alternative route by using the host rule for tcpdump (at > > the server)? Nmap starts at 22:26:26.613098, the last packet in the client > > log is at 22:26:01.452842. 
Alternatively, the port 995 was not the right > > one to track (though there's clearly this on network level visible problem > > with it too)... :-( > > I tracked the 995 port, because I have problems reading email > pro pop3s (995). Should I do it different with tcpdump? The server's log captured not only 995 traffic but everything else to the host with the given ip (including udp which should show the tunnelled traffic I guess). Unless there's some other route to that host with a different ip, I think we don't have much more to find out in the network (besides the potential of missing packets from tcpdump during the syn flooding, but it's very unlikely that all packets of some active flow would be hit at the same time, so something from a progressing flow would still be shown even if some of packets would be missing). This makes me wander if the network behavior is at all related to resolving of the problem. Only thing I can think of is that for some reason the userspace gets notified much later than it should about TCP reset and therefore is waiting until that happens and can only then continue. > > You might jump into conclusions too quickly every now and then, more > > time might be needed to really ensure something is working. Obviously > > if any non-workingness is noticed, it's always a counter-proof even if > > long working periods occur in between. > > Ok. It seems a complex issue. You're right. I need more > patience ;) ...of course if one wants to comment something to keep others posted what's happening, one could always note that "so far all good but I keep testing for longer time" (that's what some other people say). > > In syscall terms this ListenOverflow means that int listen(int sockfd, int > > backlog); (see man -S 2 listen) is given some size as backlog for those > > connections that are not yet accept()'ed, and that is exhausted when the > > ListenOverflow gets incremented (ie., if I'm not completely wrong :-)). > > Hmm interesting. 
> > > You might want to look on dovecot how to make it accept more concurrent > > connections, perhaps the login_max_processes_count might the right one > > (I quickly glanced http://wiki.dovecot.org/LoginProcess) though this is > > somewhat site configuration dependant according to that page. > > Yes, I have login_max_processes_count = 128 (the default) and I > have just a few users (just 10 users), so I think it's not the problem. It would be too easy explanation, yeah :-). Can you still please check next time that there aren't even near that many server processes at the server :-). > > You could try setting up some script which does something along these > > lines and then redirect its during the event to some file (+ tcpdumping > > the thing obviously): > > > > while [ : ]; do > > date "+%s.%N" > > cat /proc/net/{netstat,snmp} Adding this wouldn't hurt btw: cat /proc/net/tcp > > sleep 1 > > done > > Ok. You're helping a lot. Thanks Ilpo ;) > > > -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
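Since /proc/net/tcp is now part of the capture, a per-state count for one port can be pulled out of each snapshot. This is a minimal sketch under stated assumptions: the helper name count_states is hypothetical, only IPv4 sockets are covered (IPv6 sockets live in /proc/net/tcp6), and the state column is printed as the raw hex code, where 01 is ESTABLISHED, 03 is SYN_RECV and 0A is LISTEN.

```shell
#!/bin/sh
# Sketch only: count_states is a hypothetical helper name.

# Count sockets per TCP state for one local port in a /proc/net/tcp-style file.
# $1 = local port in decimal, $2 = file to read (default: /proc/net/tcp)
count_states() {
    # The file stores ports as four upper-case hex digits, e.g. 995 -> 03E3.
    hexport=$(printf '%04X' "$1")
    awk -v p="$hexport" '
        NR > 1 {                      # skip the column header line
            split($2, a, ":")         # $2 is local_address as ADDR:PORT
            if (a[2] == p) n[$4]++    # $4 is the hex state code
        }
        END { for (s in n) print s, n[s] }
    ' "${2:-/proc/net/tcp}" | sort
}
```

Calling "count_states 995" once per second inside the sampling loop would show whether connections pile up in one state on the pop3s port during a stall.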
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 10:22 ` Ilpo Järvinen @ 2008-08-27 19:51 ` Dâniel Fraga 2008-08-27 20:32 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-27 19:51 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Wed, 27 Aug 2008 13:22:22 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > ...thus there could be other ports that are related as well, do you > remember what exactly started working in that particular case :-)? > Which of the connections? Mail, nntp, ftp, or the etc.? :-) To the host ip > which was given for the tcpdump filter, definately nothing was resumed. Ok. Let's focus on mail: 1) first, my client (Claws-mail -- but it happened with Outlook of other users too) is working perfectly. I can download new messages. It connects to port 995 on the server without problems. 2) suddenly it gives me an error message, that it cannot authenticate anymore (sorry, I don't have the exact message). If I try again to download new messages, it gives the same error. The connection to port 995 seems stalled or, better yet, cannot complete succesfully. It seems to time out. 3) the server will stay this way, until I do an "nmap" to the server. This way, everything goes back to normal. So, the server at point 2 got stalled and just an nmap server can force the server to go back to normal behaviour (I discovered that nmap solved the issue by luck). > I was think that at a time (even thought of enquiring you about this > part of the config), but the tcpdump log shows a problem that is > unlikely to depend on timers in any way (and at least some timer expires > because the SYNACKs are retransmitted, so it's not in some infinite wait > bug). I'd like to know what causes that and try to solve it. Ok. 
> Once we know the reasons, we can probably easily determinate whether > there's need to experiment with the timers. Trying to conquer all problems > at once, when not even knowing how many problems one is going to find is > not that easy either. Besides, I'd be more concerned about the timers on > the client after seeing that nothing goes in the network while the nmap > trick resolves the thing. Ok. > The server's log captured not only 995 traffic but everything else to the > host with the given ip (including udp which should show the tunnelled > traffic I guess). Unless there's some other route to that host with Ok, that's because I forgot to restrict traffic to port 995 on the server. Sorry. > This makes me wander if the network behavior is at all related to > resolving of the problem. Only thing I can think of is that for some > reason the userspace gets notified much later than it should about > TCP reset and therefore is waiting until that happens and can only > then continue. What I can assure you is that other users (which use Microsoft Outlook Express) had the same problem, so, in this case, we can be pretty sure it isn't related to user space client. > It would be too easy explanation, yeah :-). Can you still please check > next time that there aren't even near that many server processes at the > server :-). Ok, when I get the problem I'll check this. > Adding this wouldn't hurt btw: > > cat /proc/net/tcp Ok. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 19:51 ` Dâniel Fraga @ 2008-08-27 20:32 ` Ilpo Järvinen 2008-08-27 20:50 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-27 20:32 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 2897 bytes --] On Wed, 27 Aug 2008, Dâniel Fraga wrote: > On Wed, 27 Aug 2008 13:22:22 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > ...thus there could be other ports that are related as well, do you > > remember what exactly started working in that particular case :-)? > > > Which of the connections? Mail, nntp, ftp, or the etc.? :-) To the host ip > > which was given for the tcpdump filter, definately nothing was resumed. > > Ok. Let's focus on mail: > > 1) first, my client (Claws-mail -- but it happened with Outlook of > other users too) is working perfectly. I can download new messages. It > connects to port 995 on the server without problems. > > 2) suddenly it gives me an error message, that it cannot authenticate > anymore (sorry, I don't have the exact message). The exact message is not that big deal :-). > If I try again to > download new messages, it gives the same error. The connection to port > 995 seems stalled or, better yet, cannot complete succesfully. It seems > to time out. > > 3) the server will stay this way, until I do an "nmap" to the server. > This way, everything goes back to normal. Ok. Though this all opens more questions than answers... :-(, why isn't there any traffic in neither of the tcpdumps then (not in the client's nor in the server's). > So, the server at point 2 got stalled and just an nmap server > can force the server to go back to normal behaviour (I discovered that > nmap solved the issue by luck). 
I guess it might have something to do with the additional 3-way handshake that gets attempted but who knows... > > The server's log captured not only 995 traffic but everything else to the > > host with the given ip (including udp which should show the tunnelled > > traffic I guess). Unless there's some other route to that host with > > Ok, that's because I forgot to restrict traffic to port 995 on > the server. Sorry. I don't think it was bad at all... :-) I just meant that there wasn't any other visible traffic, which is very very strange because nothing port 995 related (or anything else) seems to happen during the nmap... Which network interfaces the server has? Could things get routed through some other iface during the time of trouble (and during the nmap solution), that would explain why it isn't visible in the tcpdump which is for the specific interface. > > This makes me wander if the network behavior is at all related to > > resolving of the problem. Only thing I can think of is that for some > > reason the userspace gets notified much later than it should about > > TCP reset and therefore is waiting until that happens and can only > > then continue. > > What I can assure you is that other users (which use Microsoft > Outlook Express) had the same problem, so, in this case, we can be > pretty sure it isn't related to user space client. Ah, ok. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 20:32 ` Ilpo Järvinen @ 2008-08-27 20:50 ` Dâniel Fraga 2008-08-27 21:25 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-27 20:50 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Wed, 27 Aug 2008 23:32:34 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Ok. Though this all opens more questions than answers... :-(, why isn't > there any traffic in neither of the tcpdumps then (not in the client's > nor in the server's). Very strange. In this topic, I saw the discussion about some routers messing with traffic and frto related stuff, right? My server is behind some routers that I don't know (because it's not me who controls these routers). So if the problem is with some of these routers, I'm afraid we can do nothing about that. > I don't think it was bad at all... :-) I just meant that there wasn't any > other visible traffic, which is very very strange because nothing port 995 > related (or anything else) seems to happen during the nmap... Which > network interfaces the server has? Could things get routed through some > other iface during the time of trouble (and during the nmap solution), > that would explain why it isn't visible in the tcpdump which is for the > specific interface. fraga@teleporto ~$ ip route list 10.1.0.6 dev tun2 proto kernel scope link src 10.1.0.5 10.195.195.1 dev tun1 proto kernel scope link src 10.195.195.2 192.168.102.0/24 via 10.1.0.6 dev tun2 200.211.201.0/24 dev eth1 proto kernel scope link src 200.211.201.248 189.38.0.0/16 dev eth0 proto kernel scope link src 189.38.18.122 default via 189.38.18.121 dev eth0 Well, if that's the problem I'll be very ashamed for wasting your time. Anyway, eveything should go to eth0 interface, through 189.38.18.121 gateway. 
The eth1 interface (200.211.201.248) is an old interface which we do not use anymore. So I'm deactivating it right now (I should have done that a long time ago). Let's see if the problem remains or not (although I have other servers with multiple interfaces and everything is fine, since, as far as I understand, what matters is the default gateway -- I think there's no reason for Linux to send anything to the eth1 interface since there's only one default gateway). Anyway, I'm dropping the eth1 interface. I'll wait a few days before confirming whether that's the problem or not. Thanks again. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 20:50 ` Dâniel Fraga @ 2008-08-27 21:25 ` Ilpo Järvinen 2008-08-27 21:42 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-27 21:25 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 3390 bytes --] On Wed, 27 Aug 2008, Dâniel Fraga wrote: > On Wed, 27 Aug 2008 23:32:34 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > Ok. Though this all opens more questions than answers... :-(, why isn't > > there any traffic in neither of the tcpdumps then (not in the client's > > nor in the server's). > > Very strange. In this topic, I saw the discussion about some > routers messing with traffic and frto related stuff, right? My server > is behind some routers that I don't know (because it's not me who > controls these routers). So if the problem is with some of these > routers, I'm afraid we can do nothing about that. Sure, but that still doesn't explain at all why the expected traffic doesn't show up anywhere (ie., the working mail after nmap resolution). Such a router cannot prevent us tcpdumping both ends. My point is that we would always see the packet at least in the sending end's tcpdump. And, would we have the traffic from the end-point tcpdumps, we could trivially figure out what the middlebox did to the traffic. > > I don't think it was bad at all... :-) I just meant that there wasn't any > > other visible traffic, which is very very strange because nothing port 995 > > related (or anything else) seems to happen during the nmap... Which > > network interfaces the server has? Could things get routed through some > > other iface during the time of trouble (and during the nmap solution), > > that would explain why it isn't visible in the tcpdump which is for the > > specific interface. 
> > fraga@teleporto ~$ ip route list > 10.1.0.6 dev tun2 proto kernel scope link src 10.1.0.5 > 10.195.195.1 dev tun1 proto kernel scope link src 10.195.195.2 > 192.168.102.0/24 via 10.1.0.6 dev tun2 > 200.211.201.0/24 dev eth1 proto kernel scope link src > 200.211.201.248 189.38.0.0/16 dev eth0 proto kernel scope link src > 189.38.18.122 default via 189.38.18.121 dev eth0 > > Well, if that's the problem I'll be very ashamed for wasting > your time. Anyway, eveything should go to eth0 interface, through > 189.38.18.121 gateway. > > The eth1 interface (200.211.201.248) is an old interface which > we do not use anymore. So I'm right now deactivating it (I should do > that for a long time ago). Let's see if the problem remains or not > (although I have other servers with multiple interfaces and everything > is fine, since, as far as I understand, what matters is the default > gateway -- I think that there's no reason to Linux send something to > eth1 interface since there's only one default gateway). Agreed, it shouldn't happen. Do you have a static setup for the IPs or is there dhcp component which could in theory cause some routing table alterations, again I find that unlikely but in the meantime I start to run out of ideas how the client and server can speak with each other without leaving any traces about it (I hope you really used exactly the tcpdump command you told in the mail linking to those stall logs). > Anyway I'm dropping eth1 interface. I'll wait a few days before > confirming if that's the problem or not. > > Thanks again. There's some NAT somewhere btw because the client uses 192.168.0.2 as source address but I don't think that currently has some relevance (it might timeout some connections after an idle period, which is usually of configurable length). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 21:25 ` Ilpo Järvinen @ 2008-08-27 21:42 ` Dâniel Fraga 2008-08-27 22:24 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-27 21:42 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Thu, 28 Aug 2008 00:25:31 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Sure, but that still doesn't explain at all why the expected traffic > doesn't show up anywhere (ie., the working mail after nmap resolution). > Such a router cannot prevent us tcpdumping both ends. My point is that we > would always see the packet at least in the sending end's tcpdump. And, > would we have the traffic from the end-point tcpdumps, we could trivially > figure out what the middlebox did to the traffic. Ok. > Agreed, it shouldn't happen. Do you have a static setup for the IPs or is > there dhcp component which could in theory cause some routing table Static setup. But there were traces of a multipath config I used before and completely forgot (below). > alterations, again I find that unlikely but in the meantime I start to run > out of ideas how the client and server can speak with each other without > leaving any traces about it (I hope you really used exactly the tcpdump > command you told in the mail linking to those stall logs). Yes, you can believe me. Anyway, I disabled the eth1 interface. And there's more! We never had 2 links on this server, although I was always prepared to use multipath and ip route policy (since the plans were to have 2 links). I had 2 commands which I suspect maybe could be messing everything: #ip rule add from ${link0} lookup 1 #ip route add 0/0 via ${gw0} table 1 #ip rule add from ${link1} lookup 2 #ip route add 0/0 via ${gw1} table 2 I deleted all of this. I just use this if I had to links from two ISPs and decided to load balance between them. 
As it isn't the case... > There's some NAT somewhere btw because the client uses 192.168.0.2 as > source address but I don't think that currently has some relevance (it > might timeout some connections after an idle period, which is usually > of configurable length). Yes, on my client machine, I'm behind a D-Link 524 router. Anyway, give me some days; I'll test with these new changes for at least a week before confirming that the problem is solved (and with both frto and htb enabled). If it's really this old multipath misconfiguration, I apologize for my error and the lengthy discussion. Anyway, it's good to have it on record, in case someone makes the same mistake as me. Well, let's wait to see if the problem has gone. Thank you very much. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
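Leftover policy-routing rules like the ones described in this message can be listed by filtering the output of "ip rule show" for anything that does not point at the three built-in tables. The following is only a sketch: the helper name extra_rules is hypothetical, and it assumes the usual "PRIO: from SELECTOR lookup TABLE" output format of iproute2.

```shell
#!/bin/sh
# Sketch only: extra_rules is a hypothetical helper name.

# Read "ip rule show" output on stdin and print every rule whose lookup
# target is not one of the built-in tables (local, main, default).
extra_rules() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "lookup" && $(i+1) != "local" &&
                $(i+1) != "main" && $(i+1) != "default")
                print
    }'
}
```

Typical use would be "ip rule show | extra_rules", followed by "ip route show table N" for each table number it reports, to see which routes those rules would select.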
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-27 21:42 ` Dâniel Fraga @ 2008-08-27 22:24 ` Dâniel Fraga 0 siblings, 0 replies; 107+ messages in thread From: Dâniel Fraga @ 2008-08-27 22:24 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Wed, 27 Aug 2008 18:42:05 -0300 Dâniel Fraga <fragabr@gmail.com> wrote: > If it's really this old multipath misconfiguration, I apologize > for my error and the lengthy discussion. Anyway, it's good to have it on record, > in case someone makes the same mistake as me. > > Well, let's wait to see if the problem has gone. > > Thank you very much. Well, you can ignore my previous message. I just got a stalled connection. Since I do not have time to collect data now, I need to use nmap to re-establish the connection. OK, so it has nothing to do with multipath. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-26 20:40 ` Ilpo Järvinen 2008-08-26 21:17 ` Dâniel Fraga @ 2008-08-28 21:49 ` Dâniel Fraga 2008-08-29 13:07 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-28 21:49 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Tue, 26 Aug 2008 23:40:58 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > while [ : ]; do > date "+%s.%N" > cat /proc/net/{netstat,snmp,tcp} > sleep 1 > done Ok; Let's try again, now with more data (I hope): 1) tcpdump (just port 995): http://www.abusar.org/stall/dump-client http://www.abusar.org/stall/dump-server http://www.abusar.org/stall/dump-server-loopback I don't know if loopback is useful, just in case... 2) the above script : http://www.abusar.org/stall/script-client-log.txt http://www.abusar.org/stall/script-server-log.txt 3) and strace from the client Claws Mail: http://www.abusar.org/stall/strace-client-claws-mail.txt I forgot to use the -r option in strace. But when Claws-mail stalls, it gives the following multiple times: read(4, 0xf48704, 4096) = -1 EAGAIN (Resource temporarily unavailable) select(5, [4], [4], NULL, NULL) = 1 (out [4]) writev(4, [{"xxxxxxxxxxxxxxx"..., 108}], 1) = 108 select(5, [4], [], NULL, NULL) = 1 (in [4]) read(4, "xxxxxxxxxxxxxxxxxxxxxx"..., 4096) = 108 Well, I hope this time you have more information and I hope I didn't forget anything. If not, let's keep trying. Important: these data were collected with frto disabled (0) and htb disabled too. So it isn't related to frto, neither htb. Thank you! -- ^ permalink raw reply [flat|nested] 107+ messages in thread
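The sampling loop quoted in the message above runs until it is interrupted. A bounded variant could look like the sketch below. The function name snapshot_loop and its three arguments (sample count, interval, files) are assumptions made for illustration and were not posted in the thread; date "+%s.%N" relies on GNU date, as the original loop does.

```shell
# Sketch only: a bounded version of the quoted sampling loop.
# snapshot_loop is a hypothetical helper name, not part of the thread.
snapshot_loop() {
    # $1 = number of samples, $2 = seconds between samples,
    # remaining arguments = files to dump on every sample
    count=$1
    interval=$2
    shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        # Timestamp each sample so it can be lined up with tcpdump output.
        date "+%s.%N"
        cat "$@"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, snapshot_loop 60 1 /proc/net/netstat /proc/net/snmp /proc/net/tcp would collect roughly one minute of samples at one-second spacing.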
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-28 21:49 ` Dâniel Fraga @ 2008-08-29 13:07 ` Ilpo Järvinen 2008-08-29 17:41 ` Dâniel Fraga 2008-08-30 6:56 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-08-29 13:07 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 2179 bytes --] On Thu, 28 Aug 2008, Dâniel Fraga wrote: > Well, I hope this time you have more information and I hope I > didn't forget anything. If not, let's keep trying. Thanks. It took a moment for me to analyze such sheer amount of data, but I'm used to large logs... :-) Can you check during a "normal" time if the ListenOverflows grows with as considerable rate as during the stall (no need to send that log to me, just confirm that it doesn't do that is enough). A little cheat to do that for a logfile (the command I used): grep -A1 "ListenOverflows" <log> | cut -d ' ' -f 21-22 | grep [0-9] > Important: these data were collected with frto disabled (0) and htb > disabled too. So it isn't related to frto, neither htb. I kind of assumed/knew that since the htb patch didn't solve it. ...When you use nmap to resolve, is the time always constant or do you run it until the situation resolves? There are constantly 9 items in sk_ack_backlog (ie., connections which are not yet accept), those connections are in TCP_CLOSE_WAIT, then there are ~7 connections hanging in SYN_RECV which cannot make progress (all of them from a single address besides two flows of yours in SYN_RECV). So I guess that the configured 128 is not related to the number that is given to listen syscall, as it seems to be 9. ...Next we need to find out why dovecot is not accept()ing or is doing that dead slow (the client's state is hardly significant, so I guess it's no longer mandatory to collect it every time)... 
Can you provide these to familiarize myself a bit to the server's environment (no need to wait for the stall): ps ax | grep dovecot (or whatever the process is named) netstat -p -n -l | grep "995" But you'll mostly have to resort to strace during the stall, I recommend trying to trace just part of the syscalls, eg at least these: strace -e trace=accept,listen,close,shutdown,select ...as it would probably not be wise to make a full dump available (that it would contain every syscall). Alternatively, you can create one full dump for yourself and just grep the relevant parts. There may be need to strace more than one process (all dovecot related). -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
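The cut -d ' ' -f 21-22 pipeline above picks ListenOverflows by column position, which may not be the same on every kernel. A sketch that looks the counter up by name instead is shown below. It assumes the usual layout of /proc/net/netstat, where each group such as TcpExt is a header line of names followed by a line of values; the function name netstat_counter is hypothetical.

```shell
# Sketch only: print a TcpExt counter by name for every snapshot in a file.
# netstat_counter is a hypothetical helper name, not part of the thread.
netstat_counter() {
    # $1 = counter name, e.g. ListenOverflows
    # $2 = file to read (defaults to the live /proc/net/netstat)
    awk -v name="$1" '
        $1 == "TcpExt:" {
            if (!have_header) {
                # Header line: remember which column carries the counter.
                col = 0
                for (i = 2; i <= NF; i++)
                    if ($i == name) col = i
                have_header = 1
            } else {
                # Value line: print the matching column, if any.
                if (col) print $col
                have_header = 0
            }
        }
    ' "${2:-/proc/net/netstat}"
}
```

Run against a log produced by the sampling loop, netstat_counter ListenOverflows <log> prints one value per snapshot, so a counter that grows during a stall shows up as an increasing column of numbers.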
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-29 13:07 ` Ilpo Järvinen @ 2008-08-29 17:41 ` Dâniel Fraga 2008-09-01 7:11 ` Ilpo Järvinen 2008-08-30 6:56 ` Dâniel Fraga 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-29 17:41 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Fri, 29 Aug 2008 16:07:04 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Can you check during a "normal" time if the ListenOverflows grows with as > considerable rate as during the stall (no need to send that log to me, > just confirm that it doesn't do that is enough). A little cheat to do that > for a logfile (the command I used): > > grep -A1 "ListenOverflows" <log> | cut -d ' ' -f 21-22 | grep [0-9] It does not grow: 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 10953 It stays in this value for a long time. > ...When you use nmap to resolve, is the time always constant or do you run > it until the situation resolves? The time is constant. It takes just 3 seconds to nmap to "solve" the problem. I always have to use Ctrl+C to stop nmap before it completes the scanning because in the first 3 seconds the problem is "solved". > There are constantly 9 items in sk_ack_backlog (ie., connections which are > not yet accept), those connections are in TCP_CLOSE_WAIT, then there are > ~7 connections hanging in SYN_RECV which cannot make progress (all of them > from a single address besides two flows of yours in SYN_RECV). > > So I guess that the configured 128 is not related to the number that > is given to listen syscall, as it seems to be 9. 
> > ...Next we need to find out why dovecot is not accept()ing or is doing > that dead slow (the client's state is hardly significant, so I guess > it's no longer mandatory to collect it every time)... Would it be useful if I do the same for port 119? Because inn (nntp) stalls too. And proftp too. So I'm sure it isn't related to dovecot, otherwise the other services wouldn't stall too. > Can you provide these to familiarize myself a bit to the server's > environment (no need to wait for the stall): > > ps ax | grep dovecot (or whatever the process is named) fraga@teleporto ~$ ps ax|grep dovecot 2361 ? Ss 0:13 /usr/local/sbin/dovecot 2363 ? S 0:07 dovecot-auth 4751 ? S 0:00 dovecot-auth -w 6133 ? S 0:00 dovecot-auth -w 6134 ? S 0:00 dovecot-auth -w 15963 ? S 0:00 dovecot-auth -w The dovecot-auth I use for postfix too. > netstat -p -n -l | grep "995" fraga@teleporto ~$ sudo netstat -p -n -l | grep "995" Password: tcp 0 0 0.0.0.0:995 0.0.0.0:* LISTEN 2361/dovecot > But you'll mostly have to resort to strace during the stall, I recommend > trying to trace just part of the syscalls, eg at least these: > > strace -e trace=accept,listen,close,shutdown,select > > ...as it would probably not be wise to make a full dump available (that it > would contain every syscall). Alternatively, you can create one full dump > for yourself and just grep the relevant parts. There may be need to strace > more than one process (all dovecot related). Ok, at next stall I'll do that. Maybe it's good to strace inn and proftp too, right? Don't you think it's interesting that http (apache) and ssh never stalls? -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-29 17:41 ` Dâniel Fraga @ 2008-09-01 7:11 ` Ilpo Järvinen 0 siblings, 0 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-01 7:11 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 4074 bytes --] On Fri, 29 Aug 2008, Dâniel Fraga wrote: > On Fri, 29 Aug 2008 16:07:04 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > Can you check during a "normal" time if the ListenOverflows grows with as > > considerable rate as during the stall (no need to send that log to me, > > just confirm that it doesn't do that is enough). A little cheat to do that > > for a logfile (the command I used): > > > > grep -A1 "ListenOverflows" <log> | cut -d ' ' -f 21-22 | grep [0-9] > > It does not grow: > > 10953 10953 ...snip... > It stays in this value for a long time. Yeah, a constant one is expected. During the stall it was growing sharply. > > ...When you use nmap to resolve, is the time always constant or do you run > > it until the situation resolves? > > The time is constant. It takes just 3 seconds to nmap to > "solve" the problem. I always have to use Ctrl+C to stop nmap before it > completes the scanning because in the first 3 seconds the problem is > "solved". Thanks (though I hoped the other way around :-)). > > There are constantly 9 items in sk_ack_backlog (ie., connections which are > > not yet accept), those connections are in TCP_CLOSE_WAIT, then there are > > ~7 connections hanging in SYN_RECV which cannot make progress (all of them > > from a single address besides two flows of yours in SYN_RECV). > > > > So I guess that the configured 128 is not related to the number that > > is given to listen syscall, as it seems to be 9. 
> > > > ...Next we need to find out why dovecot is not accept()ing or is doing > > that dead slow (the client's state is hardly significant, so I guess > > it's no longer mandatory to collect it every time)... > > Would it be useful if I do the same for port 119? Because inn > (nntp) stalls too. And proftp too. So I'm sure it isn't related to > dovecot, otherwise the other services wouldn't stall too. Sure. Whatever of them you feel is the best choice but I doubt there's much benefit from doing that for many at the same time. Once we find out what is happening for one, the others are the same. ftp is problematic to tcpdump. Nntp should be fine I guess. > > Can you provide these to familiarize myself a bit to the server's > > environment (no need to wait for the stall): > > > > ps ax | grep dovecot (or whatever the process is named) > > fraga@teleporto ~$ ps ax|grep dovecot > 2361 ? Ss 0:13 /usr/local/sbin/dovecot > 2363 ? S 0:07 dovecot-auth > 4751 ? S 0:00 dovecot-auth -w > 6133 ? S 0:00 dovecot-auth -w > 6134 ? S 0:00 dovecot-auth -w > 15963 ? S 0:00 dovecot-auth -w > > The dovecot-auth I use for postfix too. > > > netstat -p -n -l | grep "995" > > fraga@teleporto ~$ sudo netstat -p -n -l | grep "995" > Password: > tcp 0 0 0.0.0.0:995 0.0.0.0:* LISTEN 2361/dovecot > > > But you'll mostly have to resort to strace during the stall, I recommend > > trying to trace just part of the syscalls, eg at least these: > > > > strace -e trace=accept,listen,close,shutdown,select > > > > ...as it would probably not be wise to make a full dump available (that it > > would contain every syscall). Alternatively, you can create one full dump > > for yourself and just grep the relevant parts. There may be need to strace > > more than one process (all dovecot related). > > Ok, at next stall I'll do that. > > Maybe it's good to strace inn and proftp too, right? I'm fine with either way. Basically we just want to find out where server processes are waiting when the stall happens. 
If at least one of them was in accept() but never made progress it's related to wakeup somehow, if not in accept, well, lets reconsider then... > Don't you think it's interesting that http (apache) and ssh never > stalls? It is interesting, yes... but do you have some idea how that would help to solve the problem (I don't)? Only thing that I could think of is that it could related to setsockopt()s they set differently. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-29 13:07 ` Ilpo Järvinen 2008-08-29 17:41 ` Dâniel Fraga @ 2008-08-30 6:56 ` Dâniel Fraga 2008-09-01 7:11 ` Ilpo Järvinen 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-08-30 6:56 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Fri, 29 Aug 2008 16:07:04 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > ...as it would probably not be wise to make a full dump available (that it > would contain every syscall). Alternatively, you can create one full dump > for yourself and just grep the relevant parts. There may be need to strace > more than one process (all dovecot related). While waiting for a stall, I was wondering: is there any chance it could be a bug introduced by gcc 4.3? I checked the date gcc 4.3.0 was released and it's just after 2.6.24 and before 2.6.25... I was using gcc 4.3.1 and now 4.3.2... but maybe I could try going back to gcc 4.2.4 to test... Which version of gcc are you developers using? -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-08-30 6:56 ` Dâniel Fraga @ 2008-09-01 7:11 ` Ilpo Järvinen 2008-09-07 8:17 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-01 7:11 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1297 bytes --] On Sat, 30 Aug 2008, Dâniel Fraga wrote: > On Fri, 29 Aug 2008 16:07:04 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > ...as it would probably not be wise to make a full dump available (that it > > would contain every syscall). Alternatively, you can create one full dump > > for yourself and just grep the relevant parts. There may be need to strace > > more than one process (all dovecot related). > > While waiting for a stall, I was thinking here: is there any > chance it could be a bug generated by gcc 4.3? I saw the date gcc 4.3.0 > was released and it's just after 2.6.24 and before 2.6.25... > > I was using gcc 4.3.1 and now 4.3.2... but maybe I could try go > back to gcc 4.2.4 to test... That's one option. If you do that, you could try catching two flies at the same time by selecting something else than tickless. > Which version of gcc you developers are using? I guess that on x86 most use some recent/semi-recent by default but there are some with old as well, while the non-x86 archs tend to have more often a bit older gccs I guess. Anyway, if gcc did something wrong, it is still mostly correct, ie., there's just some race (which is likely non-corrupting even). And hitting that might not be very easy for some of the devs. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-01 7:11 ` Ilpo Järvinen @ 2008-09-07 8:17 ` Dâniel Fraga 2008-09-08 10:27 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-09-07 8:17 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Mon, 1 Sep 2008 10:11:25 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > That's one option. If you do that, you could try catching two flies at the > same time by selecting something else than tickless. Hi Ilpo. I *think* I discovered the source of the problem. It's not related to gcc, nor to dynticks. I'm almost sure it's related to *High Resolution Timer*. I simply disabled this option and the problems disappeared. I'd like to ask you if this makes sense, based on the problem we've been discussing over these weeks. What's your opinion? Thank you. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-07 8:17 ` Dâniel Fraga @ 2008-09-08 10:27 ` Ilpo Järvinen 2008-09-08 20:20 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-08 10:27 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1016 bytes --] On Sun, 7 Sep 2008, Dâniel Fraga wrote: > On Mon, 1 Sep 2008 10:11:25 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > That's one option. If you do that, you could try catching two flies at the > > same time by selecting something else than tickless. > > Hi Ilpo. I *think* I discovered the source of the problem. It's > not related to gcc, neither dynticks. > > I'm almost sure it's related to *High Resolution Timer*. I > simply disabled this option and the problems disappeared. > > I'd like to ask you if it does make sense, based on the > problem we've being discussing over these weeks. What's your opinion? It could well be possible, accept seems to call schedule_timeout if nothing is immediately available (but I don't know well enough what end up being hrtimer'ed when you enable them and what will not)... Anyway, how long did you test for that to confirm it? Does this explain the 2.6.24->2.6.25 change in behavior as well (ie., did they got enabled there)? -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-08 10:27 ` Ilpo Järvinen @ 2008-09-08 20:20 ` Dâniel Fraga 2008-09-11 13:44 ` Ilpo Järvinen 0 siblings, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-09-08 20:20 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Mon, 8 Sep 2008 13:27:43 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > It could well be possible, accept seems to call schedule_timeout if > nothing is immediately available (but I don't know well enough what > end up being hrtimer'ed when you enable them and what will not)... > Anyway, how long did you test for that to confirm it? It has been five days since I disabled the high resolution timer and I have not had any problems since. > Does this explain the 2.6.24->2.6.25 change in behavior as well (ie., > did they got enabled there)? Well, as far as I know, the high res timer related change in 2.6.25 is the following: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8f4d37ec073c17e2d4aa8851df5837d798606d6f (although if I disable dynticks, the problem persists) Although in 2.6.24 we already had, for example, the x86-32/64 arch reunification and I don't know if it has anything to do with my problem in 2.6.25... just some thoughts... I wrote that because the problem doesn't happen on 32-bit machines, only on x86_64... Of course I'm not saying for sure that the high res timer is causing this. Maybe, as you said before, the problem is much more complex, and I really don't know what in fact uses the high res timer. Anyway, I'll leave the high res timer disabled for now until we discover something new. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-08 20:20 ` Dâniel Fraga @ 2008-09-11 13:44 ` Ilpo Järvinen 2008-09-11 17:30 ` Dâniel Fraga 2008-09-11 18:12 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-11 13:44 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 1782 bytes --] On Mon, 8 Sep 2008, Dâniel Fraga wrote: > On Mon, 8 Sep 2008 13:27:43 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > It could well be possible, accept seems to call schedule_timeout if > > nothing is immediately available (but I don't know well enough what > > end up being hrtimer'ed when you enable them and what will not)... > > Anyway, how long did you test for that to confirm it? > > It has been five days since I disable high resolution timer and > have not got any problems anymore. > > > Does this explain the 2.6.24->2.6.25 change in behavior as well (ie., > > did they got enabled there)? > > Well, as far as I know, high res timer related, what changed > in 2.6.25 is the following: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8f4d37ec073c17e2d4aa8851df5837d798606d6f > > (although if I disable dynticks, the problem persists) > > Although in 2.6.24 we already had, for example the x86-32/64 > arch reunification and I don't know if it has anything to do with my > problem in 2.6.25... just some thoughts... I wrote that because the > problem doesn't happen in 32 bit machines, but only in x86_64... > > Of course I'm not saying for sure that the high res timer is > causing this. Maybe, as you said before, the problem is much more > complex and realy don't know what in fact uses high res timer. > > Anyway, I'll leave high res timer disabled for now until we > discover something new. 
...I guess it would be possible to remove SCHED_FEAT_HRTICK from /proc/sys/kernel/sched_features then while keeping the hrtimers otherwise enabled to test this. It's possible that hrtimers just affect on how easy it is to trigger but at least it seems an useful lead until proven otherwise. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-11 13:44 ` Ilpo Järvinen @ 2008-09-11 17:30 ` Dâniel Fraga 2008-09-12 10:16 ` Ilpo Järvinen 2008-09-11 18:12 ` Dâniel Fraga 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-09-11 17:30 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Thu, 11 Sep 2008 16:44:20 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > ...I guess it would be possible to remove SCHED_FEAT_HRTICK from > /proc/sys/kernel/sched_features then while keeping the hrtimers > otherwise enabled to test this. > > It's possible that hrtimers just affect on how easy it is to trigger > but at least it seems an useful lead until proven otherwise. You're right Ilpo. After days and days without the problem, today it triggered (but I wasn't online at the time, so I couldn't grab any data). So, you're correct. HRtimers just affect how easy it is to trigger the issue. In other words: with the high resolution timer enabled, the problem appears more frequently. If we at least discovered a way to trigger this, we could test it more easily. The problem is having to wait a long time for it to happen. Just a curiosity: on your servers, do you use x86_64? It seems this problem is very specific to x86_64 or appears more often on x86_64 than on x86_32. It never happens on my x86_32 servers. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-11 17:30 ` Dâniel Fraga @ 2008-09-12 10:16 ` Ilpo Järvinen 2008-09-13 23:31 ` Dâniel Fraga 2008-09-15 19:42 ` Dâniel Fraga 0 siblings, 2 replies; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-12 10:16 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 2414 bytes --] On Thu, 11 Sep 2008, Dâniel Fraga wrote: > On Thu, 11 Sep 2008 16:44:20 +0300 (EEST) > "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > > > ...I guess it would be possible to remove SCHED_FEAT_HRTICK from > > /proc/sys/kernel/sched_features then while keeping the hrtimers > > otherwise enabled to test this. > > > > It's possible that hrtimers just affect on how easy it is to trigger > > but at least it seems an useful lead until proven otherwise. > > You're right Ilpo. After days and days without the problem, > today it triggered (but I wasn't online at the time, so I couldn't grab > any data). Thanks. Once we know what the userspace at the server is doing, it might make the problem immediately obvious, though I'm a bit afraid that e.g., strace might interfere with the problem so that it resolves right away and we're again left with nothing... > So, you're correct. HRtimers just affect on how easy it is to > trigger the issue. In other words: with high resolution timer enabled, > the problem appears more frequently. > > At least if we discovered a way how to trigger this, we could > test it more easily. The problem is to wait a long time for it to > happen. > > Just a curiosity: on your servers, I don't really have any I would call "server" in the sense you mean, I might occassionally set up one for test from time to time for a very limited period but normally it's just ssh and some other which I use so rarely that I'd hardly notice, and that's it. 
I was planning, however, to setup some day a distcc stress test using all my spare cpu cycles (I'd like to put it under kvm but that got stalled due to some timing issue at the guest making it to go into an infinite loop), once I get that working I could probably easily put other test-only stuff to that framework as well. But but, there are other people around the world besides us :-), and afaict this is the only (outstanding) report which relates to ceasing of accept() so I doubt it's something very regularly occuring thing or we would have heard of it. > do you use x86_64? At least on some machines, but like you have discovered it seems to service dependant, so that some processes never got stuck, I might only run such or so, who knows... > It seems > this problem is very specific to x86_64 or appear more often on x86_64 > than x86_32. It never happens on my x86_32 bit servers. Ok. -- i. ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-12 10:16 ` Ilpo Järvinen @ 2008-09-13 23:31 ` Dâniel Fraga 2008-09-16 12:10 ` Ilpo Järvinen 2008-09-15 19:42 ` Dâniel Fraga 1 sibling, 1 reply; 107+ messages in thread From: Dâniel Fraga @ 2008-09-13 23:31 UTC (permalink / raw) To: Ilpo Järvinen Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec On Fri, 12 Sep 2008 13:16:19 +0300 (EEST) "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote: > Ok. Ilpo, except for DROP [INPUT] lines, does the log below means something to you? Sep 13 20:01:21 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20607 DF PROTO=TCP SPT=4038 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:21 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20616 DF PROTO=TCP SPT=4054 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:21 teleporto vmunix: C193.8. 
S=5.5.5.5 E=8TS00 RC00 T=1 D262DF PROTO=TCP SPT=4146 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20690 DF PROTO=TCP SPT=4179 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20698 DF PROTO=TCP SPT=4201 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20707 DF PROTO=TCP SPT=4231 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20725 DF PROTO=TCP SPT=4294 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: OOTPST45 P=89WNO=53 E=x0SNUG= I mean these lines: Sep 13 20:01:21 teleporto vmunix: C193.8. S=5.5.5.5 E=8TS00 RC00 T=1 D262DF PROTO=TCP SPT=4146 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 Sep 13 20:01:22 teleporto vmunix: OOTPST45 P=89WNO=53 E=x0SNUG= This was registered during a stall. I didn't collect more data because I had to restore the server as fast as I can. If it doesn't help or doen't mean anything useful, please ignore. -- ^ permalink raw reply [flat|nested] 107+ messages in thread
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround 2008-09-13 23:31 ` Dâniel Fraga @ 2008-09-16 12:10 ` Ilpo Järvinen 2008-09-16 14:24 ` Dâniel Fraga 0 siblings, 1 reply; 107+ messages in thread From: Ilpo Järvinen @ 2008-09-16 12:10 UTC (permalink / raw) To: Dâniel Fraga Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec [-- Attachment #1: Type: TEXT/PLAIN, Size: 3578 bytes --] I've copied stuff from the other mail to here... Sorry for the delay, I had already looked into it but left it as postponed and I've been busy in other things... On Sat, 13 Sep 2008, Dâniel Fraga wrote: > Ilpo, except for DROP [INPUT] lines, does the log below means something to you? > > Sep 13 20:01:21 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20607 DF PROTO=TCP SPT=4038 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:21 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20616 DF PROTO=TCP SPT=4054 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:21 teleporto vmunix: C193.8. 
S=5.5.5.5 E=8TS00 RC00 T=1 D262DF PROTO=TCP SPT=4146 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20690 DF PROTO=TCP SPT=4179 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20698 DF PROTO=TCP SPT=4201 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20707 DF PROTO=TCP SPT=4231 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:22 teleporto vmunix: DROP [INPUT]: IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:50:73:6c:4b:e5:08:00 SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20725 DF PROTO=TCP SPT=4294 DPT=4899 WINDOW=65535 RES=0x00 SYN URGP=0 > Sep 13 20:01:22 teleporto vmunix: OOTPST45 P=89WNO=53 E=x0SNUG= > > I mean these lines: > > SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 PREC=0x00 TTL=116 ID=20616 > C193.8. S=5.5.5.5 E=8TS00 RC00 T=1 D262 ...Funny, it's printing every second character of a correct line. How that can happen, other people are much more qualified to give a meaningful answer... > Sep 13 20:01:22 teleporto vmunix: OOTPST45 P=89WNO=53 E=x0SNUG= > > This strange kernel messages are normal or is this clearly something > buggy? This is logged whenever the connection is stalled. I've no idea how they get generated. > Anyway, I'm 50% almost sure that it's something related to ntpd > adjusting time. I do not mean that whenever ntpd syncs > the time, the connection is stalled but I need a few more weeks to > assure this. How positive you actually are that it's exactly at that time? 
I.e., have you really checked that the timing matches? Since ntp syncs the time every now and then, I wouldn't be surprised if a stall "always" happened close enough to a sync to give a false alarm. "50% almost sure" didn't sound that convincing (whatever it means in the first place).

> Basically without ntpd, everything is fine, but when ntpd is
> running, the stall happens. Maybe the kernel gets confused when
> ntpd changes the time? It shouldn't happen of course. I'll reply in a
> few weeks. Thanks.

The only thing I know to ask is: do you have any idea whether your ntpd is hard-stepping the time instead of adjusting the clock's rate a bit (the latter should keep the clock monotonic, barring potential bugs)?

-- 
 i.
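Editorial illustration of the "every second character" remark above: a minimal Python sketch, not anything the posters ran. It takes a run of fields from one of the intact DROP [INPUT] lines quoted in this message and keeps every second character. The helper name `every_second_char` is made up for this note. The garbled line carries a different packet ID than the intact one used here, so only the common prefix can be compared.

```python
def every_second_char(line, start=0):
    """Return every second character of `line`, beginning at index `start`."""
    return line[start::2]


# Field run copied from an intact DROP [INPUT] line quoted above.
intact = ("SRC=189.31.180.2 DST=255.255.255.255 LEN=48 TOS=0x00 "
          "PREC=0x00 TTL=116 ID=20616")

# Matching run from the garbled line logged at 20:01:21 (its ID differs,
# so the run is cut just before the digits that depend on the ID).
garbled = "C193.8. S=5.5.5.5 E=8TS00 RC00 T=1 D2"

decimated = every_second_char(intact)
print(decimated)
print(garbled in decimated)
```

The garbled fragment shows up inside the decimated string, which is consistent with the observation that half of the characters were lost; it says nothing about why they were lost.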
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-16 12:10 ` Ilpo Järvinen
@ 2008-09-16 14:24 ` Dâniel Fraga
  2008-09-17 10:23 ` Ilpo Järvinen
  0 siblings, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-16 14:24 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Tue, 16 Sep 2008 15:10:33 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Only thing I know to ask is, do you have any idea if your ntpd is
> hard-stepping the time instead of adjusting the clock's rate a bit (the
> latter should keep the clock monotonious besides potential bugs)?

I assume it's adjusting the clock's rate a bit. Anyway, it's a pretty simple config:

fraga@teleporto ~$ cat /etc/ntp.conf
server ntp.usp.br
server ntp.nasa.gov
driftfile /etc/ntp.drift

And ntpd is running without any special parameters.

The log messages are as simple as:

Sep 15 03:56:04 teleporto ntpd[2301]: frequency initialized 5.891 PPM from /etc/ntp.drift
Sep 15 03:59:25 teleporto ntpd[2304]: frequency initialized 5.891 PPM from /etc/ntp.drift
Sep 15 04:03:49 teleporto ntpd[2304]: synchronized to 143.107.255.15, stratum 2
Sep 15 04:03:49 teleporto ntpd[2304]: kernel time sync status change 0001
Sep 15 04:10:16 teleporto ntpd[2304]: synchronized to 198.123.30.132, stratum 1
Sep 15 04:11:58 teleporto ntpd[2301]: frequency initialized 5.891 PPM from /etc/ntp.drift
Sep 15 04:16:18 teleporto ntpd[2301]: synchronized to 198.123.30.132, stratum 1
Sep 15 04:16:18 teleporto ntpd[2301]: kernel time sync status change 0001
Sep 15 12:08:53 teleporto ntpd[2301]: kernel time sync status change 4001
Sep 15 12:34:31 teleporto ntpd[2301]: kernel time sync status change 0001
Sep 15 14:34:06 teleporto ntpd[2301]: kernel time sync status change 4001
Sep 15 14:51:12 teleporto ntpd[2301]: kernel time sync status change 0001

If I understand correctly what you mean, ntpd adjusts the time gently so that it doesn't cause huge differences in the time.

And we're reaching the conclusion that something is wrong with the timer code in 2.6.25 and above, which causes those stalls, since 2.6.24 and below are OK. But I'll wait some more time to confirm this, although I'm almost sure it's a timer-related bug which has the collateral effect of stalling connections.

And just a question: do you use ntpd on your own desktop?

Thank you!
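Editorial illustration for the "kernel time sync status change" lines in the message above: a small Python sketch that decodes the hexadecimal status word into flag names. It assumes the word is the kernel's NTP status bitmask and that the bit values are the STA_* constants as they appear in current <linux/timex.h>; neither assumption is confirmed anywhere in this thread, and the helper name `decode_status` is made up for this note.

```python
# STA_* bit values as found in current <linux/timex.h> (assumption: the
# word ntpd logs is this bitmask, printed in hexadecimal).
STA_FLAGS = {
    0x0001: "STA_PLL",
    0x0002: "STA_PPSFREQ",
    0x0004: "STA_PPSTIME",
    0x0008: "STA_FLL",
    0x0010: "STA_INS",
    0x0020: "STA_DEL",
    0x0040: "STA_UNSYNC",
    0x0080: "STA_FREQHOLD",
    0x0100: "STA_PPSSIGNAL",
    0x0200: "STA_PPSJITTER",
    0x0400: "STA_PPSWANDER",
    0x0800: "STA_PPSERROR",
    0x1000: "STA_CLOCKERR",
    0x2000: "STA_NANO",
    0x4000: "STA_MODE",
    0x8000: "STA_CLK",
}


def decode_status(hex_word):
    """Return the names of all flags set in a hex status word, lowest bit first."""
    value = int(hex_word, 16)
    return [name for bit, name in sorted(STA_FLAGS.items()) if value & bit]


for word in ("0001", "4001"):
    print(word, decode_status(word))
```

Under these assumptions, 0001 is the PLL-enabled bit alone and 4001 adds the read-only mode bit (documented in the header as 0 = PLL, 1 = FLL). Neither word contains STA_UNSYNC, but that by itself does not tell whether the clock was stepped.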
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-16 14:24 ` Dâniel Fraga
@ 2008-09-17 10:23 ` Ilpo Järvinen
  2008-09-18 20:35 ` Dâniel Fraga
  0 siblings, 1 reply; 107+ messages in thread
From: Ilpo Järvinen @ 2008-09-17 10:23 UTC
To: Dâniel Fraga
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2600 bytes --]

On Tue, 16 Sep 2008, Dâniel Fraga wrote:

> On Tue, 16 Sep 2008 15:10:33 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Only thing I know to ask is, do you have any idea if your ntpd is
> > hard-stepping the time instead of adjusting the clock's rate a bit (the
> > latter should keep the clock monotonious besides potential bugs)?
>
> I assume it's adjusting the clock's rate a bit. Anyway, it's a
> pretty simple config:
>
> fraga@teleporto ~$ cat /etc/ntp.conf
> server ntp.usp.br
> server ntp.nasa.gov
> driftfile /etc/ntp.drift
>
> And ntpd is running without any special parameters.
>
> The log messages are as simple as:
>
> Sep 15 03:56:04 teleporto ntpd[2301]: frequency initialized 5.891 PPM from /etc/ntp.drift
> Sep 15 03:59:25 teleporto ntpd[2304]: frequency initialized 5.891 PPM from /etc/ntp.drift
> Sep 15 04:03:49 teleporto ntpd[2304]: synchronized to 143.107.255.15, stratum 2
> Sep 15 04:03:49 teleporto ntpd[2304]: kernel time sync status change 0001
> Sep 15 04:10:16 teleporto ntpd[2304]: synchronized to 198.123.30.132, stratum 1
> Sep 15 04:11:58 teleporto ntpd[2301]: frequency initialized 5.891 PPM from /etc/ntp.drift
> Sep 15 04:16:18 teleporto ntpd[2301]: synchronized to 198.123.30.132, stratum 1
> Sep 15 04:16:18 teleporto ntpd[2301]: kernel time sync status change 0001
> Sep 15 12:08:53 teleporto ntpd[2301]: kernel time sync status change 4001
> Sep 15 12:34:31 teleporto ntpd[2301]: kernel time sync status change 0001
> Sep 15 14:34:06 teleporto ntpd[2301]: kernel time sync status change 4001
> Sep 15 14:51:12 teleporto ntpd[2301]: kernel time sync status change 0001

I was going to look up where exactly these messages (or actually the ones you mentioned earlier) originate in the ntpd source, but I haven't had time yet.

> If I understood correctly what do you mean, ntpd adjusts nicely the time to not
> cause huge differences in the time.

That is definitely the default, if it's even possible to configure ntpd to just forcibly set the new time (with ntpdate you can choose that with the -b/-B switch, iirc).

> And we're reaching the conclusion that the timer code from 2.6.25
> and above have something wrong, since 2.6.24 and below is ok, which
> causes those stalls.

There were some other timer-related complications in 2.6.25, but that was so long ago that I hardly remember anything about them anymore (and I'm not an expert on those things anyway). And it's still a very open question how that would cause the problem you're seeing.

> And just a question: do you use ntpd on your own desktop?

Yes.

-- 
 i.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-17 10:23 ` Ilpo Järvinen
@ 2008-09-18 20:35 ` Dâniel Fraga
  2008-09-18 21:04 ` Ilpo Järvinen
  0 siblings, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-18 20:35 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Wed, 17 Sep 2008 13:23:28 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> There were some other timer related complications in 2.6.25 but it's so
> long time ago that I hardly remember anything about those anymore (and
> I'm not an expert on those things anyway). And it's still very open issue
> how that would cause the problem you're seeing.

I opened a bug report to timer developers... let's see if they can help:

http://bugzilla.kernel.org/show_bug.cgi?id=11588
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-18 20:35 ` Dâniel Fraga
@ 2008-09-18 21:04 ` Ilpo Järvinen
  2008-09-21 3:02 ` Dâniel Fraga
  2008-09-22 4:23 ` Dâniel Fraga
  0 siblings, 2 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-09-18 21:04 UTC
To: Dâniel Fraga
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1362 bytes --]

On Thu, 18 Sep 2008, Dâniel Fraga wrote:

> On Wed, 17 Sep 2008 13:23:28 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > There were some other timer related complications in 2.6.25 but it's so
> > long time ago that I hardly remember anything about those anymore (and
> > I'm not an expert on those things anyway). And it's still very open issue
> > how that would cause the problem you're seeing.
>
> I opened a bug report to timer developers...
> let's see if they can help:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=11588

Ok. Another potential candidate might be the scheduler (my wording was sloppy when I wrote "timer related complications"; what I mainly meant was time-related)...

Anyway, if/when you succeed in collecting some strace output of the server processes, please let me know (though making a full trace publicly available might not be wise, like I said earlier). Having thought about it a bit, it might be enough to start strace with -p for all server processes of a service during a stall and then, after waiting for some time, resolve the stall with nmap (and hope that strace itself doesn't resolve it by interfering with something relevant :-); if it does, you will notice because the stall then resolves without nmap). That would probably reveal whether the processes were waiting in accept() or not, and if not, where they were.

-- 
 i.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-18 21:04 ` Ilpo Järvinen
@ 2008-09-21 3:02 ` Dâniel Fraga
  2008-09-22 4:23 ` Dâniel Fraga
  1 sibling, 0 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-21 3:02 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Fri, 19 Sep 2008 00:04:23 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Anyway, if/when you succeed collecting some strace of the server
> processes, please let me know (though putting a full one available might
> not be wise thing like I said earlier). After I thought it a bit, it might
> be enough the start the strace with -p for all server processes of a
> service during a stall and then resolve it after some amount of waiting
> with nmap (and hope that strace doesn't resolve it by interfering
> something relevant :-), you will see that from the fact that it resolves
> without nmap then). That would probably reveal if the processes where
> waiting in accept() or not, and if not, where they were.

I got a stall and tried to use strace, but even strace couldn't trace anything. Everything that uses some kind of network connection is stalled (or, because everything is stalled, strace couldn't trace anything).

I'll try to leave strace running all the time, but I'm afraid it could prevent the stall. Anyway, I'll test and report back soon.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-18 21:04 ` Ilpo Järvinen
  2008-09-21 3:02 ` Dâniel Fraga
@ 2008-09-22 4:23 ` Dâniel Fraga
  2008-09-22 11:22 ` Ilpo Järvinen
  1 sibling, 1 reply; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-22 4:23 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Fri, 19 Sep 2008 00:04:23 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Anyway, if/when you succeed collecting some strace of the server
> processes, please let me know (though putting a full one available might
> not be wise thing like I said earlier). After I thought it a bit, it might
> be enough the start the strace with -p for all server processes of a
> service during a stall and then resolve it after some amount of waiting
> with nmap (and hope that strace doesn't resolve it by interfering
> something relevant :-), you will see that from the fact that it resolves
> without nmap then). That would probably reveal if the processes where
> waiting in accept() or not, and if not, where they were.

Hi again Ilpo, I waited the whole day for a stall, and fortunately it happened while I was stracing dovecot and its child processes. The stall happened at 01:11 (at the end). I hope it contains something useful.

http://www.abusar.org/strace/dovecot.txt.bz2

I then nmap'ed the server and killed strace.

I used the following:

strace -t -p 2315 -f -e trace=accept,listen,close,shutdown,select -o dovecot.txt
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-22 4:23 ` Dâniel Fraga
@ 2008-09-22 11:22 ` Ilpo Järvinen
  2008-09-22 16:13 ` Dâniel Fraga
  0 siblings, 1 reply; 107+ messages in thread
From: Ilpo Järvinen @ 2008-09-22 11:22 UTC
To: Dâniel Fraga
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2166 bytes --]

On Mon, 22 Sep 2008, Dâniel Fraga wrote:

> On Fri, 19 Sep 2008 00:04:23 +0300 (EEST)
> "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:
>
> > Anyway, if/when you succeed collecting some strace of the server
> > processes, please let me know (though putting a full one available might
> > not be wise thing like I said earlier). After I thought it a bit, it might
> > be enough the start the strace with -p for all server processes of a
> > service during a stall and then resolve it after some amount of waiting
> > with nmap (and hope that strace doesn't resolve it by interfering
> > something relevant :-), you will see that from the fact that it resolves
> > without nmap then). That would probably reveal if the processes where
> > waiting in accept() or not, and if not, where they were.
>
> Hi again Ilpo, I waited the whole day for a stall, and
> fortunatelly it happened while I was stracing dovecot and child
> processes. The stall happened at 01:11 (at the end). I hope that it
> has something useful.

It definitely shows a stall: there are _no_ events between 0:53 and 1:11, while there is no other period like that; every other minute since the start has some activity going on :-). So this might not be related to networking at all, as we've kind of already figured out (accept() definitely has very little to do with it). There weren't any close() calls there either, so it looks very much stuck on something outside of the syscalls we listed in -e, I suppose...

It seems that the next sensible step is simply to obtain a full strace to see what actually took place during those long minutes, if anything (it's better that you keep that log private and just grep it on request). ...A full strace might grow huge, though. Also, for strace use -tt instead of -t to get more accurate timestamps, and add -T.

When you get the stall next time, please also check that the processes are actually sleeping instead of looping like crazy in some buggy userspace code :-) (obviously before resolving it with nmap).

When using nmap to resolve it, take note of the exact timestamp (including seconds). E.g.,

$ date > nmap.ts; nmap ...

-- 
 i.
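Editorial illustration of the "check that the processes are actually sleeping" request above: a minimal Python sketch that reads the one-letter scheduler state from /proc/<pid>/stat on Linux (R = running, S = interruptible sleep, D = uninterruptible sleep). Nobody in the thread used this; the function names are made up for this note, and the sample stat lines are shortened and invented, not taken from the affected server.

```python
def parse_state(stat_line):
    """Extract the state letter (third field) from a /proc/<pid>/stat line.

    The second field is the command name in parentheses and may itself
    contain spaces or ')' characters, so split after the LAST ')'.
    """
    after_comm = stat_line[stat_line.rindex(")") + 2:]
    return after_comm.split()[0]


def process_state(pid):
    """Return the current state letter of a live process (Linux only)."""
    with open("/proc/%d/stat" % pid) as handle:
        return parse_state(handle.read())


# Invented, shortened sample lines; real ones carry dozens of fields.
print(parse_state("2315 (dovecot) S 1 2315 2315 0 -1 4202752"))
print(parse_state("4711 (odd) name) R 1 4711 4711 0 -1 4202752"))
```

Sampling `process_state()` a few times during a stall would separate a process that stays in S or D from one that keeps showing R, which is the distinction the message above asks for.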
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-22 11:22 ` Ilpo Järvinen
@ 2008-09-22 16:13 ` Dâniel Fraga
  0 siblings, 0 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-22 16:13 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Mon, 22 Sep 2008 14:22:12 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> It seems that next sensible step is to just obtain a full strace to see
> what actually took place during those long minutes if anything (it's
> better that you keep that log private and just use grep over it on
> request). ...A full strace might grow huge though. Also, for strace use
> -tt instead of -t to get more accurate timestamps and add -T.
>
> When you get the stall next time, please also check that the processes are
> actually sleeping instead of looping like crazy in some buggy userspace
> code :-) (obviously before resolving it with nmap).
>
> When using nmap to resolve, take note on exact timestamp (including
> seconds). E.g.,
> $ date > nmap.ts; nmap ...

Thanks! Today I'm lucky. I got the stall fast. It seems that it happens more frequently as more connections are made. What should I grep?
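Editorial illustration, not an answer given in the thread: one way to locate the quiet period in a large strace log without publishing it is to look for the biggest gap between consecutive timestamps. The Python sketch below assumes the line layout strace writes with -f, -t or -tt and -o (an optional PID column followed by HH:MM:SS, optionally with microseconds). The function names and the sample lines are invented for this note.

```python
import re

# Optional PID column, then HH:MM:SS with optional fractional seconds.
TIMESTAMP = re.compile(r"^(?:\d+\s+)?(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?\s")


def seconds_of_day(line):
    """Return the timestamp of a strace line in seconds, or None if absent."""
    match = TIMESTAMP.match(line)
    if not match:
        return None
    hours, minutes, secs, fraction = match.groups()
    total = int(hours) * 3600 + int(minutes) * 60 + int(secs)
    if fraction:
        total += float("0." + fraction)
    return total


def largest_gap(lines):
    """Return (gap_seconds, line_before_gap, line_after_gap)."""
    best = (0.0, "", "")
    previous = None      # (absolute seconds, line)
    day_offset = 0       # grows by one day whenever the clock wraps
    for line in lines:
        stamp = seconds_of_day(line)
        if stamp is None:
            continue
        stamp += day_offset
        if previous is not None and stamp < previous[0]:
            day_offset += 86400   # timestamps carry no date: assume midnight
            stamp += 86400
        if previous is not None and stamp - previous[0] > best[0]:
            best = (stamp - previous[0], previous[1], line)
        previous = (stamp, line)
    return best


# Invented sample in the assumed layout.
sample = [
    "2315  00:52:58 select(5, [4], NULL, NULL, NULL) = 1 (in [4])",
    "2315  00:53:10 accept(4, 0, NULL) = 7",
    "2316  01:11:42 close(7) = 0",
    "2316  01:11:43 select(5, [4], NULL, NULL, NULL) = 1 (in [4])",
]
gap, before, after = largest_gap(sample)
print(gap, "seconds of silence after:", before)
```

Feeding it the real log would be `largest_gap(open("dovecot.txt"))`, with the file name taken from the strace command quoted earlier in the thread. Only the two lines around the gap would then need to be shared.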
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-12 10:16 ` Ilpo Järvinen
  2008-09-13 23:31 ` Dâniel Fraga
@ 2008-09-15 19:42 ` Dâniel Fraga
  1 sibling, 0 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-15 19:42 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Fri, 12 Sep 2008 13:16:19 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> Ok.

Sep 15 09:22:38 teleporto vmunix: 600 RC00 T=4 D337POOIM YE3CD=3[R=93.812DT1210624LN4 O=x0PE=x0TL5 D0D RT=C NOPEE[ ye]]
Sep 15 09:53:49 teleporto vmunix: 6DO IPT:I=t0OT A=ff:ff:ff:05:36:b50:0SC13192522DT25252525LN4 O=x0PE=x0TL12I= FPOOTPST51 P=31 IDW650RS00 C Y RP0

Are these strange kernel messages normal, or is this clearly something buggy? They are logged whenever the connection is stalled.

Anyway, I'm 50% almost sure that it's something related to ntpd adjusting the time. I do not mean that the connection stalls whenever ntpd syncs the time, but I need a few more weeks to be sure about this.

Basically, without ntpd everything is fine, but when ntpd is running, the stall happens. Maybe the kernel gets confused when ntpd changes the time? It shouldn't happen, of course. I'll reply in a few weeks. Thanks.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-09-11 13:44 ` Ilpo Järvinen
  2008-09-11 17:30 ` Dâniel Fraga
@ 2008-09-11 18:12 ` Dâniel Fraga
  1 sibling, 0 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-09-11 18:12 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, netfilter-devel, kadlec

On Thu, 11 Sep 2008 16:44:20 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> ...I guess it would be possible to remove SCHED_FEAT_HRTICK from
> /proc/sys/kernel/sched_features then while keeping the hrtimers
> otherwise enabled to test this.
>
> It's possible that hrtimers just affect on how easy it is to trigger
> but at least it seems an useful lead until proven otherwise.

Well, I have a new suspect now: ntpd. It seems that the problem happens when ntpd syncs the clock (just a guess):

Sep 11 13:55:31 tux ntpd[2652]: synchronized to 143.107.255.15, stratum 2
Sep 11 13:55:31 tux ntpd[2652]: kernel time sync enabled 0001

I disabled ntpd (I'll just sync the clock once with ntpdate at boot) and will see what happens. I think the problem could be related to this, since "sudo" is affected too and, as far as I know, sudo is very sensitive to time.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-15 7:06 ` Ilpo Järvinen
  2008-08-15 21:35 ` Dâniel Fraga
@ 2008-08-15 21:59 ` Dâniel Fraga
  1 sibling, 0 replies; 107+ messages in thread
From: Dâniel Fraga @ 2008-08-15 21:59 UTC
To: Ilpo Järvinen
Cc: David Miller, thomas.jarosch, billfink, Netdev, Patrick Hardy, sr, netfilter-devel, kadlec

On Fri, 15 Aug 2008 10:06:39 +0300 (EEST)
"Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi> wrote:

> I would be better to have tcpdump running at least a bit back (2-3 windows
> back is long enough for me), but obviously that might not be possible
> option because it occurs so rarely. ...It should be possible to have
> tcpdump restarted once in a while to avoid a one huge log if you'd just
> keep running tcpdump from beginning.

Ok.

> What do you mean by "come back alive"...? ...In eth0 log I found this
> connection 189.38.18.122.995 > 192.168.0.2.35477, the ip matches with
> abusar's. But I'm not sure if the connection in the tunnel is the
> interesting one, since it's going to/from port 119 but the ip addresses
> (10.195.195.2 and 10.195.195.1) don't tell anything to me, I guess you
> know their meaning (ie., if 10.195.195.2 is the one with which the
> connection stalls)? ...You're probably right that this wasn't very useful
> log, the longest "stall" I find is only 1.111328 seconds long (and it
> might be due to some processing that is made by 10.195.195.2).

By "come back alive" I mean when the connection isn't stalled anymore.

189.38.18.122 -> server
10.195.195.1  -> my local VPN ip (tun1)
10.195.195.2  -> remote VPN ip (on the server)
192.168.0.2   -> my local ip (eth0)

Should I run tcpdump on the server too, or is it sufficient to dump just on my client machine?

Thank you very much again.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-12 8:18 ` David Miller
  2008-08-12 17:43 ` Dâniel Fraga
@ 2008-08-13 8:00 ` Thomas Jarosch
  1 sibling, 0 replies; 107+ messages in thread
From: Thomas Jarosch @ 2008-08-13 8:00 UTC
To: David Miller
Cc: ilpo.jarvinen, billfink, fragabr, netdev, kaber, sr, netfilter-devel, kadlec

On Tuesday, 12. August 2008 10:18:22 David Miller wrote:
> From: Thomas Jarosch <thomas.jarosch@intra2net.com>
> Date: Tue, 12 Aug 2008 09:46:17 +0200
>
> > David, I agree with you, though I'm not sure about the end user
> > experience:
>
> We had the same situation with ECN and window scaling, and my proposal
> is the same as how we handled those situations involving broken
> middleware boxes.

Yes, that is true. IMHO there's a slight difference with FRTO trouble compared to ECN/window scaling issues:

ECN trouble            -> No access at all
Broken window scaling  -> Large transfers don't work
MTU issues             -> No access at all / large transfers don't work
FRTO problems          -> Hard to spot as they only happen when packet loss occurs

Though I guess Ilpo knows best if there's an "easy" way to detect this or not.

Thomas
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
  2008-08-12 7:46 ` Thomas Jarosch
  2008-08-12 8:18 ` David Miller
@ 2008-08-22 21:18 ` Ilpo Järvinen
  1 sibling, 0 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-22 21:18 UTC
To: Thomas Jarosch
Cc: David Miller, billfink, fragabr, Netdev, Patrick Hardy, netfilter-devel, kadlec

[-- Attachment #1: Type: text/plain, Size: 2460 bytes --]

On Tue, 12 Aug 2008, Thomas Jarosch wrote:

> On Monday, 11. August 2008 23:44:21 David Miller wrote:
> > Trying to come up with a signature for this bogus stuff is both time
> > consuming and having a risk of false positives. And I really question
> > whether this thing is worth it.
> >
> > The sane thing to do in this case is to declare the box inoperative
> > and that it needs to be fixed to avoid this behavior.
> >
> > Any reasonable congestion control scheme is going to run into problems
> > trying to react to the packet patterns this thing creates. It is
> > therefore not really limited to FRTO so it really shouldn't be treated
> > like an FRTO problem even though it shows up more pronounced when
> > FRTO is enabled.
>
> David, I agree with you, though I'm not sure about the end user experience:
>
> The kernel is an early adopter of FRTO and will be bitten by bugs of other
> TCP implementations like we've experienced. I guess most affected users
> just see stalled or slow connections and won't have the time or knowledge
> to debug this.

This is hardly a big problem. A much bigger problem seems to be that some distros are based on 2.6.24 and did not pick up the TCP fixes that went into 2.6.25.7 but not into the 2.6.24.y series, because the latter was no longer being updated. Among the reports I've seen, there are hardly any for kernels other than 2.6.24 (and the ones we do have have gone through netdev to get the bugs / problems fixed).

> A proper warning could help them and the kernel
> developers to get this issue solved as quickly as possible.
> > We called the hotline of the ISP several times and they always claimed > sending big mails with Outlook/Windows works, so it must be linux's fault. > That view of things is totally biased, but it's something I want to make sure > people can't get away with easily :-) I should probably one day check how vista's frto is behaving itself to know better... ...but I guess they'll be running to some problems with big mails pretty soon... ;-) In the meantime, can you check the attached patches. Besides the kernel patch, you need to build your own patched iproute2 as well to configure the features (ip tool among them is enough in case the build of some other part of the toolset fails like it did for me). I somewhat tested them, and the result seemed to be what I'd expect (I just forced RTOs with some netem heavy dropping and quickly glanced over the resulting packet patterns near RTO). -- i. [-- Attachment #2: Type: text/plain, Size: 1764 bytes --] From b4d1efcf1d4384296d6d6b4f8378f8c408cefc98 Mon Sep 17 00:00:00 2001 From: =?ISO-8859-1?q?Ilpo=20J=E4rvinen?= <ilpo.jarvinen@helsinki.fi> Date: Tue, 19 Aug 2008 08:20:16 +0300 Subject: [PATCH] tcp/frto: make frto per route configurable MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit Needs iproute2 support since it isn't able to set RTAX_FEATURES currently (ie., also the other TCP variant related RTAX_FEATUREs won't work, they've been unused since the addition in 2003 or so). 
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> --- include/linux/rtnetlink.h | 1 + net/ipv4/tcp_input.c | 4 ++++ 2 files changed, 5 insertions(+), 0 deletions(-) diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index f4d386c..e628062 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -373,6 +373,7 @@ enum #define RTAX_FEATURE_SACK 0x00000002 #define RTAX_FEATURE_TIMESTAMP 0x00000004 #define RTAX_FEATURE_ALLFRAG 0x00000008 +#define RTAX_FEATURE_FRTO 0x00000010 struct rta_session { diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1f5e604..4f1cc0e 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1709,11 +1709,15 @@ int tcp_use_frto(struct sock *sk) { const struct tcp_sock *tp = tcp_sk(sk); const struct inet_connection_sock *icsk = inet_csk(sk); + struct dst_entry *dst = __sk_dst_get(sk); struct sk_buff *skb; if (!sysctl_tcp_frto) return 0; + if (dst && (dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_FRTO)) + return 0; + /* MTU probe and F-RTO won't really play nicely along currently */ if (icsk->icsk_mtup.probe_size) return 0; -- 1.5.2.2 [-- Attachment #3: Type: text/plain, Size: 5146 bytes --] From 59d7878c04eb9571c58baf78bfd07b169d3e5c0d Mon Sep 17 00:00:00 2001 From: =?ISO-8859-1?q?Ilpo=20J=E4rvinen?= <ilpo.jarvinen@helsinki.fi> Date: Fri, 22 Aug 2008 14:49:00 +0300 Subject: [PATCH] iproute2: enable setting of per route features MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit The kernel has had an entry for per route RTAX_FEATURES which was added as unused back in 2003. Allow setting them now. It seems that it's much more sensible to have the meaning negated because otherwise the meaning of zero is very ambiguous, ie., does it mean that feature is turned off or not given. 
Besides, this matches what one would expect in the intended use-case,
where we have global settings from sysctl and want to work around
something per route (i.e., disable an otherwise enabled feature).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/rtnetlink.h |    1 +
 ip/iproute.c              |   58 +++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index c1f2d50..354a6f1 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -373,6 +373,7 @@ enum
 #define RTAX_FEATURE_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_FRTO	0x00000010
 
 struct rta_session
 {
diff --git a/ip/iproute.c b/ip/iproute.c
index 2a8f3f8..d4a90fc 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -52,6 +52,20 @@ static const char *mx_names[RTAX_MAX+1] = {
 	[RTAX_FEATURES] = "features",
 	[RTAX_RTO_MIN] = "rto_min",
 };
+
+struct valname {
+	unsigned int val;
+	const char *name;
+};
+
+static const struct valname features[] = {
+	{ RTAX_FEATURE_ECN, "ecn" },
+	{ RTAX_FEATURE_SACK, "sack" },
+	{ RTAX_FEATURE_TIMESTAMP, "timestamps" },
+	{ RTAX_FEATURE_TIMESTAMP, "ts" },
+	{ RTAX_FEATURE_FRTO, "frto"},
+};
+
 static void usage(void) __attribute__((noreturn));
 
 static void usage(void)
@@ -73,7 +87,7 @@ static void usage(void)
 	fprintf(stderr, " [ rtt TIME ] [ rttvar TIME ]\n");
 	fprintf(stderr, " [ window NUMBER] [ cwnd NUMBER ] [ initcwnd NUMBER ]\n");
 	fprintf(stderr, " [ ssthresh NUMBER ] [ realms REALM ] [ src ADDRESS ]\n");
-	fprintf(stderr, " [ rto_min TIME ]\n");
+	fprintf(stderr, " [ rto_min TIME ] [ features DISABLED_FEATURES ]\n");
 	fprintf(stderr, "TYPE := [ unicast | local | broadcast | multicast | throw |\n");
 	fprintf(stderr, " unreachable | prohibit | blackhole | nat ]\n");
 	fprintf(stderr, "TABLE_ID := [ local | main | default | all | NUMBER ]\n");
@@ -83,6 +97,8 @@ static void usage(void)
 	fprintf(stderr, "NHFLAGS := [ onlink | pervasive ]\n");
 	fprintf(stderr, "RTPROTO := [ kernel | boot | static | NUMBER ]\n");
 	fprintf(stderr, "TIME := NUMBER[s|ms|us|ns|j]\n");
+	fprintf(stderr, "DISABLED_FEATURES := sack | timestamps | ts | ecn | frto |\n");
+	fprintf(stderr, " [ DISABLED_FEATURES ]\n");
 	exit(-1);
 }
 
@@ -505,10 +521,8 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 			if (mxlock & (1<<i))
 				fprintf(fp, " lock");
 
-			if (i != RTAX_RTT && i != RTAX_RTTVAR &&
-			    i != RTAX_RTO_MIN)
-				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
-			else {
+			if (i == RTAX_RTT || i == RTAX_RTTVAR ||
+			    i == RTAX_RTO_MIN) {
 				unsigned long long val = *(unsigned*)RTA_DATA(mxrta[i]);
 
 				val *= 1000;
@@ -520,6 +534,16 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 					fprintf(fp, " %llums", val/hz);
 				else
 					fprintf(fp, " %.2fms", (float)val/hz);
+			} else if (i == RTAX_FEATURES) {
+				int j;
+				unsigned int f = *(unsigned*)RTA_DATA(mxrta[i]);
+				for (j = 0; j < ARRAY_SIZE(features); j++)
+					if (f & features[j].val) {
+						fprintf(fp, " %s", features[j].name);
+						f &= ~features[j].val;
+					}
+			} else {
+				fprintf(fp, " %u", *(unsigned*)RTA_DATA(mxrta[i]));
 			}
 		}
 	}
@@ -851,6 +875,30 @@ int iproute_modify(int cmd, unsigned flags, int argc, char **argv)
 			if (get_unsigned(&win, *argv, 0))
 				invarg("\"ssthresh\" value is invalid\n", *argv);
 			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_SSTHRESH, win);
+		} else if (matches(*argv, "features") == 0) {
+			int j;
+			unsigned int f = 0;
+			NEXT_ARG();
+			while (1) {
+				for (j = 0; j < ARRAY_SIZE(features); j++) {
+					if (strcmp(*argv, features[j].name) == 0) {
+						f |= features[j].val;
+						if (!NEXT_ARG_OK())
+							goto feat_out;
+						NEXT_ARG();
+						break;
+					}
+				}
+				if (j == ARRAY_SIZE(features)) {
+					if (f)
+						PREV_ARG();
+					break;
+				}
+			}
+feat_out:
+			if (!f)
+				invarg("\"features\" list is invalid\n", *argv);
+			rta_addattr32(mxrta, sizeof(mxbuf), RTAX_FEATURES, f);
 		} else if (matches(*argv, "realms") == 0) {
 			__u32 realm;
 			NEXT_ARG();
-- 
1.5.2.2
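For readers who want to try the table-driven name/bit mapping from the patch above outside of the ip tool, here is a minimal standalone sketch. It is not the iproute2 code: the FEAT_* constants, parse_features() and format_features() are hypothetical names introduced for illustration, and the ECN bit value (0x01) is an assumption since the hunk does not show it. The sketch only mirrors two ideas from the patch: consume feature names until a non-feature word shows up, and clear each bit after printing it so that the "ts" alias is not printed a second time.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the RTAX_FEATURE_* bits in the patch. */
enum {
	FEAT_ECN       = 0x01,	/* assumed value; not shown in the hunk */
	FEAT_SACK      = 0x02,
	FEAT_TIMESTAMP = 0x04,
	FEAT_FRTO      = 0x10,
};

struct valname {
	unsigned int val;
	const char *name;
};

/* Same layout as the patch's table: "ts" is an alias for "timestamps". */
static const struct valname feature_names[] = {
	{ FEAT_ECN, "ecn" },
	{ FEAT_SACK, "sack" },
	{ FEAT_TIMESTAMP, "timestamps" },
	{ FEAT_TIMESTAMP, "ts" },
	{ FEAT_FRTO, "frto" },
};

#define N_FEATURES (sizeof(feature_names) / sizeof(feature_names[0]))

/*
 * Consume leading feature names from words[0..n) and OR their bits
 * into *mask. Returns how many words were consumed; the first word
 * that is not a feature name is left for the caller.
 */
static size_t parse_features(const char *const *words, size_t n,
			     unsigned int *mask)
{
	size_t used = 0;

	*mask = 0;
	while (used < n) {
		size_t j;

		for (j = 0; j < N_FEATURES; j++) {
			if (strcmp(words[used], feature_names[j].name) == 0) {
				*mask |= feature_names[j].val;
				break;
			}
		}
		if (j == N_FEATURES)
			break;	/* not a feature name */
		used++;
	}
	return used;
}

/*
 * Write " name" for every set bit into buf. Each bit is cleared once
 * printed, so an alias sharing the same bit is skipped.
 */
static void format_features(unsigned int mask, char *buf, size_t len)
{
	size_t off = 0;
	size_t j;

	if (len == 0)
		return;
	buf[0] = '\0';
	for (j = 0; j < N_FEATURES; j++) {
		if (mask & feature_names[j].val) {
			int w = snprintf(buf + off, len - off, " %s",
					 feature_names[j].name);

			if (w < 0 || (size_t)w >= len - off)
				break;	/* output truncated */
			off += (size_t)w;
			mask &= ~feature_names[j].val;
		}
	}
}
```

Feeding it the words "frto ts realms 5" consumes two words and sets the FRTO and timestamp bits; "realms" is left untouched, which corresponds to the PREV_ARG() step in the patch.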
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
2008-08-08 4:42 ` Bill Fink
2008-08-08 10:32 ` Ilpo Järvinen
@ 2008-08-11 21:41 ` David Miller
1 sibling, 0 replies; 107+ messages in thread
From: David Miller @ 2008-08-11 21:41 UTC (permalink / raw)
To: billfink
Cc: ilpo.jarvinen, fragabr, thomas.jarosch, netdev, kaber, sr, netfilter-devel, kadlec

From: Bill Fink <billfink@mindspring.com>
Date: Fri, 8 Aug 2008 00:42:31 -0400

> Since you suspect the problem is being caused by a broken middlebox,
> would it perhaps be a better approach to add a per-route option to
> allow disabling of FRTO for the given destination. This would be
> similar to Stephen Hemminger's fix for broken middleboxes that don't
> handle window scaling properly. It seems this would be better than
> modifying FRTO behavior for everyone else that is being compliant.

This is the kind of direction I'm leaning towards as well.

The behavior of these middleboxes borders on unbelievable. And there
comes a point where catering to these various busted boxes stops
making sense. At some point we have to say "sorry, someone has
to get that box fixed."

You can't reorder packets like that, on purpose, and not expect some
new, yet reasonable, TCP algorithm to fall flat on its face.
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
@ 2008-08-25 16:45 Thomas Jarosch
2008-08-26 12:03 ` Ilpo Järvinen
0 siblings, 1 reply; 107+ messages in thread
From: Thomas Jarosch @ 2008-08-25 16:45 UTC (permalink / raw)
To: Netdev; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1598 bytes --]
Forward mail. Upgrading to KDE 4.1.0/kdepim4 from KDE 3.5.9
enabled HTML emails by default and I didn't notice it before.
---------- Forwarded Message ----------
Subject: Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
Date: Monday, 25. August 2008
From: Thomas Jarosch <thomas.jarosch@intra2net.com>
To: "Ilpo Järvinen" <ilpo.jarvinen@helsinki.fi>
On Friday, 22. August 2008 23:18:44 Ilpo Järvinen wrote:
> In the meantime, can you check the attached patches. Besides the kernel
> patch, you need to build your own patched iproute2 as well to configure
> the features (ip tool among them is enough in case the build of some other
> part of the toolset fails like it did for me). I somewhat tested them, and
> the result seemed to be what I'd expect (I just forced RTOs with some
> netem heavy dropping and quickly glanced over the resulting packet
> patterns near RTO).
Your patches work fine.
I've noticed two small things:
1. Maybe it's a good idea to add a note above the tcp_use_frto() change
to explain that the value is negated. Took me a while to figure out
why there is no "!" in there :-)
2. Maybe rename the "features" option in iproute2 to "disable_features".
Then it would be more intuitive what it does.
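To spell out the negated semantics from point 1: here is a minimal sketch, assuming the per-route bits mean "disable this feature" as the DISABLED_FEATURES help text in the iproute2 patch suggests. This is not the kernel's tcp_use_frto() code; feature_in_use() and the FEAT_FRTO constant are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical constant mirroring RTAX_FEATURE_FRTO from the iproute2 patch. */
#define FEAT_FRTO 0x10u

/*
 * A feature is used only if the global sysctl enables it AND the
 * route does not carry the matching "disabled" bit. A set bit in
 * route_disabled_mask therefore turns the feature off, which is the
 * opposite of what a field called "features" might suggest.
 */
static bool feature_in_use(bool sysctl_enabled,
			   unsigned int route_disabled_mask,
			   unsigned int feature_bit)
{
	return sysctl_enabled && !(route_disabled_mask & feature_bit);
}
```

With this reading a route can only switch off something that is globally on; it can never switch on something the sysctl has disabled.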
btw: In case you apply something to the iproute2 git tree,
note that I got a compiler error while testing the patch:
In file included from lnstat.c:40:
lnstat.h:28: error: field 'last_read' has incomplete type
lnstat.h:29: error: field 'interval' has incomplete type
The attached small patch fixes the issue.
Cheers,
Thomas
[-- Attachment #2: iproute2-fix-include-for-timeval.patch --]
[-- Type: text/x-patch, Size: 552 bytes --]
Fix this compile error:
In file included from lnstat.c:40:
lnstat.h:28: error: field 'last_read' has incomplete type
lnstat.h:29: error: field 'interval' has incomplete type
Signed-off-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
diff -u -r iproute2-2.6.25/misc/lnstat.h iproute2.timeval/misc/lnstat.h
--- iproute2-2.6.25/misc/lnstat.h Thu Apr 17 19:12:54 2008
+++ iproute2.timeval/misc/lnstat.h Mon Aug 25 17:48:33 2008
@@ -2,6 +2,7 @@
#define _LNSTAT_H
#include <limits.h>
+#include <sys/time.h>
#define LNSTAT_VERSION "0.02 041002"
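As background on the error: "field has incomplete type" means the compiler has seen no full definition of the struct used for that field. The sketch below assumes the two fields are of type struct timeval, as the patch file name suggests; struct lnstat_like and interval_elapsed() are hypothetical and only the field names come from the error message. With the include in place the struct compiles and its members are usable.

```c
#include <sys/time.h>	/* full definition of struct timeval */

/* Hypothetical stand-in for the struct declared in lnstat.h. */
struct lnstat_like {
	struct timeval last_read;	/* incomplete type without the include */
	struct timeval interval;
};

/* Return nonzero once at least `interval` has passed since `last_read`. */
static int interval_elapsed(const struct lnstat_like *s,
			    const struct timeval *now)
{
	long long diff_us =
		(long long)(now->tv_sec - s->last_read.tv_sec) * 1000000LL +
		(long long)(now->tv_usec - s->last_read.tv_usec);
	long long interval_us =
		(long long)s->interval.tv_sec * 1000000LL +
		(long long)s->interval.tv_usec;

	return diff_us >= interval_us;
}
```

Including the header in lnstat.h itself, as the patch does, means every file that includes lnstat.h gets the definition regardless of its own include order.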
* Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround
2008-08-25 16:45 Thomas Jarosch
@ 2008-08-26 12:03 ` Ilpo Järvinen
0 siblings, 0 replies; 107+ messages in thread
From: Ilpo Järvinen @ 2008-08-26 12:03 UTC (permalink / raw)
To: Thomas Jarosch; +Cc: Netdev, netfilter-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1657 bytes --]
On Mon, 25 Aug 2008, Thomas Jarosch wrote:
> On Friday, 22. August 2008 23:18:44 Ilpo Järvinen wrote:
> > In the meantime, can you check the attached patches. Besides the kernel
> > patch, you need to build your own patched iproute2 as well to configure
> > the features (ip tool among them is enough in case the build of some other
> > part of the toolset fails like it did for me). I somewhat tested them, and
> > the result seemed to be what I'd expect (I just forced RTOs with some
> > netem heavy dropping and quickly glanced over the resulting packet
> > patterns near RTO).
>
> Your patches work fine.
Thanks for testing.
> I've noticed two small things:
> 1. Maybe it's a good idea to add a note above the tcp_use_frto() change
> to explain that the value is negated. Took me a while to figure out
> why there is no "!" in there :-)
> 2. Maybe rename the "features" option in iproute2 to "disable_features".
> Then it would be more intuitive what it does.
First of all I hate doing anything which has a user interface stamp in
it... :-) Second, didn't I write about this negation in some of the log
messages... /me looks for that... hmm... I think I did ...and that's
besides the very clear help text :-).
Yeah, it was just that the earlier ip already prints the field as
"features", though I guess changing that as well is a non-problem for
existing userspace stuff because of the current usage of the field.
There's this RTAX_FEATURE_ALLFRAG stuff which somebody could be looking
for but I don't know how likely that will be.
But anyway, point taken. I'll try to change both to disable_features and
see if that gets accepted.
--
i.
end of thread, other threads:[~2008-09-22 16:13 UTC | newest]
Thread overview: 107+ messages
[not found] <47EA0DAB.7080205@securenet.de>
[not found] ` <Pine.LNX.4.64.0807042251500.4142@blackhole.kfki.hu>
[not found] ` <200807071118.32988.thomas.jarosch@intra2net.com>
2008-07-07 13:18 ` TCP connection stalls under 2.6.24.7 Thomas Jarosch
2008-07-10 13:17 ` Jozsef Kadlecsik
2008-07-10 14:12 ` Thomas Jarosch
2008-07-10 21:21 ` Jozsef Kadlecsik
2008-07-11 14:33 ` Thomas Jarosch
2008-07-15 11:47 ` Thomas Jarosch
2008-07-15 16:10 ` Thomas Jarosch
2008-07-15 18:30 ` Dâniel Fraga
2008-07-31 4:47 ` Dâniel Fraga
2008-07-31 7:39 ` Ilpo Järvinen
2008-08-02 12:24 ` Dâniel Fraga
2008-07-15 20:17 ` Ilpo Järvinen
2008-07-16 8:07 ` Thomas Jarosch
2008-07-16 9:03 ` Thomas Jarosch
2008-07-17 13:55 ` Ilpo Järvinen
2008-07-17 15:15 ` Thomas Jarosch
2008-07-17 15:53 ` Ilpo Järvinen
2008-07-18 9:14 ` Thomas Jarosch
2008-07-18 13:55 ` Ilpo Järvinen
2008-07-18 14:02 ` Thomas Jarosch
2008-07-19 7:35 ` Ilpo Järvinen
2008-07-25 10:00 ` Ilpo Järvinen
2008-07-25 13:00 ` Thomas Jarosch
2008-07-25 14:06 ` Ilpo Järvinen
2008-07-25 15:34 ` Thomas Jarosch
2008-07-31 7:39 ` Thomas Jarosch
2008-07-31 12:44 ` Dâniel Fraga
2008-07-31 13:47 ` Thomas Jarosch
2008-07-31 14:11 ` Dâniel Fraga
2008-08-06 18:53 ` Dâniel Fraga
2008-08-07 6:54 ` Ilpo Järvinen
2008-08-07 11:50 ` Denys Fedoryshchenko
2008-08-07 12:11 ` Thomas Jarosch
2008-08-07 12:14 ` Ilpo Järvinen
2008-08-07 12:23 ` Denys Fedoryshchenko
2008-08-08 9:56 ` Ilpo Järvinen
2008-08-08 10:32 ` Denys Fedoryshchenko
2008-08-07 11:33 ` [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround Ilpo Järvinen
2008-08-08 4:42 ` Bill Fink
2008-08-08 10:32 ` Ilpo Järvinen
2008-08-11 21:44 ` David Miller
2008-08-12 7:46 ` Thomas Jarosch
2008-08-12 8:18 ` David Miller
2008-08-12 17:43 ` Dâniel Fraga
2008-08-12 17:52 ` Ilpo Järvinen
2008-08-13 17:53 ` Dâniel Fraga
2008-08-13 18:34 ` Ilpo Järvinen
2008-08-15 4:34 ` Dâniel Fraga
2008-08-15 7:06 ` Ilpo Järvinen
2008-08-15 21:35 ` Dâniel Fraga
2008-08-15 22:06 ` Ilpo Järvinen
2008-08-15 23:57 ` Dâniel Fraga
2008-08-16 2:15 ` Dâniel Fraga
2008-08-16 7:10 ` Ilpo Järvinen
2008-08-16 19:18 ` Ilpo Järvinen
2008-08-17 0:36 ` Dâniel Fraga
2008-08-19 10:38 ` Ilpo Järvinen
2008-08-20 0:34 ` Dâniel Fraga
2008-08-20 7:57 ` Ilpo Järvinen
2008-08-20 12:37 ` Ilpo Järvinen
2008-08-22 21:32 ` Dâniel Fraga
2008-08-22 21:37 ` David Miller
2008-08-23 14:14 ` Dâniel Fraga
2008-08-23 14:38 ` Ilpo Järvinen
2008-08-24 19:38 ` Dâniel Fraga
2008-08-26 14:10 ` Ilpo Järvinen
2008-08-26 14:32 ` Ilpo Järvinen
2008-08-26 17:18 ` Dâniel Fraga
2008-08-26 20:40 ` Ilpo Järvinen
2008-08-26 21:17 ` Dâniel Fraga
2008-08-27 10:22 ` Ilpo Järvinen
2008-08-27 19:51 ` Dâniel Fraga
2008-08-27 20:32 ` Ilpo Järvinen
2008-08-27 20:50 ` Dâniel Fraga
2008-08-27 21:25 ` Ilpo Järvinen
2008-08-27 21:42 ` Dâniel Fraga
2008-08-27 22:24 ` Dâniel Fraga
2008-08-28 21:49 ` Dâniel Fraga
2008-08-29 13:07 ` Ilpo Järvinen
2008-08-29 17:41 ` Dâniel Fraga
2008-09-01 7:11 ` Ilpo Järvinen
2008-08-30 6:56 ` Dâniel Fraga
2008-09-01 7:11 ` Ilpo Järvinen
2008-09-07 8:17 ` Dâniel Fraga
2008-09-08 10:27 ` Ilpo Järvinen
2008-09-08 20:20 ` Dâniel Fraga
2008-09-11 13:44 ` Ilpo Järvinen
2008-09-11 17:30 ` Dâniel Fraga
2008-09-12 10:16 ` Ilpo Järvinen
2008-09-13 23:31 ` Dâniel Fraga
2008-09-16 12:10 ` Ilpo Järvinen
2008-09-16 14:24 ` Dâniel Fraga
2008-09-17 10:23 ` Ilpo Järvinen
2008-09-18 20:35 ` Dâniel Fraga
2008-09-18 21:04 ` Ilpo Järvinen
2008-09-21 3:02 ` Dâniel Fraga
2008-09-22 4:23 ` Dâniel Fraga
2008-09-22 11:22 ` Ilpo Järvinen
2008-09-22 16:13 ` Dâniel Fraga
2008-09-15 19:42 ` Dâniel Fraga
2008-09-11 18:12 ` Dâniel Fraga
2008-08-15 21:59 ` Dâniel Fraga
2008-08-13 8:00 ` Thomas Jarosch
2008-08-22 21:18 ` Ilpo Järvinen
2008-08-11 21:41 ` David Miller
2008-08-25 16:45 Thomas Jarosch
2008-08-26 12:03 ` Ilpo Järvinen