* Re: network/performance problem
@ 2004-03-11 15:27 Ron Peterson
  2004-03-11 17:32 ` Ron Peterson
  2004-03-11 23:15 ` Andrew Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-11 15:27 UTC
  To: Linux Kernel Mailing List

I didn't reboot sam like I said I would.  I decided I'd let it spiral
down.  I'm still collecting profile data every fifteen minutes.  I
haven't posted any more graphs.  They look the same as all the others: a
monotonically increasing ping latency (w/ a corresponding slow increase
in system load averages - which I'm logging, if anyone wants more data).

http://depot.mtholyoke.edu:8080/tmp/sam-profile/

I've been perusing fa.linux.kernel, and saw Brad Laue's thread.  FWIW,
it smells similar.  When my machines finally go down, ksoftirqd is
always at the top of the process list.

Any ideas at all about what might be happening?

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-11 15:27 network/performance problem Ron Peterson
@ 2004-03-11 17:32 ` Ron Peterson
  2004-03-11 23:15 ` Andrew Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-11 17:32 UTC
  To: Linux Kernel Mailing List

On Thu, Mar 11, 2004 at 10:27:28AM -0500, rpeterso wrote:

> I've been perusing fa.linux.kernel, and saw Brad Laue's thread.  FWIW,
> it smells similar.  When my machines finally go down, ksoftirqd is
> always at the top of the process list.
>
> Any ideas at all about what might be happening?

I put my latest user.log file up (16M):

http://depot.mtholyoke.edu:8080/tmp/sam-profile/user.log

If you 'grep PSTOPCPU user.log | less', you can see that ksoftirqd_CPU0
slowly but steadily consumes a higher and higher CPU percentage.  What
this means, I have no idea.

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-11 15:27 network/performance problem Ron Peterson
  2004-03-11 17:32 ` Ron Peterson
@ 2004-03-11 23:15 ` Andrew Morton
  2004-03-11 23:35   ` Ron Peterson
  1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-03-11 23:15 UTC
  To: Ron Peterson; +Cc: linux-kernel

Ron Peterson <rpeterso@mtholyoke.edu> wrote:
>
> I didn't reboot sam like I said I would.  I decided I'd let it spiral
> down.  I'm still collecting profile data every fifteen minutes.  I
> haven't posted any more graphs.  They look the same as all the others: a
> monotonically increasing ping latency (w/ a corresponding slow increase
> in system load averages - which I'm logging, if anyone wants more data).
>
> http://depot.mtholyoke.edu:8080/tmp/sam-profile/
>
> I've been perusing fa.linux.kernel, and saw Brad Laue's thread.  FWIW,
> it smells similar.  When my machines finally go down, ksoftirqd is
> always at the top of the process list.
>
> Any ideas at all about what might be happening?

The profiles tell a story:

c0217fb0 wait_for_packet                               2   0.0063
c0256660 arpt_do_table                                 2   0.0019
c0265ca0 __generic_copy_to_user                        2   0.0278
c0106bd0 system_call                                   3   0.0536
c0107e8c handle_IRQ_event                              3   0.0326
c014bf10 statm_pgd_range                               3   0.0077
c0120ed4 do_wp_page                                    5   0.0101
c024c0d4 ip_conntrack_expect_related                  47   0.0368
c0105250 default_idle                               2817  70.4250
c024bae0 init_conntrack                             3053   3.7232
00000000 total                                      5962   0.0041

It appears that netfilter has gone berserk and is taking your machine out.

Are you really sure that nothing is sitting there injecting new rules all
the time?
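
One way to read the profile quoted above is to ask what share of the non-idle ticks each symbol accounts for. The sketch below is illustrative only: it assumes the four-column layout shown in the message (address, symbol, ticks, load) and copies the quoted lines verbatim. `busy_share` is a hypothetical helper name, not part of any profiling tool.

```python
# Profile lines copied from the message above.
# Assumed column layout: address, symbol, tick count, normalized load.
PROFILE = """\
c0217fb0 wait_for_packet 2 0.0063
c0256660 arpt_do_table 2 0.0019
c0265ca0 __generic_copy_to_user 2 0.0278
c0106bd0 system_call 3 0.0536
c0107e8c handle_IRQ_event 3 0.0326
c014bf10 statm_pgd_range 3 0.0077
c0120ed4 do_wp_page 5 0.0101
c024c0d4 ip_conntrack_expect_related 47 0.0368
c0105250 default_idle 2817 70.4250
c024bae0 init_conntrack 3053 3.7232
00000000 total 5962 0.0041
"""


def busy_share(profile_text):
    """Return {symbol: fraction of non-idle ticks} (hypothetical helper)."""
    ticks = {}
    for line in profile_text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip anything that is not a four-column profile line
        _address, symbol, count, _load = fields
        ticks[symbol] = int(count)
    total = ticks.pop("total")
    idle = ticks.pop("default_idle", 0)
    busy = total - idle  # ticks spent doing something other than idling
    return {symbol: count / busy for symbol, count in ticks.items()}


shares = busy_share(PROFILE)
top = sorted(shares.items(), key=lambda item: item[1], reverse=True)[:3]
for symbol, share in top:
    print(f"{symbol:30s} {share:6.1%}")
```

On these numbers init_conntrack accounts for roughly 97% of the non-idle ticks (3053 of 5962 - 2817 = 3145), which is why the replies concentrate on connection tracking.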

* Re: network/performance problem
  2004-03-11 23:15 ` Andrew Morton
@ 2004-03-11 23:35   ` Ron Peterson
  2004-03-12 10:11     ` Patrick McHardy
  2004-03-12 16:47     ` Ron Peterson
  0 siblings, 2 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-11 23:35 UTC
  To: Andrew Morton; +Cc: linux-kernel

On Thu, Mar 11, 2004 at 03:15:59PM -0800, Andrew Morton wrote:
> Ron Peterson <rpeterso@mtholyoke.edu> wrote:
> >
> > I didn't reboot sam like I said I would.  I decided I'd let it spiral
> > down.  I'm still collecting profile data every fifteen minutes.  I
> > haven't posted any more graphs.  They look the same as all the others: a
> > monotonically increasing ping latency (w/ a corresponding slow increase
> > in system load averages - which I'm logging, if anyone wants more data).
> >
> > http://depot.mtholyoke.edu:8080/tmp/sam-profile/
> >
> > I've been perusing fa.linux.kernel, and saw Brad Laue's thread.  FWIW,
> > it smells similar.  When my machines finally go down, ksoftirqd is
> > always at the top of the process list.
> >
> > Any ideas at all about what might be happening?
>
> The profiles tell a story:
>
> c0217fb0 wait_for_packet                               2   0.0063
> c0256660 arpt_do_table                                 2   0.0019
> c0265ca0 __generic_copy_to_user                        2   0.0278
> c0106bd0 system_call                                   3   0.0536
> c0107e8c handle_IRQ_event                              3   0.0326
> c014bf10 statm_pgd_range                               3   0.0077
> c0120ed4 do_wp_page                                    5   0.0101
> c024c0d4 ip_conntrack_expect_related                  47   0.0368
> c0105250 default_idle                               2817  70.4250
> c024bae0 init_conntrack                             3053   3.7232
> 00000000 total                                      5962   0.0041
>
> It appears that netfilter has gone berserk and is taking your machine out.
>
> Are you really sure that nothing is sitting there injecting new rules all
> the time?

You mean a script calling 'iptables' to dynamically add rules?  Nothing
like that at all.  I dumped the current rules below.

Are you looking at the init_conntrack numbers?  While they seem, in the
long run, to be getting larger, they're not increasing monotonically.
My ping latencies, and the CPU percentage consumed by ksoftirqd_CPU0,
just go up and up (albeit slowly).

The graph below shows what happened when I flushed the rules, and set
the default policy to ACCEPT.  So the ping latencies, at least, seem to
have something to do with iptables.

http://depot.mtholyoke.edu:8080/tmp/tap-sam/2004-03-06_9:30/sam_last_108000.png

1003# iptables -v -L
Chain INPUT (policy DROP 9910K packets, 1296M bytes)
 pkts bytes target     prot opt in     out     source               destination
1899K 2581M ACCEPT     all  --  any    any     anywhere             anywhere           state RELATED,ESTABLISHED
28774 2396K ACCEPT     icmp --  any    any     138.110.0.0/16       anywhere           icmp echo-request
   12   672 ACCEPT     tcp  --  any    any     anywhere             anywhere           tcp dpt:ssh
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere           tcp dpt:https
  127  8713 ACCEPT     all  --  lo     any     anywhere             localhost

Chain FORWARD (policy DROP 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy DROP 137 packets, 9042 bytes)
 pkts bytes target     prot opt in     out     source               destination
1433K  287M ACCEPT     all  --  any    any     anywhere             anywhere           state NEW,RELATED,ESTABLISHED

Thu Mar 11 06:26:55 root@sam ~
1004# iptables -v -L -t nat
Chain PREROUTING (policy ACCEPT 21M packets, 2512M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 676K packets, 27M bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 676K packets, 27M bytes)
 pkts bytes target     prot opt in     out     source               destination

* Re: network/performance problem
  2004-03-11 23:35 ` Ron Peterson
@ 2004-03-12 10:11   ` Patrick McHardy
  2004-03-12 16:10     ` Martin Josefsson
  2004-03-12 16:47   ` Ron Peterson
  1 sibling, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2004-03-12 10:11 UTC
  To: Ron Peterson
  Cc: Andrew Morton, linux-kernel, Netfilter Development Mailinglist

Ron Peterson wrote:
> On Thu, Mar 11, 2004 at 03:15:59PM -0800, Andrew Morton wrote:
>>The profiles tell a story:
>>
>>c0217fb0 wait_for_packet                               2   0.0063
>>c0256660 arpt_do_table                                 2   0.0019
>>c0265ca0 __generic_copy_to_user                        2   0.0278
>>c0106bd0 system_call                                   3   0.0536
>>c0107e8c handle_IRQ_event                              3   0.0326
>>c014bf10 statm_pgd_range                               3   0.0077
>>c0120ed4 do_wp_page                                    5   0.0101
>>c024c0d4 ip_conntrack_expect_related                  47   0.0368
>>c0105250 default_idle                               2817  70.4250
>>c024bae0 init_conntrack                             3053   3.7232
>>00000000 total                                      5962   0.0041
>>
>>It appears that netfilter has gone berserk and is taking your machine out.
>>
>>Are you really sure that nothing is sitting there injecting new rules all
>>the time?
>
>
> You mean a script calling 'iptables' to dynamically add rules?  Nothing
> like that at all.  I dumped the current rules below.
>
> Are you looking at the init_conntrack numbers?  While they seem, in the
> long run, to be getting larger, they're not increasing monotonically.
> My ping latencies, and the CPU percentage consumed by ksoftirqd_CPU0
> just go up and up (albeit slowly).
>

The size-128 slab keeps growing over time; I suspect something is
registering lots of expectations.  init_conntrack has to walk the
entire list for each new connection.  Which helpers are you using?
Please also post the content of /proc/net/ip_conntrack and your
config.

Regards
Patrick
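
The slab growth mentioned above can be tracked by comparing two snapshots of /proc/slabinfo. The following is only a sketch under stated assumptions: it assumes each cache line begins with the cache name followed by the active-object and total-object counts, and the two snapshots are invented for illustration (they are not data from sam). `slab_active_objects` and `growth` are hypothetical helper names.

```python
def slab_active_objects(slabinfo_text, cache_name):
    """Return the active-object count for cache_name, or None if absent.

    Assumption: a cache line looks like "name active_objs total_objs ...".
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == cache_name:
            return int(fields[1])
    return None


def growth(before_text, after_text, cache_name="size-128"):
    """Return how many active objects the cache gained between snapshots."""
    before = slab_active_objects(before_text, cache_name)
    after = slab_active_objects(after_text, cache_name)
    if before is None or after is None:
        raise ValueError(f"cache {cache_name!r} missing from a snapshot")
    return after - before


# Invented snapshots, for illustration only (not taken from the thread).
EARLY = """\
slabinfo - version: 1.1
size-256 120 150 256 10 10 1
size-128 1500 1530 128 51 51 1
"""
LATE = """\
slabinfo - version: 1.1
size-256 122 150 256 10 10 1
size-128 48000 48030 128 1601 1601 1
"""

print("size-128 grew by", growth(EARLY, LATE), "objects")
```

In practice the two strings would come from reading /proc/slabinfo some hours apart. A count that only ever rises would be consistent with the suspicion above that expectations are being registered and never released, though it would not prove it.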

* Re: network/performance problem
  2004-03-12 10:11 ` Patrick McHardy
@ 2004-03-12 16:10   ` Martin Josefsson
  0 siblings, 0 replies; 13+ messages in thread
From: Martin Josefsson @ 2004-03-12 16:10 UTC
  To: Patrick McHardy
  Cc: Ron Peterson, Andrew Morton, linux-kernel,
	Netfilter Development Mailinglist

On Fri, 12 Mar 2004, Patrick McHardy wrote:

> >>c024c0d4 ip_conntrack_expect_related                  47   0.0368
> >>c0105250 default_idle                               2817  70.4250
> >>c024bae0 init_conntrack                             3053   3.7232
> >>00000000 total                                      5962   0.0041
> >>
> >>It appears that netfilter has gone berserk and is taking your machine out.
> >>
> >>Are you really sure that nothing is sitting there injecting new rules all
> >>the time?
> >
> >
> > You mean a script calling 'iptables' to dynamically add rules?  Nothing
> > like that at all.  I dumped the current rules below.
> >
> > Are you looking at the init_conntrack numbers?  While they seem, in the
> > long run, to be getting larger, they're not increasing monotonically.
> > My ping latencies, and the CPU percentage consumed by ksoftirqd_CPU0
> > just go up and up (albeit slowly).
> >
>
> The size-128 slab keeps growing over time, I suspect something is
> registering lots of expectations. init_conntrack has to walk the
> entire list for each new connection. Which helpers are you using ?
> Please also post the content of /proc/net/ip_conntrack and your
> config.

If you want to see the number of expectations registered per second you
can apply the ctstat patch from patch-o-matic and download the small
utility mentioned in the helpfile.

I can prepare a regular patch for you if it sounds interesting.
We can also add a counter for the number of expectations in the
linked list in order to debug this.  (The ctstat patch only adds
counters for new/deleted expectations.)

/Martin

* Re: network/performance problem
  2004-03-11 23:35 ` Ron Peterson
  2004-03-12 10:11   ` Patrick McHardy
@ 2004-03-12 16:47   ` Ron Peterson
  2004-03-12 17:23     ` Ron Peterson
  2004-03-12 22:56     ` Ron Peterson
  1 sibling, 2 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-12 16:47 UTC
  To: linux-kernel

Hi Patrick.

(I'm all set now.  Someone kindly sent me a critical network patch via
email... :)

I'm not subscribed to lkml, but am following along in fa.linux.kernel.
I'm replying to my own mail to keep the thread somewhat intact...

Anyway, sam's .config can be found here:

http://depot.mtholyoke.edu:8080/tmp/sam-profile/sam-config-2.4.21

On sam, I just did:

1002# cat /proc/net/ip_conntrack > ip_conntrack

..and it wiped the machine out.  I can't ping it, ssh to it, nothing.  I
need to go walk over to the machine room... :(

After lunch I'm stuck in meetings for a while...

Thanks.

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-12 16:47 ` Ron Peterson
@ 2004-03-12 17:23   ` Ron Peterson
  2004-03-12 22:56   ` Ron Peterson
  1 sibling, 0 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-12 17:23 UTC
  To: linux-kernel

On Fri, Mar 12, 2004 at 11:47:04AM -0500, rpeterso wrote:

> On sam, I just did:
>
> 1002# cat /proc/net/ip_conntrack > ip_conntrack
>
> ..and it wiped the machine out.  I can't ping it, ssh to it, nothing.  I
> need to go walk over to the machine room... :(

I rebooted, and did the exact same thing as above.  Here's what the
console says:

Unable to handle kernel NULL pointer dereference at virtual address 00000018
 printing eip:
c024aae5
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c024aae5>]    Not tainted
EFLAGS: 00010286
eax: 00000000   ebx: deb00440   ecx: ddad71d1   edx: e089b000
esi: deb00440   edi: ddad71d2   ebp: 0000002d   esp: ddb4df3c
ds: 0018   es: 0018   ss: 0018
Process cat (pid: 365, stackpage=ddb4d000)
Stack: deb00440 000001d2 000001d2 c024ad1a ddad71d2 deb00440 00000000 00000c00
       ddad7000 00001000 00000ff6 c014af9f ddad7000 ddb4df98 00000029 00000c00
       00000000 ddafe3c0 ffffffea 00001000 c196dce0 00000000 00000000 00000000
Call Trace:    [<c024ad1a>] [<c014af9f>] [<c012f936>] [<c0106c03>]

Code: 83 78 18 00 74 3a 83 7e 2c 00 74 1f a1 44 3c 32 c0 8b 56 34
 <0>Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing

...whew.  Hopefully not too many typos.. ;)

After I reboot again, I'll probably find this all got syslogged..

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-12 16:47 ` Ron Peterson
  2004-03-12 17:23   ` Ron Peterson
@ 2004-03-12 22:56   ` Ron Peterson
  2004-03-14  6:33     ` David S. Miller
  1 sibling, 1 reply; 13+ messages in thread
From: Ron Peterson @ 2004-03-12 22:56 UTC
  To: linux-kernel

On Fri, Mar 12, 2004 at 11:47:04AM -0500, rpeterso wrote:

> (I'm all set now.  Someone kindly sent me a critical network patch via
> email... :)

...just in case ...since my sense of humor is suspect ...that was a
joke.  Same problem persists after reboot.  I haven't installed a
different kernel or otherwise changed anything on 'sam' yet.  Not sure
what would be good to try next.

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-12 22:56 ` Ron Peterson
@ 2004-03-14  6:33   ` David S. Miller
  2004-03-14 13:23     ` Ron Peterson
  0 siblings, 1 reply; 13+ messages in thread
From: David S. Miller @ 2004-03-14  6:33 UTC
  To: Ron Peterson; +Cc: linux-kernel

On Fri, 12 Mar 2004 17:56:06 -0500
Ron Peterson <rpeterso@mtholyoke.edu> wrote:

> ...just in case ...since my sense of humor is suspect ...that was a
> joke.  Same problem persists after reboot.  I haven't installed a
> different kernel or otherwise changed anything on 'sam' yet.  Not sure
> what would be good to try next.

Find out what's adding all of the netfilter rules like crazy.

It is obvious this is happening, from your profiles.  I know you
say that you have no idea what might be doing it, but your description
matches every other one that was reported in the past of gradual
networking slowdown, and in each of those cases it was something
poking netfilter in some way, and your profiles basically
confirm that this is what is happening somehow on your box.

* Re: network/performance problem
  2004-03-14  6:33 ` David S. Miller
@ 2004-03-14 13:23   ` Ron Peterson
  2004-03-14 17:33     ` David S. Miller
  0 siblings, 1 reply; 13+ messages in thread
From: Ron Peterson @ 2004-03-14 13:23 UTC
  To: David S. Miller; +Cc: linux-kernel

On Sat, Mar 13, 2004 at 10:33:49PM -0800, David S. Miller wrote:
> On Fri, 12 Mar 2004 17:56:06 -0500
> Ron Peterson <rpeterso@mtholyoke.edu> wrote:
>
> > ...just in case ...since my sense of humor is suspect ...that was a
> > joke.  Same problem persists after reboot.  I haven't installed a
> > different kernel or otherwise changed anything on 'sam' yet.  Not sure
> > what would be good to try next.
>
> Find out what's adding all of the netfilter rules like crazy.
>
> It is obvious this is happening, from your profiles.  I know you
> say that you have no idea what might be doing it, but your description
> matches every other one that was reported in the past of gradual
> networking slowdown, and in each of those cases it was something
> poking netfilter in some way, and your profiles basically
> confirm that this is what is happening somehow on your box.

Don't think so.  If I revert to 2.4.20 from 2.4.21, and change nothing
else, this problem goes away.

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso

* Re: network/performance problem
  2004-03-14 13:23 ` Ron Peterson
@ 2004-03-14 17:33   ` David S. Miller
  2004-03-14 18:13     ` Ron Peterson
  0 siblings, 1 reply; 13+ messages in thread
From: David S. Miller @ 2004-03-14 17:33 UTC
  To: Ron Peterson; +Cc: linux-kernel

On Sun, 14 Mar 2004 08:23:40 -0500
Ron Peterson <rpeterso@mtholyoke.edu> wrote:

> Don't think so.  If I revert to 2.4.20 from 2.4.21, and change nothing
> else, this problem goes away.

That's right, because a netfilter change during that time period
makes certain auto-rule-adding setups go berserk, and it's a bug
in the netfilter userland bits, not the kernel.

* Re: network/performance problem
  2004-03-14 17:33 ` David S. Miller
@ 2004-03-14 18:13   ` Ron Peterson
  0 siblings, 0 replies; 13+ messages in thread
From: Ron Peterson @ 2004-03-14 18:13 UTC
  To: David S. Miller; +Cc: linux-kernel

On Sun, Mar 14, 2004 at 09:33:58AM -0800, David S. Miller wrote:
> On Sun, 14 Mar 2004 08:23:40 -0500
> Ron Peterson <rpeterso@mtholyoke.edu> wrote:
>
> > Don't think so.  If I revert to 2.4.20 from 2.4.21, and change nothing
> > else, this problem goes away.
>
> That's right because a netfilter change during that time period
> makes certain auto-rule adding setups go berserk and it's a bug
> in the netfilter userland bits not the kernel.

I may indeed be completely dense.  That's not unheard of around these
parts.  I'd certainly accept an explanation of my denseness in lieu of
any other explanation, as long as I can make this stop happening.

What is the nature of the problem where auto-rule-adding setups go
berserk?

Below are my current iptables rules on 'sam' (the only machine not
currently running 2.4.20).  There are no jumps to user-defined chains.
I have not installed any scripts that dynamically add/alter iptables
rules.  I can't imagine what package I may have installed that might do
such a thing either.  Even if there were such a script somehow, since
nothing below ever jumps anywhere else, it wouldn't be getting called,
right?

If I flush and expunge my rules as follows, the problem goes away.  If
this was because a jump to a user-defined chain was being deleted, then
I'd understand.  But there are no jumps out of INPUT, OUTPUT, FORWARD,
PREROUTING, or POSTROUTING, so I'm confused.

$IPTABLES -F
$IPTABLES -t nat -F
$IPTABLES -X
$IPTABLES -P INPUT ACCEPT
$IPTABLES -P OUTPUT ACCEPT
$IPTABLES -P FORWARD ACCEPT

FWIW, I compiled the latest 'iptables' code against my current running
2.4.21 kernel also..

1052# iptables -V
iptables v1.2.9

1045# iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere           state RELATED,ESTABLISHED
ACCEPT     icmp --  138.110.0.0/16       anywhere           icmp echo-request
ACCEPT     tcp  --  anywhere             anywhere           tcp dpt:ssh
ACCEPT     tcp  --  anywhere             anywhere           tcp dpt:https
ACCEPT     all  --  anywhere             localhost

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere           state NEW,RELATED,ESTABLISHED

Sun Mar 14 12:57:25 root@sam /usr/src
1046# iptables -L -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

-- 
Ron Peterson
Network & Systems Manager
Mount Holyoke College
http://www.mtholyoke.edu/~rpeterso
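
The claim that no rule jumps to a user-defined chain can be checked mechanically against a saved `iptables -L` listing. The sketch below is an illustration, not a tool from the thread: it assumes the plain `iptables -L` layout (a "Chain NAME (...)" line, a "target prot opt ..." header, then one rule per line with the target first), and `find_chain_jumps` is a hypothetical helper name. The sample listing is the filter table posted in the message above.

```python
def find_chain_jumps(listing):
    """Return (chain, target) pairs where a rule's target is itself a chain.

    Hypothetical helper; assumes the rule's target is the first column.
    """
    chains = set()
    rules = []  # (owning chain, target) for every rule line seen
    current = None
    for line in listing.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank separator between chains
        if fields[0] == "Chain" and len(fields) >= 2:
            current = fields[1]
            chains.add(current)
        elif fields[0] == "target":
            continue  # column header line
        elif current is not None:
            rules.append((current, fields[0]))
    # A jump is any rule whose target names a chain defined in the listing.
    return [(chain, target) for chain, target in rules if target in chains]


# Filter table as posted in the message above.
LISTING = """\
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere           state RELATED,ESTABLISHED
ACCEPT     icmp --  138.110.0.0/16       anywhere           icmp echo-request
ACCEPT     tcp  --  anywhere             anywhere           tcp dpt:ssh
ACCEPT     tcp  --  anywhere             anywhere           tcp dpt:https
ACCEPT     all  --  anywhere             localhost

Chain FORWARD (policy DROP)
target     prot opt source               destination

Chain OUTPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  anywhere             anywhere           state NEW,RELATED,ESTABLISHED
"""

print(find_chain_jumps(LISTING))
```

For the listing above the result is an empty list, which matches the statement that nothing jumps out of the built-in chains. This only inspects the rules as listed; it says nothing about conntrack expectations, which are not rules.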

end of thread, other threads:[~2004-03-14 18:14 UTC | newest]

Thread overview: 13+ messages
2004-03-11 15:27 network/performance problem Ron Peterson
2004-03-11 17:32 ` Ron Peterson
2004-03-11 23:15 ` Andrew Morton
2004-03-11 23:35   ` Ron Peterson
2004-03-12 10:11     ` Patrick McHardy
2004-03-12 16:10       ` Martin Josefsson
2004-03-12 16:47     ` Ron Peterson
2004-03-12 17:23       ` Ron Peterson
2004-03-12 22:56       ` Ron Peterson
2004-03-14  6:33         ` David S. Miller
2004-03-14 13:23           ` Ron Peterson
2004-03-14 17:33             ` David S. Miller
2004-03-14 18:13               ` Ron Peterson