* rwlock recursion on CPU#0, netfilter related?
From: Pekka Pietikainen @ 2005-09-25 10:58 UTC
To: netdev

Just to get a wider audience: somewhere between 2.6.13-git4 and current
(2.6.14-rc2-git4 is the last one I tested, which seems to have some
fixes in this area wrt. git3, but the problem remains) my x86_64
crashes quite quickly after boot. I'm using Fedora devel kernels; I can
probably whip up a vanilla kernel if the maintainers in this area
prefer that.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167835 and
https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=119228

Apart from the crashes, I get funny ping times on the kernels that
break, while they're still up:

64 bytes from 10.10.9.1: icmp_seq=0 ttl=255 time=4294971590968 ms

* Re: rwlock recursion on CPU#0, netfilter related?
From: Harald Welte @ 2005-09-25 13:43 UTC
To: Pekka Pietikainen; +Cc: netdev

On Sun, Sep 25, 2005 at 01:58:34PM +0300, Pekka Pietikainen wrote:
> Just to get a wider audience: somewhere between 2.6.13-git4 and current
> (2.6.14-rc2-git4 is the last one I tested, which seems to have some
> fixes in this area wrt. git3, but the problem remains) my x86_64
> crashes quite quickly after boot. I'm using Fedora devel kernels; I can
> probably whip up a vanilla kernel if the maintainers in this area
> prefer that.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167835

Can you please give some more feedback, like:

1) what does your kernel .config look like?
2) which modules are loaded?
3) what does your ruleset look like?
4) most importantly, have you enabled CONFIG_IP_NF_CONNTRACK_EVENTS?
   If yes, please disable it; it's broken. A fix has been submitted, but
   I don't know if it has propagated to Linus yet (netdev Message-ID:
   <20050922143515.GD8917@rama.de.gnumonks.org>)

Please also try:

a) only loading iptable_filter (and ip_tables), but no other modules
b) only loading ip_conntrack but no other netfilter modules (no nat, no
   iptables)
c) only loading ip_conntrack and iptable_nat (but no rules)

This kind of debugging helps to locate where the problem is. netfilter
has grown big ;)

Also, I have that ping time problem on my x86_64 debian unstable (smp),
but only in 1 out of ten cases on average (when starting ping, ctrl+c,
ping, ctrl+c, ...). I've always assumed it's some 64bit problem in
"ping" itself.

-- 
- Harald Welte <laforge@gnumonks.org>                 http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
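
Editor's sketch for question 4 above: a minimal way to check whether an option is enabled in a kernel .config, assuming the usual format where a set option appears as `NAME=value` and an unset one as `# NAME is not set`. The function name and the sample config text are hypothetical and only serve as illustration; they are not taken from the reporter's actual config.

```python
import re


def config_option_state(config_text: str, option: str) -> str:
    """Return the value assigned to `option` (e.g. 'y' or 'm'), or 'unset'."""
    # Anchor on "NAME=" so that CONFIG_FOO does not match CONFIG_FOO_BAR.
    assign = re.compile(rf"^{re.escape(option)}=(.*)$")
    for line in config_text.splitlines():
        match = assign.match(line.strip())
        if match:
            return match.group(1)
    # Covers both "# NAME is not set" and options missing from the file.
    return "unset"


# Hypothetical sample, not the reporter's real config.
sample = """\
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CONNTRACK_EVENTS=y
# CONFIG_IP_NF_CT_ACCT is not set
"""

print(config_option_state(sample, "CONFIG_IP_NF_CONNTRACK_EVENTS"))
```

On a real system the text would come from the config file of the running kernel; where that file lives depends on the distribution.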

* Re: rwlock recursion on CPU#0, netfilter related?
From: Pekka Pietikainen @ 2005-09-25 20:19 UTC
To: Harald Welte; +Cc: netdev

On Sun, Sep 25, 2005 at 03:43:44PM +0200, Harald Welte wrote:
> 1) what does your kernel .config look like?

http://cvs.fedora.redhat.com/viewcvs/devel/kernel/configs/config-generic?rev=1.60&view=auto
http://cvs.fedora.redhat.com/viewcvs/devel/kernel/configs/config-x86_64-generic?rev=1.16&view=auto

> 2) which modules are loaded?

Module                  Size  Used by
w83627hf               46569  0
eeprom                 17617  0
i2c_sensor             12225  2 w83627hf,eeprom
i2c_isa                11329  0
rfcomm                 61033  0
l2cap                  46145  5 rfcomm
bluetooth              73317  4 rfcomm,l2cap
ipv6                  325889  16
ppp_synctty            21057  0
ppp_async              22465  1
crc_ccitt              10817  1 ppp_async
ppp_generic            41953  6 ppp_synctty,ppp_async
slhc                   16193  1 ppp_generic
ip_conntrack_ftp       82177  0
ipt_ULOG               18913  1
ipt_state              10689  18
ip_conntrack           60053  2 ip_conntrack_ftp,ipt_state
iptable_filter         11969  1
ip_tables              32193  3 ipt_ULOG,ipt_state,iptable_filter
loop                   26449  0
video                  27977  0
button                 16481  0
battery                19657  0
ac                     14409  0
ohci1394               46753  0
ieee1394              381273  1 ohci1394
ohci_hcd               33249  0
ehci_hcd               46157  0
parport_pc             40621  0
parport                52557  1 parport_pc
i2c_nforce2            16833  0
i2c_core               34241  5 w83627hf,eeprom,i2c_sensor,i2c_isa,i2c_nforce2
shpchp                108009  0
emu10k1_gp             12865  0
gameport               27089  2 emu10k1_gp
snd_emu10k1           138629  0
snd_rawmidi            39521  1 snd_emu10k1
snd_util_mem           14401  1 snd_emu10k1
snd_hwdep              20321  1 snd_emu10k1
snd_intel8x0           46273  0
snd_ac97_codec        106757  2 snd_emu10k1,snd_intel8x0
snd_seq_dummy          12869  0
snd_seq_oss            47012  0
snd_seq_midi_event     17473  1 snd_seq_oss
snd_seq                74265  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device         19281  5 snd_emu10k1,snd_rawmidi,snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss            68465  0
snd_mixer_oss          28225  1 snd_pcm_oss
snd_pcm               115401  4 snd_emu10k1,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              37577  3 snd_emu10k1,snd_seq,snd_pcm
snd                    75681  12 snd_emu10k1,snd_rawmidi,snd_hwdep,snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
soundcore              19809  1 snd
snd_page_alloc         21713  3 snd_emu10k1,snd_intel8x0,snd_pcm
r8169                  43209  0
forcedeth              30657  0
floppy                 77865  0
dm_snapshot            26369  0
dm_zero                10817  0
dm_mirror              32433  0
ext3                  154577  3
jbd                    76145  1 ext3
dm_mod                 73873  7 dm_snapshot,dm_zero,dm_mirror
sata_nv                19141  3
libata                 61649  1 sata_nv
sd_mod                 29121  4
scsi_mod              167801  2 libata,sd_mod

> 3) what does your ruleset look like?

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -i eth1 -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp --icmp-type echo-request -j ACCEPT
-A RH-Firewall-1-INPUT -p esp -j ACCEPT
-A RH-Firewall-1-INPUT -p ah -j ACCEPT
-A RH-Firewall-1-INPUT -p ipv6 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -j ULOG
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport x -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport y -j ACCEPT
(for a bunch of ports, some with -s sourcenet/24 etc.)
-A RH-Firewall-1-INPUT -j DROP
COMMIT
# Completed on Sun Sep 28 10:37:44 2003

So basically a single-host firewall, with ULOG and ftp conntracking
being the only fancy things.

> 4) most importantly, have you enabled CONFIG_IP_NF_CONNTRACK_EVENTS?
>    If yes, please disable it; it's broken. A fix has been submitted, but
>    I don't know if it has propagated to Linus yet (netdev Message-ID:
>    <20050922143515.GD8917@rama.de.gnumonks.org>)

Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
it did take a bit longer for that to happen), and the changelog does state:

commit 1dfbab59498d6f227c91988bab6c71af049a5333
tree 6b20409a232ebe8c37f16d06b3fbcde6bec8f328
parent a82b748930fce0dab22c64075c38c830ae116904
author Harald Welte <laforge@netfilter.org> Thu, 22 Sep 2005 23:46:57 -0700
committer David S. Miller <davem@davemloft.net> Thu, 22 Sep 2005 23:46:57 -0700

    [NETFILTER] Fix conntrack event cache deadlock/oops

Which is this patch, right? Tomorrow I will verify whether disabling the
option makes any difference, and try your other recommendations.

> Also, I have that ping time problem on my x86_64 debian unstable (smp),
> but only in 1 out of ten cases on average (when starting ping, ctrl+c,
> ping, ctrl+c, ...). I've always assumed it's some 64bit problem in
> "ping" itself.

Happens for all packets on the "broken" kernels, and works a-ok (few ms
latencies to the same box) on the 2.6.13-era ones that don't crash.
Could be a different bug, sure.
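
Editor's sketch: when narrowing a problem down to netfilter, it can help to pull just the relevant modules out of a long lsmod listing like the one in the message above. The snippet below is a rough illustration only. The prefix list is an assumption chosen to match the module names seen in this thread, not an authoritative definition of what counts as a netfilter module, and the sample text is a short excerpt of the listing.

```python
# Assumed name prefixes for netfilter-related modules of this era (heuristic).
NETFILTER_PREFIXES = ("ip_conntrack", "ip_tables", "ipt_", "iptable_", "ip_nat")


def netfilter_modules(lsmod_text):
    """Return module names from lsmod-style text that match the assumed prefixes."""
    modules = []
    # Skip the "Module Size Used by" header line.
    for line in lsmod_text.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0].startswith(NETFILTER_PREFIXES):
            modules.append(fields[0])
    return modules


# Excerpt of the listing from the message above.
sample_lsmod = """\
Module                  Size  Used by
slhc                   16193  1 ppp_generic
ip_conntrack_ftp       82177  0
ipt_ULOG               18913  1
ipt_state              10689  18
ip_conntrack           60053  2 ip_conntrack_ftp,ipt_state
iptable_filter         11969  1
ip_tables              32193  3 ipt_ULOG,ipt_state,iptable_filter
loop                   26449  0
"""

for name in netfilter_modules(sample_lsmod):
    print(name)
```

For the excerpt, this lists the six conntrack and iptables modules and skips slhc and loop.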

* Re: rwlock recursion on CPU#0, netfilter related?
From: Pekka Pietikainen @ 2005-09-28 14:58 UTC
To: Harald Welte; +Cc: netdev

On Sun, Sep 25, 2005 at 11:19:45PM +0300, Pekka Pietikainen wrote:
> Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
> it did take a bit longer for that to happen), and the changelog does state:

Ok, it looks like that patch was the thing after all. I now tried the latest
fedora-devel kernel (1.1582, based on 2.6.14-rc2-git6) and the box has been
running happily for a few hours. It could be that the fedora kernel that
claimed to be git4 actually wasn't, or that the git4 changelog was really a
post-git4 changelog :). But anyway, the bug is gone.

> > But only in 1 out of ten cases on average (when starting ping, ctrl+c,
> > ping, ctrl+c, ...). I've always assumed it's some 64bit problem in
> > "ping" itself.
> Happens for all packets on the "broken" kernels, and works a-ok (few ms
> latencies to the same box) on the 2.6.13-era ones that don't crash.
> Could be a different bug, sure.

This one is still around, so it's a different bug. It looks like a 64-bit
issue: a 32-bit ping gives realistic ping times. tcpdump timestamps are also
affected; they're completely off too. So it looks like someone broke packet
timestamps on 64-bit some time after 2.6.13.
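
Editor's sketch: the ping time reported at the start of the thread (4294971590968 ms) happens to decompose neatly around 2^32, which fits the 64-bit suspicion above. The arithmetic below assumes, purely as a hypothesis not confirmed anywhere in the thread, that the round-trip time was computed from a seconds difference and a microseconds difference that were each inflated by exactly 2^32. The variable names are illustrative.

```python
TWO_32 = 2 ** 32

# Value printed by ping in the first message of the thread, in milliseconds.
reported_ms = 4294971590968
reported_us = reported_ms * 1000

# Hypothesis (an assumption, not a diagnosis): both the seconds and the
# microseconds differences carry an extra 2**32.
residual_us = reported_us - TWO_32 * 1_000_000 - TWO_32

print(residual_us)
```

Under that assumption the leftover is 704 microseconds. Since the printed value appears to be in whole milliseconds, the real figure could lie anywhere from roughly 0.7 to 1.7 ms, which is at least consistent with the "few ms latencies" seen on the working kernels.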

* Re: rwlock recursion on CPU#0, netfilter related?
From: Harald Welte @ 2005-09-29 12:05 UTC
To: Pekka Pietikainen; +Cc: netdev

On Wed, Sep 28, 2005 at 05:58:15PM +0300, Pekka Pietikainen wrote:
> On Sun, Sep 25, 2005 at 11:19:45PM +0300, Pekka Pietikainen wrote:
> > Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
> > it did take a bit longer for that to happen), and the changelog does state:
> Ok, it looks like that patch was the thing after all. I now tried the latest
> fedora-devel kernel (1.1582, based on 2.6.14-rc2-git6) and the box has been
> running for a few hours happily. Could be the fedora kernel that claimed to
> be git4 actually wasn't, or the git4 changelog was really a post-git4
> changelog :). But anyway, bug is gone.

great news.

> This one is still around, so it's a different bug. Looks like it's a 64-bit
> issue, a 32-bit ping gives realistic ping times. tcpdump timestamps are also
> affected, they're completely off too. So looks like someone broke packet
> timestamps on 64-bit some time after 2.6.13.

luckily I'm not the core network maintainer ;)

-- 
- Harald Welte <laforge@gnumonks.org>                 http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

* Funny timestamps (Was: Re: rwlock recursion on CPU#0, netfilter related?)
From: Pekka Pietikainen @ 2005-09-30 6:49 UTC
To: netdev

On Thu, Sep 29, 2005 at 02:05:38PM +0200, Harald Welte wrote:
> > This one is still around, so it's a different bug. Looks like it's a 64-bit
> > issue, a 32-bit ping gives realistic ping times. tcpdump timestamps are also
> > affected, they're completely off too. So looks like someone broke packet
> > timestamps on 64-bit some time after 2.6.13.
>
> luckily I'm not the core network maintainer ;)

Here's the actual bug report:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=168166

It ended up being a userspace thing, maybe. But it still makes me wonder
what change actually broke things. It must have been something soon after
2.6.13.

And there's still tcpdump, which doesn't seem to go into the problem mode
when I test it now, even though nothing should have changed in the
kernel/tcpdump/libpcap versions. Blah.