netdev.vger.kernel.org archive mirror
* rwlock recursion on CPU#0, netfilter related?
@ 2005-09-25 10:58 Pekka Pietikainen
  2005-09-25 13:43 ` Harald Welte
  0 siblings, 1 reply; 6+ messages in thread
From: Pekka Pietikainen @ 2005-09-25 10:58 UTC (permalink / raw)
  To: netdev

Just to get a wider audience, somewhere between 2.6.13-git4 and current 
(2.6.14-rc2-git4 is the last one I tested, which seems to have some
fixes in this area wrt. git3, but problem remains) my x86_64
crashes quite quickly after boot. Using Fedora devel kernels, I can
probably whip up a vanilla kernel if the maintainers in this area
prefer that.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167835

and

https://bugzilla.redhat.com/bugzilla/attachment.cgi?id=119228

apart from the crashes I get funny ping times on the kernels that
break when they're still up 
(64 bytes from 10.10.9.1: icmp_seq=0 ttl=255 time=4294971590968 ms)
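
The reported value decomposes rather neatly, which may help whoever
looks into it. Assuming ping derives the time from the difference of two
struct timeval values and prints whole milliseconds at this magnitude
(both are assumptions, nothing in this thread confirms them), subtracting
2^32 seconds and 2^32 microseconds leaves a sub-millisecond remainder,
which would be a plausible real latency. A minimal sketch of that
arithmetic, not a diagnosis of the cause:

```python
# Decompose the ping time reported above.
# Assumption: the value is (tv_sec difference, tv_usec difference) in ms,
# with each field off by exactly 2**32. This is a guess at the pattern,
# not a confirmed explanation.
REPORTED_MS = 4294971590968  # from the ping output above

TWO_32 = 2 ** 32

# Work in microseconds so the integer arithmetic stays exact.
reported_us = REPORTED_MS * 1000
sec_offset_us = TWO_32 * 1_000_000   # 2**32 seconds, in microseconds
usec_offset_us = TWO_32              # 2**32 microseconds

residual_us = reported_us - sec_offset_us - usec_offset_us
print(residual_us)  # 704
```

Since the printed value has no fractional part, the real round trip
would be somewhere between about 0.7 ms and 1.7 ms under these
assumptions.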


* Re: rwlock recursion on CPU#0, netfilter related?
  2005-09-25 10:58 rwlock recursion on CPU#0, netfilter related? Pekka Pietikainen
@ 2005-09-25 13:43 ` Harald Welte
  2005-09-25 20:19   ` Pekka Pietikainen
  0 siblings, 1 reply; 6+ messages in thread
From: Harald Welte @ 2005-09-25 13:43 UTC (permalink / raw)
  To: Pekka Pietikainen; +Cc: netdev


On Sun, Sep 25, 2005 at 01:58:34PM +0300, Pekka Pietikainen wrote:
> Just to get a wider audience, somewhere between 2.6.13-git4 and current 
> (2.6.14-rc2-git4 is the last one I tested, which seems to have some
> fixes in this area wrt. git3, but problem remains) my x86_64
> crashes quite quickly after boot. Using Fedora devel kernels, I can
> probably whip up a vanilla kernel if the maintainers in this area
> prefer that.
> 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=167835

Can you please give some more feedback like 

1) what does your kernel .config look like?
2) which modules are loaded
3) what does your ruleset look like?
4) most importantly, have you enabled CONFIG_IP_NF_CONNTRACK_EVENTS ?
   if yes, please disable, it's broken, a fix has been submitted, but I
   don't know if it has propagated to Linus yet (netdev Message-ID:
   <20050922143515.GD8917@rama.de.gnumonks.org>)
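
Item 4 can be checked mechanically against a .config file. The sketch
below is only an illustration: the helper name config_option_state and
the SAMPLE text are made up for this example and are not the reporter's
actual configuration. It relies on the usual .config conventions
("OPTION=value" for set options, "# OPTION is not set" for disabled
ones).

```python
import re


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    'n' if it is explicitly not set, or None if it is not mentioned."""
    set_re = re.compile(r"^%s=(.*)$" % re.escape(option))
    unset_re = re.compile(r"^# %s is not set$" % re.escape(option))
    for line in config_text.splitlines():
        line = line.strip()
        match = set_re.match(line)
        if match:
            return match.group(1)
        if unset_re.match(line):
            return "n"
    return None


# Hypothetical excerpt, for illustration only.
SAMPLE = """\
CONFIG_IP_NF_CONNTRACK=m
CONFIG_IP_NF_CONNTRACK_EVENTS=y
# CONFIG_IP_NF_CT_ACCT is not set
"""

print(config_option_state(SAMPLE, "CONFIG_IP_NF_CONNTRACK_EVENTS"))  # y
```

Matching on the full option name followed by "=" keeps
CONFIG_IP_NF_CONNTRACK from being confused with
CONFIG_IP_NF_CONNTRACK_EVENTS.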


please also try 

a) only loading iptable_filter (and ip_tables), but no other modules
b) only loading ip_conntrack but no other netfilter modules (no nat, no iptables)
c) only loading ip_conntrack and iptable_nat (but no rules)

this kind of debugging helps to locate where it is.  netfilter has grown
big ;)
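
The three isolation tests above could be written down as a small
dry-run plan. This is a sketch only: it prints modprobe commands and
does not run them, the scenario labels are invented for this example,
and it assumes each scenario starts from a freshly booted system with
no netfilter modules loaded (for instance with the firewall init script
disabled).

```python
# Each scenario lists the modules to load, taken from the list above.
# The labels are hypothetical names for this example.
SCENARIOS = [
    ("filter only", ["ip_tables", "iptable_filter"]),
    ("conntrack only", ["ip_conntrack"]),
    ("conntrack + nat, no rules", ["ip_conntrack", "iptable_nat"]),
]


def plan(scenarios):
    """Return the shell commands for each scenario as a list of lines.

    Nothing is executed; the caller decides what to do with the text.
    """
    lines = []
    for name, modules in scenarios:
        lines.append("# scenario: %s" % name)
        for module in modules:
            lines.append("modprobe %s" % module)
    return lines


print("\n".join(plan(SCENARIOS)))
```

Running one scenario per boot and noting which ones still crash narrows
the problem down to a subset of the netfilter code.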

Also, I have that Ping time problem on my x86_64 debian unstable (smp).
But only in 1 out of ten cases on average (when starting ping, ctrl+c,
ping, ctrl+c, ...).  I've always assumed it's some 64bit problem in
"ping" itself.

-- 
- Harald Welte <laforge@gnumonks.org>          	        http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)


* Re: rwlock recursion on CPU#0, netfilter related?
  2005-09-25 13:43 ` Harald Welte
@ 2005-09-25 20:19   ` Pekka Pietikainen
  2005-09-28 14:58     ` Pekka Pietikainen
  0 siblings, 1 reply; 6+ messages in thread
From: Pekka Pietikainen @ 2005-09-25 20:19 UTC (permalink / raw)
  To: Harald Welte; +Cc: netdev

On Sun, Sep 25, 2005 at 03:43:44PM +0200, Harald Welte wrote:
> 1) what does your kernel .config look like?
http://cvs.fedora.redhat.com/viewcvs/devel/kernel/configs/config-generic?rev=1.60&view=auto
http://cvs.fedora.redhat.com/viewcvs/devel/kernel/configs/config-x86_64-generic?rev=1.16&view=auto

> 2) which modules are loaded
Module                  Size  Used by
w83627hf               46569  0 
eeprom                 17617  0 
i2c_sensor             12225  2 w83627hf,eeprom
i2c_isa                11329  0 
rfcomm                 61033  0 
l2cap                  46145  5 rfcomm
bluetooth              73317  4 rfcomm,l2cap
ipv6                  325889  16 
ppp_synctty            21057  0 
ppp_async              22465  1 
crc_ccitt              10817  1 ppp_async
ppp_generic            41953  6 ppp_synctty,ppp_async
slhc                   16193  1 ppp_generic
ip_conntrack_ftp       82177  0 
ipt_ULOG               18913  1 
ipt_state              10689  18 
ip_conntrack           60053  2 ip_conntrack_ftp,ipt_state
iptable_filter         11969  1 
ip_tables              32193  3 ipt_ULOG,ipt_state,iptable_filter
loop                   26449  0 
video                  27977  0 
button                 16481  0 
battery                19657  0 
ac                     14409  0 
ohci1394               46753  0 
ieee1394              381273  1 ohci1394
ohci_hcd               33249  0 
ehci_hcd               46157  0 
parport_pc             40621  0 
parport                52557  1 parport_pc
i2c_nforce2            16833  0 
i2c_core               34241  5 w83627hf,eeprom,i2c_sensor,i2c_isa,i2c_nforce2
shpchp                108009  0 
emu10k1_gp             12865  0 
gameport               27089  2 emu10k1_gp
snd_emu10k1           138629  0 
snd_rawmidi            39521  1 snd_emu10k1
snd_util_mem           14401  1 snd_emu10k1
snd_hwdep              20321  1 snd_emu10k1
snd_intel8x0           46273  0 
snd_ac97_codec        106757  2 snd_emu10k1,snd_intel8x0
snd_seq_dummy          12869  0 
snd_seq_oss            47012  0 
snd_seq_midi_event     17473  1 snd_seq_oss
snd_seq                74265  5 snd_seq_dummy,snd_seq_oss,snd_seq_midi_event
snd_seq_device         19281  5 snd_emu10k1,snd_rawmidi,snd_seq_dummy,snd_seq_oss,snd_seq
snd_pcm_oss            68465  0 
snd_mixer_oss          28225  1 snd_pcm_oss
snd_pcm               115401  4 snd_emu10k1,snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              37577  3 snd_emu10k1,snd_seq,snd_pcm
snd                    75681  12 snd_emu10k1,snd_rawmidi,snd_hwdep,snd_intel8x0,snd_ac97_codec,snd_seq_oss,snd_seq,snd_seq_device,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
soundcore              19809  1 snd
snd_page_alloc         21713  3 snd_emu10k1,snd_intel8x0,snd_pcm
r8169                  43209  0 
forcedeth              30657  0 
floppy                 77865  0 
dm_snapshot            26369  0 
dm_zero                10817  0 
dm_mirror              32433  0 
ext3                  154577  3 
jbd                    76145  1 ext3
dm_mod                 73873  7 dm_snapshot,dm_zero,dm_mirror
sata_nv                19141  3 
libata                 61649  1 sata_nv
sd_mod                 29121  4 
scsi_mod              167801  2 libata,sd_mod

> 3) what does your ruleset look like?
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT 
-A FORWARD -j RH-Firewall-1-INPUT 
-A RH-Firewall-1-INPUT -i lo -j ACCEPT 
-A RH-Firewall-1-INPUT -i eth1 -j ACCEPT 
-A RH-Firewall-1-INPUT -p icmp --icmp-type echo-request -j ACCEPT 
-A RH-Firewall-1-INPUT -p esp -j ACCEPT 
-A RH-Firewall-1-INPUT -p ah -j ACCEPT 
-A RH-Firewall-1-INPUT -p ipv6 -j ACCEPT 
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT 
-A RH-Firewall-1-INPUT -j ULOG
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport x -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport y -j ACCEPT
(for a bunch of ports, some with -s sourcenet/24 etc.)
-A RH-Firewall-1-INPUT -j DROP
COMMIT
# Completed on Sun Sep 28 10:37:44 2003

So basically a single-host firewall with ULOG and ftp conntracking being the
only fancy things.
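
To make the netfilter part of the module list above easier to see, the
relevant entries can be filtered out of lsmod-style output. A minimal
sketch: the prefix list is an assumption about how the netfilter
modules of this era are named, and LSMOD_SAMPLE is just a few lines
copied from the listing above.

```python
# Name prefixes assumed to identify IPv4 netfilter modules here.
NETFILTER_PREFIXES = ("ip_conntrack", "ip_tables", "iptable_", "ipt_", "ip_nat")


def netfilter_modules(lsmod_text):
    """Return module names from lsmod output that match the prefixes."""
    found = []
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if not fields or fields[0] == "Module":
            continue
        if fields[0].startswith(NETFILTER_PREFIXES):
            found.append(fields[0])
    return found


# A few lines taken from the listing above.
LSMOD_SAMPLE = """\
Module                  Size  Used by
ipv6                  325889  16
ip_conntrack_ftp       82177  0
ipt_ULOG               18913  1
ipt_state              10689  18
ip_conntrack           60053  2 ip_conntrack_ftp,ipt_state
iptable_filter         11969  1
ip_tables              32193  3 ipt_ULOG,ipt_state,iptable_filter
loop                   26449  0
"""

print(netfilter_modules(LSMOD_SAMPLE))
```

On the sample this picks out the conntrack, ULOG, state and filter
modules and leaves ipv6 and loop alone.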

> 4) most importantly, have you enabled CONFIG_IP_NF_CONNTRACK_EVENTS ?
>    if yes, please disable, it's broken, a fix has been submitted, but I
>    don't know if it has propagated to Linus yet (netdev Message-ID:
>    <20050922143515.GD8917@rama.de.gnumonks.org>)
Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
it did take a bit longer for that to happen), and the changelog does state:

commit 1dfbab59498d6f227c91988bab6c71af049a5333
tree 6b20409a232ebe8c37f16d06b3fbcde6bec8f328
parent a82b748930fce0dab22c64075c38c830ae116904
author Harald Welte <laforge@netfilter.org> Thu, 22 Sep 2005 23:46:57 -0700
committer David S. Miller <davem@davemloft.net> Thu, 22 Sep 2005 23:46:57 -0700

    [NETFILTER] Fix conntrack event cache deadlock/oops

Which is this patch, right? Will verify whether disabling the option makes any
difference tomorrow, as well as your other recommendations.

> Also, I have that Ping time problem on my x86_64 debian unstable (smp).
> But only in 1 out of ten cases on average (when starting ping, ctrl+c,
> ping, ctrl+c, ...).  I've always assumed it's some 64bit problem in
> "ping" itself.
Happens for all packets on the "broken" kernels, and works a-ok (few ms
latencies to the same box) on the 2.6.13-era ones that don't crash.
Could be a different bug, sure.


* Re: rwlock recursion on CPU#0, netfilter related?
  2005-09-25 20:19   ` Pekka Pietikainen
@ 2005-09-28 14:58     ` Pekka Pietikainen
  2005-09-29 12:05       ` Harald Welte
  0 siblings, 1 reply; 6+ messages in thread
From: Pekka Pietikainen @ 2005-09-28 14:58 UTC (permalink / raw)
  To: Harald Welte; +Cc: netdev

On Sun, Sep 25, 2005 at 11:19:45PM +0300, Pekka Pietikainen wrote:
> Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
> it did take a bit longer for that to happen), and the changelog does state:
Ok, it looks like that patch was the thing after all. I now tried the latest
fedora-devel kernel (1.1582, based on 2.6.14-rc2-git6) and the box has been
running for a few hours happily. Could be the fedora kernel that claimed to
be git4 actually wasn't, or the git4 changelog was really a post-git4
changelog :). But anyway, bug is gone.

> > But only in 1 out of ten cases on average (when starting ping, ctrl+c,
> > ping, ctrl+c, ...).  I've always assumed it's some 64bit problem in
> > "ping" itself.
> Happens for all packets on the "broken" kernels, and works a-ok (few ms
> latencies to the same box) on the 2.6.13-era ones that don't crash.
> Could be a different bug, sure.
This one is still around, so it's a different bug. Looks like it's a 64-bit
issue, a 32-bit ping gives realistic ping times. tcpdump timestamps are also
affected, they're completely off too. So looks like someone broke packet
timestamps on 64-bit some time after 2.6.13.


* Re: rwlock recursion on CPU#0, netfilter related?
  2005-09-28 14:58     ` Pekka Pietikainen
@ 2005-09-29 12:05       ` Harald Welte
  2005-09-30  6:49         ` Funny timestamps (Was: Re: rwlock recursion on CPU#0, netfilter related?) Pekka Pietikainen
  0 siblings, 1 reply; 6+ messages in thread
From: Harald Welte @ 2005-09-29 12:05 UTC (permalink / raw)
  To: Pekka Pietikainen; +Cc: netdev


On Wed, Sep 28, 2005 at 05:58:15PM +0300, Pekka Pietikainen wrote:
> On Sun, Sep 25, 2005 at 11:19:45PM +0300, Pekka Pietikainen wrote:
> > Enabled, so this could be it. But 2.6.14-rc2-git4 did crash too (although
> > it did take a bit longer for that to happen), and the changelog does state:
> Ok, it looks like that patch was the thing after all. I now tried the latest
> fedora-devel kernel (1.1582, based on 2.6.14-rc2-git6) and the box has been
> running for a few hours happily. Could be the fedora kernel that claimed to
> be git4 actually wasn't, or the git4 changelog was really a post-git4
> changelog :). But anyway, bug is gone.

great news.

> This one is still around, so it's a different bug. Looks like it's a 64-bit
> issue, a 32-bit ping gives realistic ping times. tcpdump timestamps are also
> affected, they're completely off too. So looks like someone broke packet
> timestamps on 64-bit some time after 2.6.13.

luckily I'm not the core network maintainer ;)

-- 
- Harald Welte <laforge@gnumonks.org>          	        http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)


* Funny timestamps (Was: Re: rwlock recursion on CPU#0, netfilter related?)
  2005-09-29 12:05       ` Harald Welte
@ 2005-09-30  6:49         ` Pekka Pietikainen
  0 siblings, 0 replies; 6+ messages in thread
From: Pekka Pietikainen @ 2005-09-30  6:49 UTC (permalink / raw)
  To: netdev

On Thu, Sep 29, 2005 at 02:05:38PM +0200, Harald Welte wrote:
> > This one is still around, so it's a different bug. Looks like it's a 64-bit
> > issue, a 32-bit ping gives realistic ping times. tcpdump timestamps are also
> > affected, they're completely off too. So looks like someone broke packet
> > timestamps on 64-bit some time after 2.6.13.
> 
> luckily I'm not the core network maintainer ;)
Here's the actual bug report:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=168166

Ended up being a userspace thing, maybe. But it still makes me wonder what
change actually broke things. It must have been something soon after 2.6.13.
And there's still tcpdump, which no longer seems to show the problem when I
test it now, even though nothing should have changed in the
kernel/tcpdump/libpcap versions. Blah.

