Patrick McHardy wrote: > BORBELY Zoltan wrote: >> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote: >>>> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29 >>>> 23:28:55.000000000 +0200 >>>> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200 >>>> @@ -1177,8 +1177,8 @@ >>>> ct->master = master_ct; >>>> } >>>> - add_timer(&ct->timeout); >>>> nf_conntrack_hash_insert(ct); >>>> + add_timer(&ct->timeout); >>>> rcu_read_unlock(); >>> That code looks very fishy. We should be holding the conntrack lock, >>> otherwise the addition is not only racy against the timer, but also >>> against addition of identical conntracks. Let me look into what >>> happened here. >> >> We have experienced a lot of kernel crashes, _every time_ in the >> death_by_timeout() function while we were trying to add a new conntrack >> entry from userspace via netlink (attached the disassembled version >> of the function, ===> points to the EIP upon the crash). There was a >> possibility, that we tried to add conntrack entries with zero timeout >> value, maybe it's necessary to trigger this crash. The previous patch >> has definitly solved the problem for us. >> >> I've got photos from various crashes, but it takes a little time to >> find them. Please let me know if you want to see them. > > Thats not necessary, the problem is pretty obvious, I was mainly > wondering at what point we broke it. > > I'll send you a patch soon. Could you try whether this patch fixes the problem? Pablo, do you recall the reason why the lock isn't held in ctnetlink_create_conntrack()?