Patrick McHardy wrote:
> BORBELY Zoltan wrote:
>> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>>> --- /tmp/nf_conntrack_netlink.c-orig    2008-09-29 23:28:55.000000000 +0200
>>>> +++ /tmp/nf_conntrack_netlink.c    2008-09-29 23:29:11.000000000 +0200
>>>> @@ -1177,8 +1177,8 @@
>>>>          ct->master = master_ct;
>>>>      }
>>>> -    add_timer(&ct->timeout);
>>>>      nf_conntrack_hash_insert(ct);
>>>> +    add_timer(&ct->timeout);
>>>>      rcu_read_unlock();
>>> That code looks very fishy. We should be holding the conntrack lock,
>>> otherwise the addition is not only racy against the timer, but also
>>> against addition of identical conntracks. Let me look into what
>>> happened here.
>>
>> We have experienced a lot of kernel crashes, _every time_ in the
>> death_by_timeout() function while we were trying to add a new conntrack
>> entry from userspace via netlink (attached the disassembled version
>> of the function, ===> points to the EIP upon the crash). It is
>> possible that we tried to add conntrack entries with a zero timeout
>> value; that may be necessary to trigger this crash. The previous patch
>> has definitely solved the problem for us.
>>
>> I've got photos from various crashes, but it takes a little time to
>> find them. Please let me know if you want to see them.
> 
> That's not necessary, the problem is pretty obvious, I was mainly
> wondering at what point we broke it.
> 
> I'll send you a patch soon.

Could you try whether this patch fixes the problem?

Pablo, do you recall the reason why the lock isn't held in
ctnetlink_create_conntrack()?