Hi,
This mailing list gave me a lot of information about this problem, so I want to share the solution to the bad refcount on the bridge.
The fix has been applied to the kernel:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f3abc9b963e004b8c96cd7fbee6fd905f2bfd620
commit f216f082b2b37c4943f1e7c393e2786648d48f6f
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.
Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed
by Patrick.
@@ -359,7 +359,7 @@ static int br_nf_pre_routing_finish(struct sk_buff *skb)
 	},
 	.proto = 0,
 };
-struct in_device *in_dev = in_dev_get(dev);
+struct in_device *in_dev = __in_dev_get_rcu(dev);
Best Regards,
Jorge.
> Two other recent reports are:
> 1. Buggy applications that hold packets in their input queue forever,
> and/or netfilters. The socket buffers contain a reference for
> packets in flight.
That may be it, but I am not sure which queue you are talking about.
There is an application that uses the netfilter ip_queue to queue
packets to user space. In this application, packets can be held in
user space for extended periods of time (up to 30/60 seconds) before
they are either dropped or released. Could this possibly be creating
a problem?
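To make the question concrete, here is a minimal userspace sketch of the concern. It assumes, following the quoted remark that socket buffers hold a reference for packets in flight, that every packet waiting for a verdict pins its device. All model_* names are hypothetical and this is not the ip_queue implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PKTS 16

/* Stand-in for a network device; only the refcount is modeled. */
struct model_dev {
    int refcnt;
};

/* Stand-in for a packet queued to user space. */
struct model_pkt {
    struct model_dev *dev;
    bool queued;
};

/* Queuing a packet takes one reference on its device. */
static void model_enqueue(struct model_pkt *pkt, struct model_dev *dev)
{
    pkt->dev = dev;
    pkt->queued = true;
    dev->refcnt++;
}

/* A verdict (drop or release) gives the reference back. */
static void model_verdict(struct model_pkt *pkt)
{
    if (!pkt->queued)
        return;
    pkt->queued = false;
    pkt->dev->refcnt--;
    pkt->dev = NULL;
}

/*
 * Queue n_queued packets, issue verdicts for the first n_verdicts of
 * them, and report whether the device's refcount is back to zero.
 */
bool model_free_after(int n_queued, int n_verdicts)
{
    struct model_dev dev = { .refcnt = 0 };
    struct model_pkt pkts[MODEL_MAX_PKTS];
    int i;

    if (n_queued > MODEL_MAX_PKTS)
        n_queued = MODEL_MAX_PKTS;
    for (i = 0; i < n_queued; i++)
        model_enqueue(&pkts[i], &dev);
    for (i = 0; i < n_verdicts && i < n_queued; i++)
        model_verdict(&pkts[i]);
    return dev.refcnt == 0;
}
```

Under these assumptions, a packet that eventually gets a verdict only delays the teardown, whereas a packet that never gets one would keep the count above zero indefinitely.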
I don't believe that the system is using any of the VLAN code.
> I have found an apparent leak of a route object, which holds a
> reference
> to a device. I reproduced in both 2.6.11 and 2.6.13 using 802.1Q
> VLANs.
> I have a patch that will print out the place of the leaked reference
> against 2.6.13.
>
> http://www.candelatech.com/oss/rfcnt.patch
>
> Enable the feature in the Networking section of Kconfig.
Ben, I will incorporate this patch and let you know if I turn up any
results.
thanks,
--robert
On Aug 31, 2005, at 9:37 PM, Stephen Hemminger wrote:
> On Wed, 31 Aug 2005 19:04:01 -0700
> Robert Scott <rbscott at axentra.net> wrote:
>
>
>> Hello,
>>
>> I know that this bug has been discussed at length on this mailing
>> list before, but previous posts seemed to indicate that it was fixed
>> before kernel 2.6.12. I am still seeing it occasionally in kernel
>> 2.6.12.3. The system is running Knoppix, and IPv6 is not compiled
>> into the kernel (other posts mentioned numerous problems with the IPv6
>> code). But every so often, when bringing down the bridge (it doesn't
>> happen every time), the process hangs, and the following message
>> appears in dmesg repeatedly:
>>
>> 'unregister_netdevice: waiting for br0 to become free. Usage count = 1'
>>
>> None of the processes involved can be killed, and an attempt to run
>> an ifconfig results in a process that is also waiting forever. At
>> this point the box must be rebooted forcefully.
>>
>> Two questions.
>> 1. In a previous post, someone mentioned that one solution was to
>> comment out the check that is hanging in the kernel. Does this
>> check prevent something terrible from happening (I assume that it
>> does), or is it safe to remove it?
>>
>
> Really bad idea, because if whatever is holding the reference (like
> packets stuck in some dead queue) ever gets processed, the kernel
> will die.
>
>
>> 2. Any ideas of something to try in order to make this repeatable?
>>
>
> Two other recent reports are:
> 1. Buggy applications that hold packets in their input queue forever,
> and/or netfilters. The socket buffers contain a reference for
> packets in flight.
>
> 2. The VLAN code had a number of reference bugs; if you look through
> the recent netdev mailing list you will see the discussion.
>