Jarek Poplawski wrote:
> Marcel Holtmann wrote, On 06/14/2008 02:35 PM:
> ...
>
>> =======================================================
>> [ INFO: possible circular locking dependency detected ]
>> 2.6.26-rc2 #5
>> -------------------------------------------------------
>> hcid/4136 is trying to acquire lock:
>> (genl_mutex){--..}, at: [] .ctrl_dumpfamily+0x74/0x174
>>
>> but task is already holding lock:
>> (nlk->cb_mutex){--..}, at: [] .netlink_dump+0x58/0x27c
>>
>> which lock already depends on the new lock.
>>
> ...
>
> Hi,
>
> IMHO it looks like a real lockup threat. Probably it needs something
> better, but for now here is my simplistic patch proposal for testing.

So we have:

genl_rcv()              : take genl_mutex
genl_rcv_msg()          : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump()          : take nlk->cb_mutex
ctrl_dumpfamily()       : try to detect this case and not take genl_mutex
                          a second time

netlink_rcv()           : call netlink_dump()
netlink_dump()          : take nlk->cb_mutex
ctrl_dumpfamily()       : take genl_mutex

The two paths take the same two locks in opposite order, which is a real
bug.

It seems the best fix is to use genl_mutex as the netlink cb_mutex, drop
genl_mutex before calling netlink_dump_start() and not take it in
ctrl_dumpfamily(), relying completely on af_netlink.c for dump locking.
Unfortunately this creates a race, since the ops passed to
netlink_dump_start() are also protected by the mutex, so this patch is
just for testing whether it fixes the warning.

On second thought, that race seems to be present already, since the ops
can be unregistered and the module unloaded while a dump is in progress.