From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ying Xue Subject: Re: [PATCH] tipc: fix a lockdep warning Date: Fri, 22 Nov 2013 16:35:17 +0800 Message-ID: <528F1745.4050202@windriver.com> References: <528F136E.1030506@huawei.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Cc: netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net, dingtianhong@huawei.com To: wangweidong , , , , "Paul.Gortmaker@windriver.com" Return-path: In-Reply-To: <528F136E.1030506@huawei.com> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: tipc-discussion-bounces@lists.sourceforge.net List-Id: netdev.vger.kernel.org Hi Weidong, I know you ever posted below patch into netdev mail list last month. And at that moment I also told you we already had one better solution to fix the issue. However, after our TIPC internal development team went through a long and bitter dispute about that "better" solution, we eventually made a consensus by proposing below approach to fix your met issue: http://thread.gmane.org/gmane.network.tipc.general/4926/ I think the patchset should be ready now. Sorry for your inconvenience. Regards, Ying On 11/22/2013 04:18 PM, wangweidong wrote: > PC1:tipc-config -netid=1234 -a=1.1.2 -be=eth:eth0/1.1.0 > PC2:tipc-config -netid=1234 -a=1.1.3 -be=eth:eth0/1.1.0 > > I used a server code Like this: > ---------------- > sk=socket(AF_TIPC,SOCK_RDM,0); > bind(sk, &addr, len); > while(1) { > recvfrom(sk,...); > ... > sendto(sk,...); > } > ---------------- > > when I did ./server in PC1, I got a lockdep as bellow: > > ====================================================== > [ INFO: possible circular locking dependency detected ] > 3.12.0-lockdep+ #4 Not tainted > ------------------------------------------------------- > server/3772 is trying to acquire lock: > (tipc_net_lock){++.-..}, at: [] tipc_link_send+0x2f/0xc0 [tipc] > > but task is already holding lock: > (tipc_nametbl_lock){++--..}, at: [] tipc_nametbl_publish+0x46/0xc0 [tipc] > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > -> #1 (tipc_nametbl_lock){++--..}: > [] validate_chain+0x6a7/0x7d0 > [] __lock_acquire+0x361/0x610 > [] lock_acquire+0xe2/0x110 > [] _raw_write_lock_bh+0x31/0x40 > [] tipc_named_reinit+0x10/0x70 [tipc] > [] tipc_net_start+0x22/0x80 [tipc] > [] tipc_core_start_net+0xe/0x40 [tipc] > [] cfg_set_own_addr+0x75/0xc0 [tipc] > [] tipc_cfg_do_cmd+0x135/0x550 [tipc] > [] handle_cmd+0x49/0xe0 [tipc] > [] genl_family_rcv_msg+0x22d/0x3c0 > [] genl_rcv_msg+0x70/0xd0 > [] netlink_rcv_skb+0x89/0xb0 > [] genl_rcv+0x27/0x40 > [] netlink_unicast+0x14e/0x1a0 > [] netlink_sendmsg+0x245/0x420 > [] __sock_sendmsg+0x66/0x80 > [] sock_aio_write+0xb2/0xc0 > [] do_sync_write+0x60/0x90 > [] vfs_write+0x1d1/0x1e0 > [] SyS_write+0x5d/0xa0 > [] system_call_fastpath+0x16/0x1b > > -> #0 (tipc_net_lock){++.-..}: > [] check_prev_add+0x41e/0x490 > [] validate_chain+0x6a7/0x7d0 > [] __lock_acquire+0x361/0x610 > [] lock_acquire+0xe2/0x110 > [] _raw_read_lock_bh+0x34/0x50 > [] tipc_link_send+0x2f/0xc0 [tipc] > [] named_cluster_distribute+0x6b/0x80 [tipc] > [] tipc_named_publish+0x7b/0x90 [tipc] > [] tipc_nametbl_publish+0x7b/0xc0 [tipc] > [] tipc_publish+0x98/0xf0 [tipc] > [] bind+0x78/0xb0 [tipc] > [] SyS_bind+0xb0/0xd0 > [] system_call_fastpath+0x16/0x1b > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(tipc_nametbl_lock); > lock(tipc_net_lock); > lock(tipc_nametbl_lock); > lock(tipc_net_lock); > > *** DEADLOCK *** > ---------------------------------------------------------- > > problem is that tipc_nametbl_publish which will hold tipc_nametbl_lock > and acquire tipc_net_lock, while the tipc_net_start which hold > tipc_net_lock and acquir tipc_nametbl_lock, so the dead lock occurs. > > tipc_link_send protected by tipc_net_lock, we can unlock the > tipc_nametbl_lock, and it no need the tipc_nametbl_lock to protect it. > so I just unlock the tbl_lock before it, and lock the tbl_lock after it. > > Signed-off-by: Wang Weidong > --- > net/tipc/name_distr.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/tipc/name_distr.c b/net/tipc/name_distr.c > index e0d0805..ab8f96c 100644 > --- a/net/tipc/name_distr.c > +++ b/net/tipc/name_distr.c > @@ -138,7 +138,9 @@ static void named_cluster_distribute(struct sk_buff *buf) > if (!buf_copy) > break; > msg_set_destnode(buf_msg(buf_copy), n_ptr->addr); > + write_unlock_bh(&tipc_nametbl_lock); > tipc_link_send(buf_copy, n_ptr->addr, n_ptr->addr); > + write_lock_bh(&tipc_nametbl_lock); > } > } > > ------------------------------------------------------------------------------ Shape the Mobile Experience: Free Subscription Software experts and developers: Be at the forefront of tech innovation. Intel(R) Software Adrenaline delivers strategic insight and game-changing conversations that shape the rapidly evolving mobile landscape. Sign up now. http://pubads.g.doubleclick.net/gampad/clk?id=63431311&iu=/4140/ostg.clktrk