From mboxrd@z Thu Jan 1 00:00:00 1970
From: Oleg Nesterov
Subject: Re: ipv6: tunnel: hang when destroying ipv6 tunnel
Date: Sun, 1 Apr 2012 18:38:33 +0200
Message-ID: <20120401163833.GA29697@redhat.com>
References: <1333227549.2325.4051.camel@edumazet-glaptop>
	<20120331213423.GA21219@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Eric Dumazet , davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	jmorris@namei.org, yoshfuji@linux-ipv6.org, Patrick McHardy ,
	netdev@vger.kernel.org,
	"linux-kernel@vger.kernel.org List" , Dave Jones , Tetsuo Handa
To: Sasha Levin
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:60962 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752414Ab2DARSM (ORCPT );
	Sun, 1 Apr 2012 13:18:12 -0400
Content-Disposition: inline
In-Reply-To:
Sender: netdev-owner@vger.kernel.org
List-ID:

On 04/01, Sasha Levin wrote:
>
> >> It would be nice to know what sysrq-t says, in particular the trace
> >> of khelper thread is interesting.
> >
> > Sure, I'll get one when it happens again.
>
> So here's the stack of the usermode thread:

Great, thanks, this is even better than khelper's trace,

> [ 336.614015] [] schedule+0x24/0x70
> [ 336.614015] [] p9_client_rpc+0x13d/0x360
> [ 336.614015] [] ? wake_up_bit+0x40/0x40
> [ 336.614015] [] ? get_parent_ip+0x11/0x50
> [ 336.614015] [] ? sub_preempt_count+0x9d/0xd0
> [ 336.614015] [] p9_client_walk+0x8f/0x220
> [ 336.614015] [] v9fs_vfs_lookup+0xab/0x1c0
> [ 336.614015] [] d_alloc_and_lookup+0x40/0x80
> [ 336.614015] [] ? d_lookup+0x30/0x50
> [ 336.614015] [] do_lookup+0x28a/0x3b0
> [ 336.614015] [] ? security_inode_permission+0x17/0x20
> [ 336.614015] [] link_path_walk+0x167/0x420
> [ 336.614015] [] ? generic_readlink+0xb0/0xb0
> [ 336.614015] [] ? __raw_spin_lock_init+0x38/0x70
> [ 336.614015] [] path_openat+0xba/0x500
> [ 336.614015] [] ? sched_clock+0x13/0x20
> [ 336.614015] [] ? sched_clock_local+0x25/0x90
> [ 336.614015] [] ? sched_clock_cpu+0xd0/0x120
> [ 336.614015] [] do_filp_open+0x44/0xa0
> [ 336.614015] [] ? __lock_release+0x8d/0x1d0
> [ 336.614015] [] ? get_parent_ip+0x11/0x50
> [ 336.614015] [] ? sub_preempt_count+0x9d/0xd0
> [ 336.614015] [] ? _raw_spin_unlock+0x30/0x60
> [ 336.614015] [] open_exec+0x2d/0xf0
> [ 336.614015] [] do_execve_common+0x128/0x320
> [ 336.614015] [] do_execve+0x35/0x40
> [ 336.614015] [] sys_execve+0x45/0x70
> [ 336.614015] [] kernel_execve+0x68/0xd0
> [ 336.614015] [] ? ____call_usermodehelper+0xf6/0x130
> [ 336.614015] [] call_helper+0x19/0x20
> [ 336.614015] [] kernel_thread_helper+0x4/0x10
> [ 336.614015] [] ? finish_task_switch+0x80/0x110
> [ 336.614015] [] ? retint_restore_args+0x13/0x13
> [ 336.614015] [] ? ____call_usermodehelper+0x130/0x130
> [ 336.614015] [] ? gs_change+0x13/0x13
>
> While it seems that 9p is the culprit, I have to point out that this
> bug is easily reproducible, and it happens each time due to a
> call_usermode_helper() call. Other than that 9p behaves perfectly and
> I'd assume that I'd be seeing other things break besides
> call_usermode_helper() related ones.

Of course I do not know what happens, but at least this obviously
explains why UMH_WAIT_EXEC hangs, I think call_usermodehelper_exec()
itself is innocent.

Oleg.