From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754242Ab2DADDC (ORCPT ); Sat, 31 Mar 2012 23:03:02 -0400 Received: from e37.co.us.ibm.com ([32.97.110.158]:50802 "EHLO e37.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754123Ab2DADC7 (ORCPT ); Sat, 31 Mar 2012 23:02:59 -0400 Date: Sat, 31 Mar 2012 20:02:46 -0700 From: "Paul E. McKenney" To: Paul Rolland Cc: linux-kernel@vger.kernel.org Subject: Re: Blocked in keyctl(...) ??? - RCU related ??? Message-ID: <20120401030246.GQ2450@linux.vnet.ibm.com> Reply-To: paulmck@linux.vnet.ibm.com References: <20120312170301.5982e054@tux.DEF.witbe.net> <20120316094328.6b37a4d9@tux.DEF.witbe.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20120316094328.6b37a4d9@tux.DEF.witbe.net> User-Agent: Mutt/1.5.21 (2010-09-15) X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12040103-7408-0000-0000-000003DFF59E Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 16, 2012 at 09:43:28AM +0100, Paul Rolland wrote: > Hello, > > On Mon, 12 Mar 2012 17:03:01 +0100 > Paul Rolland (ポール・ロラン) wrote: > > > Sometimes, my applications start "locking": nothing happens anymore, can't > > kill them... > > I manage to strace them, to find that they are executing the following > > syscall: > > > > 13686 keyctl(0x1, 0, 0xffffffffffffffff, 0x7fa90fd6a2c0, 0 > > > > and that syscall never returns. > > OK, this happened again, while running 3.3-rc7... I've used sysrq to > collect some informations and what I can see from that is the following : > > Mar 16 08:40:02 tux kernel: Call Trace: > Mar 16 08:40:02 tux kernel: [] ? extract_buf+0x91/0x140 > Mar 16 08:40:02 tux kernel: [] schedule+0x3a/0x50 > Mar 16 08:40:02 tux kernel: [] schedule_timeout+0x1ed/0x260 > Mar 16 08:40:02 tux kernel: [] ? get_parent_ip+0x11/0x50 > Mar 16 08:40:02 tux kernel: [] ? get_parent_ip+0x11/0x50 > Mar 16 08:40:02 tux kernel: [] ? get_parent_ip+0x11/0x50 > Mar 16 08:40:02 tux kernel: [] wait_for_common+0xc0/0x160 > Mar 16 08:40:02 tux kernel: [] ? try_to_wake_up+0x2e0/0x2e0 > Mar 16 08:40:02 tux kernel: [] ? __call_rcu+0x150/0x150 > Mar 16 08:40:02 tux kernel: [] wait_for_completion+0x18/0x20 > Mar 16 08:40:02 tux kernel: [] wait_rcu_gp+0x48/0x50 > Mar 16 08:40:02 tux kernel: [] ? wait_rcu_gp+0x50/0x50 > Mar 16 08:40:02 tux kernel: [] synchronize_rcu+0x1c/0x20 > Mar 16 08:40:02 tux kernel: [] install_session_keyring_to_cred+0x59/0xc0 > Mar 16 08:40:02 tux kernel: [] join_session_keyring+0x12a/0x170 > Mar 16 08:40:02 tux kernel: [] keyctl_join_session_keyring+0x32/0x60 > Mar 16 08:40:02 tux kernel: [] sys_keyctl+0x168/0x180 > Mar 16 08:40:02 tux kernel: [] > system_call_fastpath+0x16/0x1b > > The complete traces collected in /var/log/messages is attached to this email. > > Is there anything I can do to investigate this ? >>From your message output, it looks like something is stalling RCU grace periods, possibly due to a task being preempted in an RCU read-side critical section (this assumes you are running CONFIG_PREEMPT=y). Could you please try reproducing with CONFIG_RCU_CPU_STALL_VERBOSE=y? This would identify tasks stalling the RCU grace period. Thanx, Paul