From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752656Ab2GBNMF (ORCPT );
	Mon, 2 Jul 2012 09:12:05 -0400
Received: from mail-ob0-f174.google.com ([209.85.214.174]:61091 "EHLO
	mail-ob0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750971Ab2GBNMD (ORCPT );
	Mon, 2 Jul 2012 09:12:03 -0400
Message-ID: <1341234773.2958.4.camel@lappy>
Subject: Re: rcu: BUG: spinlock recursion on CPU#3, trinity-child19/5970
From: Sasha Levin
To: Peter Zijlstra
Cc: paulmck@linux.vnet.ibm.com, Dave Jones,
	"linux-kernel@vger.kernel.org"
Date: Mon, 02 Jul 2012 15:12:53 +0200
In-Reply-To: <1341230352.23484.7.camel@twins>
References: <1340964584.2936.40.camel@lappy>
	<20120629172320.GA2416@linux.vnet.ibm.com>
	<1341006040.26928.4.camel@lappy>
	<1341225139.23484.4.camel@twins>
	<20120702113541.GI2907@linux.vnet.ibm.com>
	<1341230352.23484.7.camel@twins>
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.2.3
Content-Transfer-Encoding: 7bit
Mime-Version: 1.0
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2012-07-02 at 13:59 +0200, Peter Zijlstra wrote:
> It only needs moving back before that spin_release() thing. But unless
> there's a reason to do it later, something like the below might be the
> right thing.
> 
> ---
>  kernel/sched/core.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9bb7d28..7e20e6c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1914,6 +1914,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
>  	sched_info_switch(prev, next);
>  	perf_event_task_sched_out(prev, next);
>  	fire_sched_out_preempt_notifiers(prev, next);
> +	rcu_switch_from(prev);
>  	prepare_lock_switch(rq, next);
>  	prepare_arch_switch(next);
>  	trace_sched_switch(prev, next);
> @@ -2082,7 +2083,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
>  #endif
>  
>  	/* Here we just switch the register state and the stack. */
> -	rcu_switch_from(prev);
>  	switch_to(prev, next, prev);
>  
>  	barrier();
> 

With this patch applied, I'm seeing the following:

[ 37.047889] ======================================================
[ 37.048018] [ INFO: possible circular locking dependency detected ]
[ 37.048018] 3.5.0-rc5-next-20120702-sasha-00007-gefea299 #493 Tainted: G W
[ 37.048018] -------------------------------------------------------
[ 37.048018] rcu_torture_rea/2545 is trying to acquire lock:
[ 37.048018]  (&rsp->gp_wq){..-...}, at: [] __wake_up+0x2d/0x70
[ 37.048018]
[ 37.048018] but task is already holding lock:
[ 37.048018]  (&rq->lock){-.-.-.}, at: [] __schedule+0x1a0/0x870
[ 37.048018]
[ 37.048018] which lock already depends on the new lock.
[ 37.048018]
[ 37.048018]
[ 37.048018] the existing dependency chain (in reverse order) is:
[ 37.048018]
[ 37.048018] -> #2 (&rq->lock){-.-.-.}:
[ 37.048018]        [] validate_chain+0x6a3/0x790
[ 37.048018]        [] __lock_acquire+0x413/0x4b0
[ 37.048018]        [] lock_acquire+0x18a/0x1e0
[ 37.048018]        [] _raw_spin_lock+0x3b/0x70
[ 37.048018]        [] wake_up_new_task+0x131/0x270
[ 37.048018]        [] do_fork+0x27a/0x300
[ 37.048018]        [] kernel_thread+0x71/0x80
[ 37.048018]        [] rest_init+0x21/0x140
[ 37.048018]        [] start_kernel+0x3e1/0x3ee
[ 37.048018]        [] x86_64_start_reservations+0xff/0x104
[ 37.048018]        [] x86_64_start_kernel+0x155/0x164
[ 37.048018] -> #1 (&p->pi_lock){-.-.-.}:
[ 37.048018]        [] validate_chain+0x6a3/0x790
[ 37.048018]        [] __lock_acquire+0x413/0x4b0
[ 37.048018]        [] lock_acquire+0x18a/0x1e0
[ 37.048018]        [] _raw_spin_lock_irqsave+0x81/0xc0
[ 37.048018]        [] try_to_wake_up+0x34/0x290
[ 37.048018]        [] default_wake_function+0xd/0x10
[ 37.048018]        [] autoremove_wake_function+0x18/0x40
[ 37.048018]        [] __wake_up_common+0x52/0x90
[ 37.048018]        [] __wake_up+0x43/0x70
[ 37.048018]        [] rcu_start_gp+0x85/0xb0
[ 37.048018]        [] __rcu_process_callbacks+0x142/0x170
[ 37.048018]        [] rcu_process_callbacks+0x80/0xb0
[ 37.048018]        [] __do_softirq+0x221/0x460
[ 37.048018]        [] run_ksoftirqd+0xfd/0x230
[ 37.048018]        [] kthread+0xb2/0xc0
[ 37.048018]        [] kernel_thread_helper+0x4/0x10
[ 37.048018] -> #0 (&rsp->gp_wq){..-...}:
[ 37.048018]        [] check_prev_add+0x11f/0x4d0
[ 37.048018]        [] validate_chain+0x6a3/0x790
[ 37.048018]        [] __lock_acquire+0x413/0x4b0
[ 37.048018]        [] lock_acquire+0x18a/0x1e0
[ 37.048018]        [] _raw_spin_lock_irqsave+0x81/0xc0
[ 37.048018]        [] __wake_up+0x2d/0x70
[ 37.048018]        [] rcu_report_qs_rsp+0x73/0x80
[ 37.048018]        [] rcu_report_qs_rnp+0x26d/0x2c0
[ 37.048018]        [] rcu_report_unblock_qs_rnp+0x7e/0x90
[ 37.048018]        [] rcu_read_unlock_special+0x330/0x400
[ 37.048018]        [] rcu_preempt_note_context_switch+0x22a/0x300
[ 37.048018]        [] __schedule+0x48c/0x870
[ 37.048018]        [] preempt_schedule_irq+0x94/0xd0
[ 37.048018]        [] retint_kernel+0x26/0x30
[ 37.048018]        [] rcu_torture_read_unlock+0x5d/0x60
[ 37.048018]        [] rcu_torture_reader+0x29d/0x380
[ 37.048018]        [] kthread+0xb2/0xc0
[ 37.048018]        [] kernel_thread_helper+0x4/0x10
[ 37.048018]
[ 37.048018] other info that might help us debug this:
[ 37.048018]
[ 37.048018] Chain exists of: &rsp->gp_wq --> &p->pi_lock --> &rq->lock
[ 37.048018]  Possible unsafe locking scenario:
[ 37.048018]
[ 37.048018]        CPU0                    CPU1
[ 37.048018]        ----                    ----
[ 37.048018]   lock(&rq->lock);
[ 37.048018]                                lock(&p->pi_lock);
[ 37.048018]                                lock(&rq->lock);
[ 37.048018]   lock(&rsp->gp_wq);
[ 37.048018]
[ 37.048018]  *** DEADLOCK ***
[ 37.048018]
[ 37.048018] 1 lock held by rcu_torture_rea/2545:
[ 37.048018]  #0:  (&rq->lock){-.-.-.}, at: [] __schedule+0x1a0/0x870
[ 37.048018]
[ 37.048018] stack backtrace:
[ 37.048018] Pid: 2545, comm: rcu_torture_rea Tainted: G W 3.5.0-rc5-next-20120702-sasha-00007-gefea299 #493
[ 37.048018] Call Trace:
[ 37.048018]  [] print_circular_bug+0x105/0x120
[ 37.048018]  [] check_prev_add+0x11f/0x4d0
[ 37.048018]  [] validate_chain+0x6a3/0x790
[ 37.048018]  [] ? sched_clock_cpu+0x108/0x120
[ 37.048018]  [] __lock_acquire+0x413/0x4b0
[ 37.048018]  [] lock_acquire+0x18a/0x1e0
[ 37.048018]  [] ? __wake_up+0x2d/0x70
[ 37.048018]  [] _raw_spin_lock_irqsave+0x81/0xc0
[ 37.048018]  [] ? __wake_up+0x2d/0x70
[ 37.048018]  [] __wake_up+0x2d/0x70
[ 37.048018]  [] rcu_report_qs_rsp+0x73/0x80
[ 37.048018]  [] rcu_report_qs_rnp+0x26d/0x2c0
[ 37.048018]  [] rcu_report_unblock_qs_rnp+0x7e/0x90
[ 37.048018]  [] rcu_read_unlock_special+0x330/0x400
[ 37.048018]  [] ? __lock_acquired+0x2a4/0x2e0
[ 37.048018]  [] rcu_preempt_note_context_switch+0x22a/0x300
[ 37.048018]  [] __schedule+0x48c/0x870
[ 37.048018]  [] ? __schedule+0x83d/0x870
[ 37.048018]  [] preempt_schedule_irq+0x94/0xd0
[ 37.048018]  [] retint_kernel+0x26/0x30
[ 37.048018]  [] ? rcu_preempt_qs+0x120/0x120
[ 37.048018]  [] ? __rcu_read_unlock+0x4d/0xa0
[ 37.048018]  [] rcu_torture_read_unlock+0x5d/0x60
[ 37.048018]  [] rcu_torture_reader+0x29d/0x380
[ 37.048018]  [] ? T.861+0x50/0x50
[ 37.048018]  [] ? rcu_torture_read_unlock+0x60/0x60
[ 37.048018]  [] kthread+0xb2/0xc0
[ 37.048018]  [] kernel_thread_helper+0x4/0x10
[ 37.048018]  [] ? retint_restore_args+0x13/0x13
[ 37.048018]  [] ? __init_kthread_worker+0x70/0x70
[ 37.048018]  [] ? gs_change+0x13/0x13
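
If it helps to see the cycle lockdep is describing spelled out, here is a
minimal userspace sketch of it (pthread mutexes standing in for &rq->lock,
&p->pi_lock and the spinlock inside &rsp->gp_wq; the names and the whole
program are only an illustration of the lock ordering, not kernel code, and
it hangs by design):

/*
 * Illustration only: thread "cpu0" mimics the new #0 path above
 * (__schedule() holds rq->lock, then rcu_read_unlock_special() calls
 * __wake_up(&rsp->gp_wq), wanting the waitqueue lock); thread "cpu1"
 * mimics the existing #1/#2 chain (the gp_wq waker holding the waitqueue
 * lock while try_to_wake_up() takes p->pi_lock and then rq->lock).
 * Build with: gcc -pthread sketch.c -- it deadlocks on purpose.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t rq_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t pi_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t gp_wq_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cpu0(void *arg)
{
	/* rq->lock --> gp_wq lock */
	pthread_mutex_lock(&rq_lock);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&gp_wq_lock);
	puts("cpu0: rq->lock, then gp_wq lock");
	pthread_mutex_unlock(&gp_wq_lock);
	pthread_mutex_unlock(&rq_lock);
	return NULL;
}

static void *cpu1(void *arg)
{
	/* gp_wq lock --> pi_lock --> rq->lock */
	pthread_mutex_lock(&gp_wq_lock);
	pthread_mutex_lock(&pi_lock);
	sleep(1);			/* widen the race window */
	pthread_mutex_lock(&rq_lock);
	puts("cpu1: gp_wq lock, pi_lock, then rq->lock");
	pthread_mutex_unlock(&rq_lock);
	pthread_mutex_unlock(&pi_lock);
	pthread_mutex_unlock(&gp_wq_lock);
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, cpu0, NULL);
	pthread_create(&t1, NULL, cpu1, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

After about a second both threads are blocked on a mutex the other one
holds, which is the same three-lock cycle the report above shows.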