Message-ID: <55018663.2060107@oracle.com>
Date: Thu, 12 Mar 2015 08:28:19 -0400
From: Sasha Levin
To: paulmck@linux.vnet.ibm.com
CC: Josh Triplett, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, LKML, Dave Jones
Subject: Re: rcu: frequent rcu lockups
In-Reply-To: <20150311231613.GO5412@linux.vnet.ibm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/11/2015 07:16 PM, Paul E. McKenney wrote:
> On Wed, Mar 11, 2015 at 07:06:40PM -0400, Sasha Levin wrote:
>> On 03/11/2015 07:01 PM, Paul E. McKenney wrote:
>>>> With the commit I didn't hit it yet, but I do see 4 different WARNings:
>>>
>>> I wish that I could say that I am surprised, but the sad fact is that
>>> I am still shaking the bugs out.
>>
>> I have one more to add:
>>
>> [   93.330539] WARNING: CPU: 1 PID: 8 at kernel/rcu/tree_plugin.h:476 rcu_gp_kthread+0x1eaa/0x4dd0()
>
> A bit different, but still in the class of a combining-tree bitmask
> handling bug.

I left it overnight, and am still seeing hangs.
That said (and don't hold me to it), there seem to be significantly fewer of them.

[ 4423.001809] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 4423.001809] Tasks blocked on level-1 rcu_node (CPUs 16-31):
[ 4423.001809] (detected by 0, t=30502 jiffies, g=60989, c=60988, q=18648)
[ 4423.001809] All QSes seen, last rcu_preempt kthread activity 1 (4295375352-4295375351), jiffies_till_next_fqs=1, root ->qsmask 0x2
[ 4423.001809] trinity-c0 R running task 27480 15862 9833 0x10080000
[ 4423.001809] 0000000000002669 00000000ac401e1d ffff880050607de8 ffffffff9327679b
[ 4423.001809] ffff880050607db8 ffffffffa0b36000 0000000000000001 00000001000639f7
[ 4423.001809] ffffffffa0b351c8 dffffc0000000000 ffff880050622000 ffffffffa0721000
[ 4423.001809] Call Trace:
[ 4423.001809] sched_show_task (kernel/sched/core.c:4542)
[ 4423.001809] rcu_check_callbacks (kernel/rcu/tree.c:1225 kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3400 kernel/rcu/tree.c:3464 kernel/rcu/tree.c:2682)
[ 4423.001809] ? acct_account_cputime (kernel/tsacct.c:168)
[ 4423.001809] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
[ 4423.001809] tick_periodic (kernel/time/tick-common.c:92)
[ 4423.001809] ? tick_handle_periodic (kernel/time/tick-common.c:105)
[ 4423.001809] tick_handle_periodic (kernel/time/tick-common.c:105)
[ 4423.001809] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
[ 4423.001809] ? irq_enter (kernel/softirq.c:338)
[ 4423.001809] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:650 arch/x86/kernel/apic/apic.c:915)
[ 4423.001809] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
[ 4423.001809] ? remove_wait_queue (include/linux/wait.h:145 kernel/sched/wait.c:50)
[ 4423.001809] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[ 4423.001809] remove_wait_queue (kernel/sched/wait.c:52)
[ 4423.001809] do_wait (kernel/exit.c:1465 (discriminator 1))
[ 4423.001809] ? wait_consider_task (kernel/exit.c:1465)
[ 4423.001809] ? find_get_pid (kernel/pid.c:490)
[ 4423.001809] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586)
[ 4423.001809] ? SyS_waitid (kernel/exit.c:1586)
[ 4423.001809] ? kill_orphaned_pgrp (kernel/exit.c:1444)
[ 4423.001809] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
[ 4423.001809] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
[ 4423.001809] tracesys_phase2 (arch/x86/kernel/entry_64.S:347)

Thanks,
Sasha