public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Josh Triplett <josh@joshtriplett.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Jones <davej@redhat.com>
Subject: Re: rcu: frequent rcu lockups
Date: Thu, 12 Mar 2015 07:07:49 -0700
Message-ID: <20150312140749.GP5412@linux.vnet.ibm.com>
In-Reply-To: <55018745.3080109@oracle.com>

On Thu, Mar 12, 2015 at 08:32:05AM -0400, Sasha Levin wrote:
> On 03/12/2015 08:28 AM, Sasha Levin wrote:
> > On 03/11/2015 07:16 PM, Paul E. McKenney wrote:
> > > On Wed, Mar 11, 2015 at 07:06:40PM -0400, Sasha Levin wrote:
> > > > On 03/11/2015 07:01 PM, Paul E. McKenney wrote:
> > > > > > With the commit I didn't hit it yet, but I do see 4 different WARNings:
> > > > > I wish that I could say that I am surprised, but the sad fact is that
> > > > > I am still shaking the bugs out.
> > > >
> > > > I have one more to add:
> > > >
> > > > [   93.330539] WARNING: CPU: 1 PID: 8 at kernel/rcu/tree_plugin.h:476 rcu_gp_kthread+0x1eaa/0x4dd0()
> > > A bit different, but still in the class of a combining-tree bitmask
> > > handling bug.
> > I left it overnight, and am still seeing hangs, although (and don't hold me
> > to that) there seem to be significantly fewer of them.
> 
> In one of the cases, KASan ended up complaining about something odd going on in
> rcu_check_callbacks():

Hmmm...  Looks like I have a synchronization bug between RCU CPU stall
warnings and task exit, or some such.  Promising clue, thank you!

							Thanx, Paul

> [ 7170.779042] BUG: KASan: out of bounds on stack in sched_show_task+0x324/0x3d0 at addr ffff88075315fdd8
> [ 7170.779042] Read of size 8 by task trinity-c0/15862
> [ 7170.779042] page:ffffea001d4c57c0 count:0 mapcount:0 mapping:          (null) index:0x0
> [ 7170.779042] flags: 0x5afffff80000000()
> [ 7170.779042] page dumped because: kasan: bad access detected
> [ 7170.779042] CPU: 0 PID: 15862 Comm: trinity-c0 Tainted: G        W       4.0.0-rc3-next-20150311-sasha-00035-g0a45b48-dirty #2052
> [ 7170.779042]  ffff88075315fdd8 00000000ac401e1d ffff880050607cd8 ffffffff9da97e89
> [ 7170.779042]  1ffffd4003a98aff ffff880050607d68 ffff880050607d58 ffffffff9360eda4
> [ 7170.779042]  ffff880050607d98 ffffffff9da8c9df 0000000000000082 ffffffffa0260429
> [ 7170.779042] Call Trace:
> [ 7170.779042] <IRQ> dump_stack (lib/dump_stack.c:52)
> [ 7170.779042] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
> [ 7170.779042] ? printk (kernel/printk/printk.c:1854)
> [ 7170.779042] ? bitmap_weight (include/linux/bitmap.h:303)
> [ 7170.779042] __asan_report_load8_noabort (mm/kasan/report.c:230 mm/kasan/report.c:251)
> [ 7170.779042] ? sched_show_task (kernel/sched/core.c:4526)
> [ 7170.779042] sched_show_task (kernel/sched/core.c:4526)
> [ 7170.779042] rcu_check_callbacks (kernel/rcu/tree.c:1225 kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3400 kernel/rcu/tree.c:3464 kernel/rcu/tree.c:2682)
> [ 7170.779042] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
> [ 7170.779042] tick_periodic (kernel/time/tick-common.c:92)
> [ 7170.779042] ? tick_handle_periodic (kernel/time/tick-common.c:105)
> [ 7170.779042] tick_handle_periodic (kernel/time/tick-common.c:105)
> [ 7170.779042] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
> [ 7170.779042] ? irq_enter (kernel/softirq.c:338)
> [ 7170.779042] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:650 arch/x86/kernel/apic/apic.c:915)
> [ 7170.779042] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
> [ 7170.779042] <EOI> ? do_raw_read_lock (include/asm-generic/qrwlock.h:104 kernel/locking/spinlock_debug.c:204)
> [ 7170.779042] _raw_read_lock (kernel/locking/spinlock.c:224)
> [ 7170.779042] ? do_wait (kernel/exit.c:1498 (discriminator 1))
> [ 7170.779042] do_wait (kernel/exit.c:1498 (discriminator 1))
> [ 7170.779042] ? wait_consider_task (kernel/exit.c:1465)
> [ 7170.779042] ? find_get_pid (kernel/pid.c:490)
> [ 7170.779042] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586)
> [ 7170.779042] ? SyS_waitid (kernel/exit.c:1586)
> [ 7170.779042] ? kill_orphaned_pgrp (kernel/exit.c:1444)
> [ 7170.779042] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
> [ 7170.779042] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
> [ 7170.779042] tracesys_phase2 (arch/x86/kernel/entry_64.S:347)
> [ 7170.779042] Memory state around the buggy address:
> [ 7170.779042]  ffff88075315fc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 7170.779042]  ffff88075315fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 7170.779042] >ffff88075315fd80: 00 f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4 f4
> [ 7170.779042]                                                     ^
> [ 7170.779042]  ffff88075315fe00: f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
> [ 7170.779042]  ffff88075315fe80: f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 f4 00
> 
> 
> Thanks,
> Sasha
> 
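
For anyone reading the "Memory state around the buggy address" dump quoted
above, here is a minimal sketch of how the faulting address lines up with a
shadow byte. It assumes KASan's usual mapping of one shadow byte per 8 bytes
of memory and the stack redzone markers 0xF1 to 0xF4 as I understand them for
kernels of this era. The helper name shadow_byte_for() and the SHADOW_MEANING
table are made up for this illustration and are not kernel APIs.

```python
# Hypothetical helper for reading a KASan "Memory state" row by hand.
# Assumption: each shadow byte describes 8 bytes of memory, and the
# stack redzone markers have the meanings listed below.
SHADOW_SCALE = 8

SHADOW_MEANING = {
    0x00: "fully addressable",
    0xF1: "stack left redzone",
    0xF2: "stack mid redzone",
    0xF3: "stack right redzone",
    0xF4: "stack partial redzone",
}


def shadow_byte_for(addr, row_base, row_bytes):
    """Return (index, value) of the shadow byte in one dumped row that covers addr."""
    index = (addr - row_base) // SHADOW_SCALE
    if not 0 <= index < len(row_bytes):
        raise ValueError("address is not covered by this row")
    return index, row_bytes[index]


# Values copied from the report: the row marked with '>' and the bad address.
row_base = 0xFFFF88075315FD80
row = bytes.fromhex("00 f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4 f4")
bad_addr = 0xFFFF88075315FDD8

index, value = shadow_byte_for(bad_addr, row_base, row)
print(index, hex(value), SHADOW_MEANING.get(value, "unknown"))
```

Under those assumptions the 8-byte read at ffff88075315fdd8 falls on a
redzone byte between two stack variables rather than on addressable stack,
which fits the "out of bounds on stack" wording of the report. This only
decodes the dump. It does not say which stack frame the address belongs to.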

