From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Josh Triplett <josh@joshtriplett.org>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <laijs@cn.fujitsu.com>,
LKML <linux-kernel@vger.kernel.org>,
Dave Jones <davej@redhat.com>
Subject: Re: rcu: frequent rcu lockups
Date: Fri, 13 Mar 2015 10:39:43 -0700 [thread overview]
Message-ID: <20150313173943.GA2671@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150312140749.GP5412@linux.vnet.ibm.com>
On Thu, Mar 12, 2015 at 07:07:49AM -0700, Paul E. McKenney wrote:
> On Thu, Mar 12, 2015 at 08:32:05AM -0400, Sasha Levin wrote:
> > On 03/12/2015 08:28 AM, Sasha Levin wrote:
> > > On 03/11/2015 07:16 PM, Paul E. McKenney wrote:
> > >> > On Wed, Mar 11, 2015 at 07:06:40PM -0400, Sasha Levin wrote:
> > >>>> >> > On 03/11/2015 07:01 PM, Paul E. McKenney wrote:
> > >>>>>>>> >>>> > >> With the commit I didn't hit it yet, but I do see 4 different WARNings:
> > >>>>>> >>> > > I wish that I could say that I am surprised, but the sad fact is that
> > >>>>>> >>> > > I am still shaking the bugs out.
> > >>>> >> >
> > >>>> >> > I have one more to add:
> > >>>> >> >
> > >>>> >> > [ 93.330539] WARNING: CPU: 1 PID: 8 at kernel/rcu/tree_plugin.h:476 rcu_gp_kthread+0x1eaa/0x4dd0()
> > >> > A bit different, but still in the class of a combining-tree bitmask
> > >> > handling bug.
> > > I left it overnight, and am still seeing hangs. Although (and don't catch me
> > > by that) it seems to be significantly less of them.
> >
> > In one of the cases, KASan ended up complaining about something odd going on in
> > rcu_check_callbacks():
>
> Hmmm... Looks like I have a synchronization blow between RCU CPU stall
> warnings and task exit or some such. Promising clue, thank you!
But I am not seeing it. There should be no synchronization issues
with sched_show_task(current), as the current task cannot exit while
it is calling sched_show_task(current). There is a remote-task
sched_show_task(t), but it is invoked only on tasks that are on RCU's
list of tasks that have blocked within an RCU read-side critical section
during the current grace period. Such a task might exit while in in this
critical section, but in that case exit_rcu() will take care of this, and
it will acquire the same lock held across the call to sched_show_task(t).
I suppose this could happen if someone re-enabled preemption late
in the exit() code path and then entered an RCU read-side critical
section, but that would be a bad breakage of the exit() path.
Am I missing something here?
Thanx, Paul
> > [ 7170.779042] BUG: KASan: out of bounds on stack in sched_show_task+0x324/0x3d0 at addr ffff88075315fdd8
> > [ 7170.779042] Read of size 8 by task trinity-c0/15862
> > [ 7170.779042] page:ffffea001d4c57c0 count:0 mapcount:0 mapping: (null) index:0x0
> > [ 7170.779042] flags: 0x5afffff80000000()
> > [ 7170.779042] page dumped because: kasan: bad access detected
> > [ 7170.779042] CPU: 0 PID: 15862 Comm: trinity-c0 Tainted: G W 4.0.0-rc3-next-20150311-sasha-00035-g0a45b48-dirty #2052
> > [ 7170.779042] ffff88075315fdd8 00000000ac401e1d ffff880050607cd8 ffffffff9da97e89
> > [ 7170.779042] 1ffffd4003a98aff ffff880050607d68 ffff880050607d58 ffffffff9360eda4
> > [ 7170.779042] ffff880050607d98 ffffffff9da8c9df 0000000000000082 ffffffffa0260429
> > [ 7170.779042] Call Trace:
> > [ 7170.779042] <IRQ> dump_stack (lib/dump_stack.c:52)
> > [ 7170.779042] kasan_report_error (mm/kasan/report.c:132 mm/kasan/report.c:193)
> > [ 7170.779042] ? printk (kernel/printk/printk.c:1854)
> > [ 7170.779042] ? bitmap_weight (include/linux/bitmap.h:303)
> > [ 7170.779042] __asan_report_load8_noabort (mm/kasan/report.c:230 mm/kasan/report.c:251)
> > [ 7170.779042] ? sched_show_task (kernel/sched/core.c:4526)
> > [ 7170.779042] sched_show_task (kernel/sched/core.c:4526)
> > [ 7170.779042] rcu_check_callbacks (kernel/rcu/tree.c:1225 kernel/rcu/tree.c:1331 kernel/rcu/tree.c:3400 kernel/rcu/tree.c:3464 kernel/rcu/tree.c:2682)
> > [ 7170.779042] update_process_times (./arch/x86/include/asm/preempt.h:22 kernel/time/timer.c:1386)
> > [ 7170.779042] tick_periodic (kernel/time/tick-common.c:92)
> > [ 7170.779042] ? tick_handle_periodic (kernel/time/tick-common.c:105)
> > [ 7170.779042] tick_handle_periodic (kernel/time/tick-common.c:105)
> > [ 7170.779042] local_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:891)
> > [ 7170.779042] ? irq_enter (kernel/softirq.c:338)
> > [ 7170.779042] smp_apic_timer_interrupt (./arch/x86/include/asm/apic.h:650 arch/x86/kernel/apic/apic.c:915)
> > [ 7170.779042] apic_timer_interrupt (arch/x86/kernel/entry_64.S:920)
> > [ 7170.779042] <EOI> ? do_raw_read_lock (include/asm-generic/qrwlock.h:104 kernel/locking/spinlock_debug.c:204)
> > [ 7170.779042] _raw_read_lock (kernel/locking/spinlock.c:224)
> > [ 7170.779042] ? do_wait (kernel/exit.c:1498 (discriminator 1))
> > [ 7170.779042] do_wait (kernel/exit.c:1498 (discriminator 1))
> > [ 7170.779042] ? wait_consider_task (kernel/exit.c:1465)
> > [ 7170.779042] ? find_get_pid (kernel/pid.c:490)
> > [ 7170.779042] SyS_wait4 (kernel/exit.c:1618 kernel/exit.c:1586)
> > [ 7170.779042] ? SyS_waitid (kernel/exit.c:1586)
> > [ 7170.779042] ? kill_orphaned_pgrp (kernel/exit.c:1444)
> > [ 7170.779042] ? syscall_trace_enter_phase2 (arch/x86/kernel/ptrace.c:1592)
> > [ 7170.779042] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42)
> > [ 7170.779042] tracesys_phase2 (arch/x86/kernel/entry_64.S:347)
> > [ 7170.779042] Memory state around the buggy address:
> > [ 7170.779042] ffff88075315fc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [ 7170.779042] ffff88075315fd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > [ 7170.779042] >ffff88075315fd80: 00 f1 f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 f4 f4
> > [ 7170.779042] ^
> > [ 7170.779042] ffff88075315fe00: f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
> > [ 7170.779042] ffff88075315fe80: f1 f1 f1 00 00 00 00 00 00 00 00 00 00 00 f4 00
> >
> >
> > Thanks,
> > Sasha
> >
next prev parent reply other threads:[~2015-03-13 17:39 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-11 19:57 rcu: frequent rcu lockups Sasha Levin
2015-03-11 20:17 ` Paul E. McKenney
2015-03-11 20:18 ` Sasha Levin
2015-03-11 20:41 ` Paul E. McKenney
2015-03-11 22:18 ` Sasha Levin
2015-03-11 23:01 ` Paul E. McKenney
2015-03-11 23:06 ` Sasha Levin
2015-03-11 23:16 ` Paul E. McKenney
2015-03-12 12:28 ` Sasha Levin
2015-03-12 12:32 ` Sasha Levin
2015-03-12 14:07 ` Paul E. McKenney
2015-03-13 17:39 ` Paul E. McKenney [this message]
2015-03-15 20:43 ` Paul E. McKenney
2015-03-15 23:32 ` Sasha Levin
2015-03-16 0:02 ` Paul E. McKenney
2015-03-20 14:15 ` Sasha Levin
2015-03-20 14:29 ` Paul E. McKenney
2015-03-20 14:46 ` Sasha Levin
2015-03-20 15:35 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150313173943.GA2671@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=davej@redhat.com \
--cc=josh@joshtriplett.org \
--cc=laijs@cn.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=rostedt@goodmis.org \
--cc=sasha.levin@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.