public inbox for linux-kernel@vger.kernel.org
From: Ed Tomlinson <edt@aei.ca>
To: paulmck@linux.vnet.ibm.com
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu,
	laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com,
	eric.dumazet@gmail.com, darren@dvhart.com, patches@linaro.org,
	greearb@candelatech.com
Subject: Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck
Date: Tue, 19 Jul 2011 22:07:32 -0400
Message-ID: <201107192207.33813.edt@aei.ca>
In-Reply-To: <201107192130.02080.edt@aei.ca>

On Tuesday 19 July 2011 21:30:00 Ed Tomlinson wrote:
> On Tuesday 19 July 2011 20:17:38 Paul E. McKenney wrote:

Paul,

Two things to add.  It's possible an earlier error is missing from the log below.  My serial console was missing
entries from before '0 and 40ish' and I did think I saw a bunch of FFFFFFFs scroll by earlier...

Second, booting with threadirqs and boost enabled in .config with patches 2,5,6 is ok so far.

Thanks,
Ed

> Pulling this on top of master and rebuilding with boost enabled and booting with threadirqs does not
> boot correctly here.  
> 
>  * Starting APC UPS daemon ...
> [   46.007217]
> [   46.007221] =================================
> [   46.008013] [ INFO: inconsistent lock state ]
> [   46.008013] 3.0.0-rc7-crc+ #340
> [   46.008013] ---------------------------------
> [   46.008013] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [   46.008013] ip/2982 [HC0[0]:SC0[0]:HE1:SE1] takes:
> [   46.008013]  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
> [   46.008013] {IN-SOFTIRQ-W} state was registered at:
> [   46.008013]   [<ffffffff8108a1e5>] __lock_acquire+0x5a5/0x16a0
> [   46.008013]   [<ffffffff8108b8d5>] lock_acquire+0x95/0x140
> [   46.008013]   [<ffffffff815793a6>] _raw_spin_lock_irqsave+0x46/0x60
> [   46.008013]   [<ffffffff810bbf80>] __rcu_process_callbacks+0x190/0x2a0
> [   46.008013]   [<ffffffff810bc0ed>] rcu_process_callbacks+0x5d/0x60
> [   46.008013]   [<ffffffff81052e14>] __do_softirq+0x194/0x300
> [   46.008013]   [<ffffffff8105311d>] run_ksoftirqd+0x19d/0x240
> [   46.008013]   [<ffffffff81071296>] kthread+0xb6/0xc0
> [   46.008013]   [<ffffffff81582294>] kernel_thread_helper+0x4/0x10
> [   46.008013] irq event stamp: 6802
> [   46.008013] hardirqs last  enabled at (6801): [<ffffffff815776bd>] __mutex_unlock_slowpath+0x10d/0x1c0
> [   46.008013] hardirqs last disabled at (6802): [<ffffffff8157937c>] _raw_spin_lock_irqsave+0x1c/0x60
> [   46.008013] softirqs last  enabled at (6698): [<ffffffff814c274a>] sk_common_release+0x6a/0x120
> [   46.008013] softirqs last disabled at (6696): [<ffffffff81579476>] _raw_write_lock_bh+0x16/0x50
> [   46.008013]
> [   46.008013] other info that might help us debug this:
> [   46.008013]  Possible unsafe locking scenario:
> [   46.008013]
> [   46.008013]        CPU0
> [   46.008013]        ----
> [   46.008013]   lock(rcu_node_level_0);
> [   46.008013]   <Interrupt>
> [   46.008013]     lock(rcu_node_level_0);
> [   46.008013]
> [   46.008013]  *** DEADLOCK ***
> [   46.008013]
> [   46.008013] 3 locks held by ip/2982:
> [   46.008013]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff814e4fd7>] rtnl_lock+0x17/0x20
> [   46.008013]  #1:  (sync_rcu_preempt_exp_mutex){+.+...}, at: [<ffffffff810bc9e7>] synchronize_rcu_expedited+0x37/0x210
> [   46.008013]  #2:  (rcu_node_level_0){+.?...}, at: [<ffffffff810ba279>] rcu_report_exp_rnp+0x19/0x50
> [   46.008013]
> [   46.008013] stack backtrace:
> [   46.008013] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
> [   46.008013] Call Trace:
> [   46.008013]  [<ffffffff81088c33>] print_usage_bug+0x223/0x270
> [   46.008013]  [<ffffffff81089558>] mark_lock+0x328/0x400
> [   46.008013]  [<ffffffff8108969e>] mark_held_locks+0x6e/0xa0
> [   46.008013]  [<ffffffff81579a5d>] ? _raw_spin_unlock_irqrestore+0x7d/0x90
> [   46.008013]  [<ffffffff8108999d>] trace_hardirqs_on_caller+0x14d/0x190
> [   46.008013]  [<ffffffff810899ed>] trace_hardirqs_on+0xd/0x10
> [   46.008013]  [<ffffffff81579a5d>] _raw_spin_unlock_irqrestore+0x7d/0x90
> [   46.008013]  [<ffffffff810bcb18>] synchronize_rcu_expedited+0x168/0x210
> [   46.008013]  [<ffffffff8111a073>] ? might_fault+0x53/0xb0
> [   46.008013]  [<ffffffff8125200a>] ? security_capable+0x2a/0x30
> [   46.008013]  [<ffffffff814d0985>] synchronize_net+0x45/0x50
> [   46.008013]  [<ffffffffa0359613>] ipip6_tunnel_ioctl+0x5f3/0x800 [sit]
> [   46.008013]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit] 
> [   46.008013]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
> [   46.008013]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
> [   46.008013]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
> [   46.008013]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
> [   46.008013]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
> [   46.008013]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
> [   46.008013]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
> [   46.008013]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
> [   46.008013]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
> [   46.008013]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
> [   46.008013]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
> [   46.008013]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b
> [   46.786123] BUG: sleeping function called from invalid context at net/ipv4/route.c:757
> [   46.799548] in_atomic(): 1, irqs_disabled(): 0, pid: 2982, name: ip
> [   46.799553] INFO: lockdep is turned off.
> [   46.799561] Pid: 2982, comm: ip Tainted: G        W   3.0.0-rc7-crc+ #340
> [   46.799565] Call Trace:
> [   46.799576]  [<ffffffff81036b39>] __might_sleep+0xf9/0x120
> [   46.799586]  [<ffffffff814fea38>] rt_do_flush+0x178/0x1b0
> [   46.799594]  [<ffffffff814feafc>] rt_cache_flush+0x5c/0x70
> [   46.799606]  [<ffffffff81542d92>] fib_netdev_event+0x72/0xf0
> [   46.799615]  [<ffffffff8157d576>] notifier_call_chain+0x56/0x80
> [   46.799625]  [<ffffffff81077ff6>] raw_notifier_call_chain+0x16/0x20
> [   46.799633]  [<ffffffff814d1cd4>] call_netdevice_notifiers+0x74/0x90
> [   46.799642]  [<ffffffff814d2c37>] netdev_state_change+0x27/0x40
> [   46.799653]  [<ffffffffa0359686>] ipip6_tunnel_ioctl+0x666/0x800 [sit]
> [   46.799663]  [<ffffffffa0359020>] ? ipip6_tunnel_locate+0x2e0/0x2e0 [sit]
> [   46.799673]  [<ffffffff814d43ba>] dev_ifsioc+0x11a/0x2c0
> [   46.799682]  [<ffffffff814d678a>] dev_ioctl+0x35a/0x810
> [   46.799690]  [<ffffffff81089a8d>] ? debug_check_no_locks_freed+0x9d/0x150
> [   46.799700]  [<ffffffff8108999d>] ? trace_hardirqs_on_caller+0x14d/0x190
> [   46.799707]  [<ffffffff810899ed>] ? trace_hardirqs_on+0xd/0x10
> [   46.799718]  [<ffffffff814bb83a>] sock_ioctl+0xea/0x2b0
> [   46.799726]  [<ffffffff81162474>] do_vfs_ioctl+0xa4/0x5a0
> [   46.799735]  [<ffffffff815799cc>] ? _raw_spin_unlock+0x5c/0x70
> [   46.799744]  [<ffffffff8114c1cc>] ? fd_install+0x7c/0x90
> [   46.799752]  [<ffffffff8158149c>] ? sysret_check+0x27/0x62
> [   46.799760]  [<ffffffff81162a09>] sys_ioctl+0x99/0xa0
> [   46.799769]  [<ffffffff8158146b>] system_call_fastpath+0x16/0x1b
> 
> Followed by more BUGs and yielding a box that needed an interrupt button press to
> force a reboot.
> 
> I've previously tested with patches 5 and 6 from Peter Z with good results with threadirqs 
> with boost disabled.
> 
> Ed
> 
> > This patch set contains fixes for a trainwreck involving RCU, the
> > scheduler, and threaded interrupts.  This trainwreck involved RCU failing
> > to properly protect one of its bit fields, use of RCU by the scheduler
> > from portions of irq_exit() where in_irq() returns false, uses of the
> > scheduler by RCU colliding with uses of RCU by the scheduler, threaded
> > interrupts exercising the problematic portions of irq_exit() more heavily,
> > and so on.  The patches are as follows:
> > 
> > 1.	Properly protect current->rcu_read_unlock_special().
> > 	Lack of protection was causing RCU to recurse on itself, which
> > 	in turn resulted in deadlocks involving RCU and the scheduler.
> > 	This affects only RCU_BOOST=y configurations.
> > 2.	Streamline code produced by __rcu_read_unlock().  This one is
> > 	an innocent bystander that is being carried due to conflicts
> > 	with other patches.  (A later version will likely merge it
> > 	with #3 below.)
> > 3.	Make __rcu_read_unlock() delay counting the per-task
> > 	->rcu_read_lock_nesting variable to zero until all cleanup for the
> > 	just-ended RCU read-side critical section has completed.  This
> > 	prevents a number of other cases that could result in deadlock
> > 	due to self recursion.	This affects only TREE_PREEMPT_RCU=y
> > 	configurations.
> > 4.	Make scheduler_ipi() correctly identify itself as being
> > 	in_irq() when it needs to do anything that might involve RCU,
> > 	thus enabling RCU to avoid yet another class of potential
> > 	self-recursions and deadlocks.	This affects PREEMPT_RCU=y
> > 	configurations.
> > 5.	Make irq_exit() inform RCU when it is invoking the scheduler
> > 	in situations where in_irq() would return false, thus
> > 	allowing RCU to correctly avoid self-recursion.  This affects
> > 	TREE_PREEMPT_RCU=y configurations.
> > 6.	Make __lock_task_sighand() execute the entire RCU read-side
> > 	critical section with irqs disabled.  (An experimental patch at
> > 	http://marc.info/?l=linux-kernel&m=131110647222185 might possibly
> > 	make it legal to have an RCU read-side critical section where
> > 	the rcu_read_unlock() is executed with interrupts disabled,
> > 	but where some portion of the RCU read-side critical section
> > 	was preemptible.)  This affects TREE_PREEMPT_RCU=y configurations.
> > 
> > TINY_PREEMPT_RCU will also need a few of these changes, but in the
> > meantime this patch stack helps organize things better for testing.
> > These are also available from the following subject-to-rebase git branch:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git rcu/urgent
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >  b/include/linux/sched.h   |    3 +++
> >  b/kernel/rcutree_plugin.h |    3 ++-
> >  b/kernel/sched.c          |   44 ++++++++++++++++++++++++++++++++++++++------
> >  b/kernel/signal.c         |   16 +++++++++++-----
> >  b/kernel/softirq.c        |   12 ++++++++++--
> >  kernel/rcutree_plugin.h   |   45 ++++++++++++++++++++++++++++++++-------------
> >  6 files changed, 96 insertions(+), 27 deletions(-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> > 
> > 
> 
> 
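
For anyone reading the lockdep report quoted above, here is a rough sketch of the kind of check behind the "inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage" line. It is a toy model only, not the kernel's lockdep code: the names LockClass and record_usage are made up for illustration, and the model tracks just the two usage states named in the report. The idea, as I read it, is that a lock which has ever been taken from softirq context must never be held with softirqs enabled, because a softirq arriving on the same CPU could then spin on a lock that CPU already holds.

```python
from dataclasses import dataclass, field

# Hypothetical usage bits, named after the two states in the quoted report.
IN_SOFTIRQ_W = "IN-SOFTIRQ-W"   # lock was acquired while running in softirq context
SOFTIRQ_ON_W = "SOFTIRQ-ON-W"   # lock was held while softirqs were enabled


@dataclass
class LockClass:
    """Toy stand-in for a lock class such as rcu_node_level_0."""
    name: str
    usage: set = field(default_factory=set)


def record_usage(lock, bit):
    """Record one usage bit; return a message if it conflicts with an earlier one."""
    conflict = SOFTIRQ_ON_W if bit == IN_SOFTIRQ_W else IN_SOFTIRQ_W
    if conflict in lock.usage:
        # The two states together mean a softirq could interrupt the holder
        # on the same CPU and try to take the lock again.
        return f"inconsistent {{{conflict}}} -> {{{bit}}} usage on {lock.name}"
    lock.usage.add(bit)
    return None


rnp = LockClass("rcu_node_level_0")
first = record_usage(rnp, IN_SOFTIRQ_W)    # e.g. taken from softirq processing
second = record_usage(rnp, SOFTIRQ_ON_W)   # later held with softirqs enabled
print(second)
```

The first call only records the state; the second one is where the model complains, which mirrors the order of events in the report (the softirq usage was registered first, and the warning fired later from process context in ip/2982).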
