All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>
Cc: tj@kernel.org
Subject: Re: RCU stalls -> lockup.
Date: Thu, 2 Oct 2014 12:36:55 -0700	[thread overview]
Message-ID: <20141002193655.GS5015@linux.vnet.ibm.com> (raw)
In-Reply-To: <20141002175515.GA28665@redhat.com>

On Thu, Oct 02, 2014 at 01:55:15PM -0400, Dave Jones wrote:
> I just hit this on my box running 3.17rc7
> It was followed by a userspace lockup. (Could still ping, and sysrq
> from the console, but even getty wasn't responding on the console).
> 
> I was trying to reproduce another bug faster, and had ramped up the
> number of processes trinity to uses to 512. This didn't take long
> to fall out..

This might be related to an exchange I had with Tejun (CCed), where
the work queues were running all out, preventing any quiescent states
from happening.  One fix under consideration is to add a quiescent state,
similar to the one in softirq handling.

							Thanx, Paul

> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	(detected by 3, t=6502 jiffies, g=1014253, c=1014252, q=0)
> INFO: Stall ended before state dump start
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	(detected by 0, t=26007 jiffies, g=1014253, c=1014252, q=0)
> INFO: Stall ended before state dump start
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	(detected by 2, t=45512 jiffies, g=1014253, c=1014252, q=0)
> INFO: Stall ended before state dump start
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3):
> 	(detected by 1, t=65017 jiffies, g=1014253, c=1014252, q=0)
> INFO: Stall ended before state dump start
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P15547 P15232 P15616 P15634
> 	Tasks blocked on level-0 rcu_node (CPUs 0-3): P15547 P15232 P15616 P15634
> 	(detected by 1, t=6502 jiffies, g=1014254, c=1014253, q=0)
> trinity-c318    R  running task    13480 15547  14371 0x00000000
>  ffff880031df7df0 0000000000000002 ffffffff870cb70e ffff88008ec30000
>  00000000001d4080 0000000000000000 ffff880031df7fd8 00000000001d4080
>  ffff8802166c2de0 ffff88008ec30000 ffff880031df7fd8 ffffffff872361f4
> Call Trace:
>  [<ffffffff870cb70e>] ? put_lock_stats.isra.28+0xe/0x30
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8781df46>] preempt_schedule+0x36/0x60
>  [<ffffffff8700e3d6>] ___preempt_schedule+0x56/0xb0
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8707db77>] ? __local_bh_enable_ip+0xb7/0xe0
>  [<ffffffff87823a45>] _raw_spin_unlock_bh+0x35/0x40
>  [<ffffffff872361f4>] bdi_queue_work+0xe4/0x1a0
>  [<ffffffff87236318>] __bdi_start_writeback+0x68/0x190
>  [<ffffffff87238eb0>] wakeup_flusher_threads+0x100/0x1e0
>  [<ffffffff87238de0>] ? wakeup_flusher_threads+0x30/0x1e0
>  [<ffffffff8723cd86>] sys_sync+0x36/0xb0
>  [<ffffffff878249e4>] tracesys+0xdd/0xe2
> trinity-c9      R  running task    13496 15232  14371 0x00000000
>  ffff88011a01fdf0 0000000000000002 ffffffff870cb70e ffff8800a19616f0
>  00000000001d4080 0000000000000000 ffff88011a01ffd8 00000000001d4080
>  ffff8802166c2de0 ffff8800a19616f0 ffff88011a01ffd8 ffffffff872361f4
> Call Trace:
>  [<ffffffff870cb70e>] ? put_lock_stats.isra.28+0xe/0x30
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8781df46>] preempt_schedule+0x36/0x60
>  [<ffffffff8700e3d6>] ___preempt_schedule+0x56/0xb0
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8707db77>] ? __local_bh_enable_ip+0xb7/0xe0
>  [<ffffffff87823a45>] _raw_spin_unlock_bh+0x35/0x40
>  [<ffffffff872361f4>] bdi_queue_work+0xe4/0x1a0
>  [<ffffffff87236318>] __bdi_start_writeback+0x68/0x190
>  [<ffffffff87238eb0>] wakeup_flusher_threads+0x100/0x1e0
>  [<ffffffff87238de0>] ? wakeup_flusher_threads+0x30/0x1e0
>  [<ffffffff8723cd86>] sys_sync+0x36/0xb0
>  [<ffffffff878249e4>] tracesys+0xdd/0xe2
> trinity-c387    R  running task    13272 15616  14371 0x00000004
>  ffff880043fd37f8 0000000000000002 ffff880001fd3868 ffff88023304ade0
>  00000000001d4080 0000000000000000 ffff880043fd3fd8 00000000001d4080
>  ffff8802166c2de0 ffff88023304ade0 ffff880043fd3fd8 0000000000000000
> Call Trace:
>  [<ffffffff8781e182>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff87825550>] retint_kernel+0x20/0x30
>  [<ffffffff87190624>] ? find_get_entry+0xb4/0x270
>  [<ffffffff8719076e>] ? find_get_entry+0x1fe/0x270
>  [<ffffffff87190575>] ? find_get_entry+0x5/0x270
>  [<ffffffff871907ff>] find_lock_entry+0x1f/0x90
>  [<ffffffff871acb65>] shmem_getpage_gfp+0xd5/0xa10
>  [<ffffffff871adb3d>] shmem_fault+0x6d/0x1c0
>  [<ffffffff871bff58>] __do_fault+0x48/0xd0
>  [<ffffffff871c25e0>] do_shared_fault.isra.75+0x40/0x1c0
>  [<ffffffff8718892b>] ? __perf_sw_event+0x4b/0x380
>  [<ffffffff871c3b21>] handle_mm_fault+0x261/0xcd0
>  [<ffffffff870cb687>] ? __lock_is_held+0x57/0x80
>  [<ffffffff87042c84>] __do_page_fault+0x1a4/0x600
>  [<ffffffff870cbbf6>] ? lock_release_holdtime.part.29+0xe6/0x160
>  [<ffffffff8718eda7>] ? context_tracking_user_exit+0x67/0x1b0
>  [<ffffffff870430fe>] do_page_fault+0x1e/0x70
>  [<ffffffff878264b2>] page_fault+0x22/0x30
>  [<ffffffff871bd7e3>] ? copy_page_to_iter+0x3b3/0x500
>  [<ffffffff871bd5fe>] ? copy_page_to_iter+0x1ce/0x500
>  [<ffffffff8723a2b0>] ? vmsplice_to_user+0x130/0x130
>  [<ffffffff8723a2d2>] pipe_to_user+0x22/0x40
>  [<ffffffff87239d4e>] __splice_from_pipe+0x11e/0x190
>  [<ffffffff8723a254>] vmsplice_to_user+0xd4/0x130
>  [<ffffffff870cb101>] ? trace_hardirqs_off_caller+0x21/0xc0
>  [<ffffffff878253e4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff870ac74d>] ? get_parent_ip+0xd/0x50
>  [<ffffffff870ac8ab>] ? preempt_count_sub+0x6b/0xf0
>  [<13480 15547  14371 0x00000000
>  ffff880031df7df0 0000000000000002 ffffffff870cb70e ffff88008ec30000
>  00000000001d4080 0000000000000000 ffff880031df7fd8 00000000001d4080
>  ffff8801039344d0 ffff88008ec30000 ffff880031df7fd8 ffffffff872361f4
> Call Trace:
>  [<ffffffff870cb70e>] ? put_lock_stats.isra.28+0xe/0x30
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8781df46>] preempt_schedule+0x36/0x60
>  [<ffffffff8700e3d6>] ___preempt_schedule+0x56/0xb0
>  [<ffffffff872361f4>] ? bdi_queue_work+0xe4/0x1a0
>  [<ffffffff8707db77>] ? __local_bh_enable_ip+0xb7/0xe0
>  [<ffffffff87823a45>] _raw_spin_unlock_bh+0x35/0x40
>  [<ffffffff872361f4>] bdi_queue_work+0xe4/0x1a0
>  [<ffffffff87236318>] __bdi_start_writeback+0x68/0x190
>  [<ffffffff87238eb0>] wakeup_flusher_threads+0x100/0x1e0
>  [<ffffffff87238de0>] ? wakeup_flusher_threads+0x30/0x1e0
>  [<ffffffff8723cd86>] sys_sync+0x36/0xb0
>  [<ffffffff878249e4>] tracesys+0xdd/0xe2
> trinity-c9      R  running task    13496 15232  14371 0x00000000
>  ffff88011a01fe58 0000000000000002 ffffffff87238ecf ffff8800a19616f0
>  00000000001d4080 0000000000000000 ffff88011a01ffd8 00000000001d4080
>  ffff880085d35bc0 ffff8800a19616f0 ffff88011a01ffd8 0000000000000000
> Call Trace:
>  [<ffffffff87238ecf>] ? wakeup_flusher_threads+0x11f/0x1e0
>  [<ffffffff87238ecf>] ? wakeup_flusher_threads+0x11f/0x1e0
>  [<ffffffff8781e182>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff87825550>] retint_kernel+0x20/0x30
>  [<ffffffff87238ecf>] ? wakeup_flusher_threads+0x11f/0x1e0
>  [<ffffffff870d1049>] ? lock_release+0x29/0x300
>  [<ffffffff87238ee7>] wakeup_flusher_threads+0x137/0x1e0
>  [<ffffffff87238de0>] ? wakeup_flusher_threads+0x30/0x1e0
>  [<ffffffff8723cd86>] sys_sync+0x36/0xb0
>  [<ffffffff878249e4>] tracesys+0xdd/0xe2
> trinity-c387    R  running task    13272 15616  14371 0x00000004
>  ffff880043fd37f8 0000000000000002 ffff880001fd3868 ffff88023304ade0
>  00000000001d4080 0000000000000000 ffff880043fd3fd8 00000000001d4080
>  ffff8801039344d0 ffff88023304ade0 ffff880043fd3fd8 0000000000000000
> Call Trace:
>  [<ffffffff8781e182>] preempt_schedule_irq+0x52/0xb0
>  [<ffffffff87825550>] retint_kernel+0x20/0x30
>  [<ffffffff87190624>] ? find_get_entry+0xb4/0x270
>  [<ffffffff8719076e>] ? find_get_entry+0x1fe/0x270
>  [<ffffffff87190575>] ? find_get_entry+0x5/0x270
>  [<ffffffff871907ff>] find_lock_entry+0x1f/0x90
>  [<ffffffff871acb65>] shmem_getpage_gfp+0xd5/0xa10
>  [<ffffffff871adb3d>] shmem_fault+0x6d/0x1c0
>  [<ffffffff871bff58>] __do_fault+0x48/0xd0
>  [<ffffffff871c25e0>] do_shared_fault.isra.75+0x40/0x1c0
>  [<ffffffff8718892b>] ? __perf_sw_event+0x4b/0x380
>  [<ffffffff871c3b21>] handle_mm_fault+0x261/0xcd0
>  [<ffffffff870cb687>] ? __lock_is_held+0x57/0x80
>  [<ffffffff87042c84>] __do_page_fault+0x1a4/0x600
>  [<ffffffff870cbbf6>] ? lock_release_holdtime.part.29+0xe6/0x160
>  [<ffffffff8718eda7>] ? context_tracking_user_exit+0x67/0x1b0
>  [<ffffffff870430fe>] do_page_fault+0x1e/0x70
>  [<ffffffff878264b2>] page_fault+0x22/0x30
>  [<ffffffff871bd7e3>] ? copy_page_to_iter+0x3b3/0x500
>  [<ffffffff871bd5fe>] ? copy_page_to_iter+0x1ce/0x500
>  [<ffffffff8723a2b0>] ? vmsplice_to_user+0x130/0x130
>  [<ffffffff8723a2d2>] pipe_to_user+0x22/0x40
>  [<ffffffff87239d4e>] __splice_from_pipe+0x11e/0x190
>  [<ffffffff8723a254>] vmsplice_to_user+0xd4/0x130
>  [<ffffffff870cb101>] ? trace_hardirqs_off_caller+0x21/0xc0
>  [<ffffffff878253e4>] ? retint_restore_args+0xe/0xe
>  [<ffffffff870ac74d>] ? get_parent_ip+0xd/0x50
>  [<ffffffff870ac8ab>] ? preempt_count_sub+0x6b/0xf0
>  [<ffffffff8723bd31>] SyS_vmsplice+0xc1/0xe0
> 


  reply	other threads:[~2014-10-02 19:37 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-02 17:55 RCU stalls -> lockup Dave Jones
2014-10-02 19:36 ` Paul E. McKenney [this message]
2014-10-05  2:15   ` Tejun Heo
2014-10-07  1:27     ` Dave Jones
2014-10-09 20:09     ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141002193655.GS5015@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davej@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.