All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: "peterz@infradead.org" <peterz@infradead.org>,
	linux-kernel@vger.kernel.org,
	"umgwanakikbuti@gmail.com" <umgwanakikbuti@gmail.com>,
	torvalds@linux-foundation.org,
	"mingo@kernel.org" <mingo@kernel.org>,
	Will Deacon <Will.Deacon@arm.com>
Subject: Re: PRREMPT RCU stalls with 9edfbfed3f544a78 ("sched/core: Rework rq->clock update skips")
Date: Mon, 29 Jun 2015 11:11:34 -0700	[thread overview]
Message-ID: <20150629181134.GN3717@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150629173751.GB19863@leverpostej>

On Mon, Jun 29, 2015 at 06:37:52PM +0100, Mark Rutland wrote:
> Hi,
> 
> Recently (since ~v4.0) I've been seeing new RCU stall warnings when
> running hackbench (and cpu-affine cyclictest instances) on
> CONFIG_PREEMPT=y kernels, both arm64 and x86_64:
> 
> arm64
> -----
> INFO: rcu_preempt detected stalls on CPUs/tasks:
>         Tasks blocked on level-0 rcu_node (CPUs 0-5):

So you appear to be blocked on a task that was preempted within its RCU
read-side critical section.  There is a bug that causes the task not
to be printed, which is fixed by 82efed06d5e3 ("rcu: Fix missing task
information during rcu-preempt stall").

Not that this will fix the stall itself, but it will at least allow you
to see the task that is causing the stall.

>         (detected by 1, t=2102 jiffies, g=957, c=956, q=23)
> All QSes seen, last rcu_preempt kthread activity 2 (4294981808-4294981806), jiffies_till_next_fqs=1, root ->qsmask 0x0
> hackbench       R  running task        0 10388   9559 0x00000000
> Call trace:
> [<ffffffc000089914>] dump_backtrace+0x0/0x124
> [<ffffffc000089a48>] show_stack+0x10/0x1c
> [<ffffffc0000d65a0>] sched_show_task+0xa4/0x104
> [<ffffffc0000fc2c8>] rcu_check_callbacks+0x998/0x9a0
> [<ffffffc0000ff4dc>] update_process_times+0x38/0x6c
> [<ffffffc00010e288>] tick_sched_handle.isra.16+0x1c/0x68
> [<ffffffc00010e314>] tick_sched_timer+0x40/0x88
> [<ffffffc0000fff64>] __run_hrtimer.isra.34+0x48/0x108
> [<ffffffc000100254>] hrtimer_interrupt+0xc4/0x248
> [<ffffffc0004bb9cc>] arch_timer_handler_phys+0x28/0x38
> [<ffffffc0000f381c>] handle_percpu_devid_irq+0x74/0x9c
> [<ffffffc0000ef73c>] generic_handle_irq+0x30/0x4c
> [<ffffffc0000efa54>] __handle_domain_irq+0x5c/0xac
> [<ffffffc00008241c>] gic_handle_irq+0x30/0x80
> Exception stack(0xffffffc073bc3a80 to 0xffffffc073bc3ba0)
> 3a80: 00000064 00000000 00000064 00000000 73bc3bc0 ffffffc0 00331ffc ffffffc0
> 3aa0: 00000064 00000000 00000064 00000000 ffffffff ffffffff 160c3d20 00000000
> 3ac0: e9d018ac 0000007f 00000064 00000000 5a046200 ffffffc9 000000c0 00000000
> 3ae0: 00000000 00000000 00000000 00000000 ffffffff 00000000 00000018 00000000
> 3b00: 00000002 00000000 00000000 00000000 948d9a5c 0000007f 94a1e590 0000007f
> 3b20: 0019b7e0 ffffffc0 94a316d0 0000007f e9d01530 0000007f 00000064 00000000
> 3b40: 00000064 00000000 00000064 00000000 00000000 00000000 00000064 00000000
> 3b60: 73bc3d78 ffffffc0 757be080 ffffffc9 6241b980 ffffffc9 00000001 00000000
> 3b80: 55da5500 ffffffc9 73bc3bc0 ffffffc0 00331f9c ffffffc0 73bc3bc0 ffffffc0
> [<ffffffc0000855a4>] el1_irq+0x64/0xd8
> [<ffffffc0004f1d6c>] skb_copy_datagram_from_iter+0x5c/0x1f0
> [<ffffffc000579e68>] unix_stream_sendmsg+0xf0/0x334
> [<ffffffc0004e1d68>] sock_sendmsg+0x14/0x58
> [<ffffffc0004e1e20>] sock_write_iter+0x74/0xd4
> [<ffffffc00019a494>] __vfs_write+0xac/0x10c
> [<ffffffc00019ad38>] vfs_write+0x8c/0x194
> [<ffffffc00019b820>] SyS_write+0x40/0xa0
> 
> x86_64
> ------
> [ 2616.307011] INFO: rcu_preempt detected stalls on CPUs/tasks:
> [ 2616.311868]  Tasks blocked on level-0 rcu_node (CPUs 0-3):

Same here, same diagnostic fix.

> [ 2616.311868]  (detected by 1, t=21014 jiffies, g=10009, c=10008, q=274)
> [ 2616.322320] All QSes seen, last rcu_preempt kthread activity 3 (4297283620-4297283617), jiffies_till_next_fqs=3, root ->qsmask 0x0
> [ 2616.322320] hackbench       R  running task    13984 30174  29054 0x00000008
> [ 2616.322320]  00000000000036a0 ffff8801bec83c98 ffffffff8107b751 0000000000000003
> [ 2616.322320]  ffffffff81e3b8c0 ffffffff81f01280 ffffffff81e3b8c0 ffff8801bec83d28
> [ 2616.352044]  ffffffff810a90f7 0000000000000000 ffffffff8107faac 000000000000f6a0
> [ 2616.352044] Call Trace:
> [ 2616.352044]  <IRQ>  [<ffffffff8107b751>] sched_show_task+0xe1/0x150
> [ 2616.352044]  [<ffffffff810a90f7>] rcu_check_callbacks+0x8f7/0x900
> [ 2616.352044]  [<ffffffff8107faac>] ? account_system_time+0x9c/0x190
> [ 2616.352044]  [<ffffffff810ae244>] update_process_times+0x34/0x60
> [ 2616.352044]  [<ffffffff810bd471>] tick_sched_handle.isra.15+0x31/0x40
> [ 2616.352044]  [<ffffffff810bd4bf>] tick_sched_timer+0x3f/0x70
> [ 2616.352044]  [<ffffffff810aefaf>] __run_hrtimer+0x7f/0x230
> [ 2616.352044]  [<ffffffff810bd480>] ? tick_sched_handle.isra.15+0x40/0x40
> [ 2616.416043]  [<ffffffff810af3b3>] hrtimer_interrupt+0xf3/0x220
> [ 2616.416043]  [<ffffffff810439b1>] hpet_interrupt_handler+0x11/0x30
> [ 2616.416043]  [<ffffffff8109e437>] handle_irq_event_percpu+0x47/0x1d0
> [ 2616.428774]  [<ffffffff8109e5fc>] handle_irq_event+0x3c/0x60
> [ 2616.428774]  [<ffffffff810a13cf>] handle_edge_irq+0x8f/0x130
> [ 2616.428774]  [<ffffffff81004f17>] handle_irq+0xf7/0x180
> [ 2616.450371]  [<ffffffff8107a231>] ? get_parent_ip+0x11/0x50
> [ 2616.450371]  [<ffffffff810046d8>] do_IRQ+0x58/0x100
> [ 2616.450371]  [<ffffffff8190caeb>] common_interrupt+0x6b/0x6b
> [ 2616.467277]  <EOI>  [<ffffffff8170aa47>] ? sock_def_readable+0x67/0x70
> [ 2616.467277]  [<ffffffff812bfb05>] ? avc_has_perm+0x105/0x1b0
> [ 2616.467277]  [<ffffffff812bfad7>] ? avc_has_perm+0xd7/0x1b0
> [ 2616.482406]  [<ffffffff812c2092>] inode_has_perm.isra.29+0x22/0x30
> [ 2616.482406]  [<ffffffff812c21a7>] file_has_perm+0x87/0xa0
> [ 2616.493232]  [<ffffffff811af786>] ? fsnotify+0x2d6/0x400
> [ 2616.493232]  [<ffffffff812c48c5>] selinux_file_permission+0xa5/0x120
> [ 2616.493232]  [<ffffffff812bde1e>] security_file_permission+0x1e/0xa0
> [ 2616.493232]  [<ffffffff81170fd1>] rw_verify_area+0x51/0xd0
> [ 2616.493232]  [<ffffffff811711ff>] vfs_write+0x6f/0x1b0
> [ 2616.493232]  [<ffffffff81171fd1>] SyS_write+0x41/0xb0
> [ 2616.493232]  [<ffffffff81186220>] ? SyS_poll+0x60/0xf0
> [ 2616.493232]  [<ffffffff8190c017>] system_call_fastpath+0x12/0x6a
> 
> I've bisected to 9edfbfed3f544a78 ("sched/core: Rework rq->clock update
> skips"), and with that reverted I no longer see stalls. For v4.1 I also
> had to revert 44fb085bfa17628c "sched/deadline: Add rq->clock update
> skip for dl task yield" as it calls function introduced in
> 9edfbfed3f544a78.
> 
> Any ideas as to what could be going on?

I don't see why those commits would prevent a preempted task from running,
but then I don't claim to fully understand these changes.

> To reproduce the issue I'm running the following in parallel on
> dual-core machines:
> 
> $ while true; do ./hackbench 100 process 1000; sleep 30; done
> # ./cyclictest -a0 -n -M -m
> # ./cyclictest -a1 -n -M -m
> 
> Usually it takes a couple of minutes to trigger, but I've seen it take
> up to half an hour. With the reverts I haven't seen stalls after testing
> for an hour or so.

I would want to see at least a 5-hour run before being all that sure,
but an hour is at least promising.

							Thanx, Paul


  reply	other threads:[~2015-06-29 18:11 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-29 17:37 PRREMPT RCU stalls with 9edfbfed3f544a78 ("sched/core: Rework rq->clock update skips") Mark Rutland
2015-06-29 18:11 ` Paul E. McKenney [this message]
2015-06-30 10:59 ` Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150629181134.GN3717@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=Will.Deacon@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=umgwanakikbuti@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.