All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marco Elver <elver@google.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
	Anders Roxell <anders.roxell@linaro.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Potapenko <glider@google.com>,
	Dmitry Vyukov <dvyukov@google.com>, Jann Horn <jannh@google.com>,
	Mark Rutland <mark.rutland@arm.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	kasan-dev <kasan-dev@googlegroups.com>,
	rcu@vger.kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>, Lai Jiangshan <jiangshanlai@gmail.com>
Subject: Re: [PATCH] kfence: Avoid stalling work queue task without allocations
Date: Wed, 18 Nov 2020 23:56:21 +0100	[thread overview]
Message-ID: <20201118225621.GA1770130@elver.google.com> (raw)
In-Reply-To: <20201117182915.GM1437@paulmck-ThinkPad-P72>

[-- Attachment #1: Type: text/plain, Size: 1922 bytes --]

On Tue, Nov 17, 2020 at 10:29AM -0800, Paul E. McKenney wrote:
[...] 
> But it would be good to get the kcompactd() people to look at this (not
> immediately seeing who they are in MAINTAINERS).  Perhaps preemption is
> disabled somehow and I am failing to see it.
> 
> Failing that, maybe someone knows of a way to check for overly long
> timeout handlers.

I think I figured out one piece of the puzzle. Bisection keeps pointing
me at some -rcu merge commit, which kept throwing me off. Nor did it
help that reproduction is a bit flaky. However, I think there are 2
independent problems, but the manifestation of 1 problem triggers the
2nd problem:

1. problem: slowed forward progress (workqueue lockup / RCU stall reports)

2. problem: DEADLOCK which causes complete system lockup

	| ...
	|        CPU0
	|        ----
	|   lock(rcu_node_0);
	|   <Interrupt>
	|     lock(rcu_node_0);
	| 
	|  *** DEADLOCK ***
	| 
	| 1 lock held by event_benchmark/105:
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
	|  #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
	| ...

Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:

	rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled

At which point the system always boots to user space -- albeit with a
bunch of warnings still (attached). The supposed "good" version doesn't
end up with all those warnings deterministically, so I couldn't say if
the warnings are expected due to recent changes or not (Arm64 QEMU
emulation, 1 CPU, and lots of debugging tools on).

Does any of that make sense?

Thanks,
-- Marco

[-- Attachment #2: log --]
[-- Type: text/plain, Size: 8804 bytes --]

Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events: 
hrtimer: interrupt took 10156432 ns
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2785, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 3752 (4295396561-4295392809), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 3752 jiffies! g2785 f0x0 RCU_GP_ONOFF(3) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x2d8/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock include/linux/spinlock_api_smp.h:152 [inline]
 _raw_spin_unlock+0x94/0xa8 kernel/locking/spinlock.c:183
 rcu_gp_init kernel/rcu/tree.c:1820 [inline]
 rcu_gp_kthread+0x34c/0x1bd8 kernel/rcu/tree.c:2105
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2817, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 856 (4295412565-4295411709), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 856 jiffies! g2817 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 __switch_to+0x10c/0x200 arch/arm64/kernel/process.c:578
 context_switch kernel/sched/core.c:3773 [inline]
 __schedule+0x2d8/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2015 [inline]
 rcu_gp_kthread+0x1038/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 16s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
  pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
    pending: neigh_periodic_work, do_cache_clean
workqueue rcu_gp: flags=0x8
  pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
    in-flight: 15:srcu_invoke_callbacks
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=0s workers=3 idle: 111 5
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2829, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 1522 (4295422970-4295421448), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 1522 jiffies! g2829 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x00000428
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6445 [inline]
 sched_show_task+0x1fc/0x228 kernel/sched/core.c:6420
 rcu_check_gp_kthread_starvation+0xc8/0xe4 kernel/rcu/tree_stall.h:452
 print_other_cpu_stall kernel/rcu/tree_stall.h:520 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:640 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0xb34/0xc48 kernel/rcu/tree.c:2581
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
 _raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
 finish_lock_switch kernel/sched/core.c:3513 [inline]
 finish_task_switch+0xa8/0x290 kernel/sched/core.c:3613
 context_switch kernel/sched/core.c:3776 [inline]
 __schedule+0x2dc/0x980 kernel/sched/core.c:4522
 preempt_schedule_common+0x4c/0x1a8 kernel/sched/core.c:4681
 preempt_schedule+0x38/0x40 kernel/sched/core.c:4706
 __raw_spin_unlock_irq include/linux/spinlock_api_smp.h:169 [inline]
 _raw_spin_unlock_irq+0x84/0x98 kernel/locking/spinlock.c:199
 rcu_gp_cleanup kernel/rcu/tree.c:2046 [inline]
 rcu_gp_kthread+0x1144/0x1bd8 kernel/rcu/tree.c:2119
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
	(detected by 0, t=3752 jiffies, g=2845, q=6)
rcu: All QSes seen, last rcu_preempt kthread activity 2796 (4295435367-4295432571), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 2796 jiffies! g2845 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:R  running task     stack:    0 pid:   10 ppid:     2 flags:0x0000042a
Call trace:
 dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
 show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
 sched_show_task kernel/sched/core.c:6445 [inline]
 sched_show_task+0x1fc/0x228 kernel/sched/core.c:6420
 rcu_check_gp_kthread_starvation+0xc8/0xe4 kernel/rcu/tree_stall.h:452
 print_other_cpu_stall kernel/rcu/tree_stall.h:520 [inline]
 check_cpu_stall kernel/rcu/tree_stall.h:640 [inline]
 rcu_pending kernel/rcu/tree.c:3752 [inline]
 rcu_sched_clock_irq+0xb34/0xc48 kernel/rcu/tree.c:2581
 update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
 tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
 tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
 __run_hrtimer kernel/time/hrtimer.c:1519 [inline]
 __hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
 hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
 timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
 arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
 handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
 generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
 generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
 __handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
 handle_domain_irq include/linux/irqdesc.h:170 [inline]
 gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
 el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
 arch_local_irq_restore arch/arm64/include/asm/irqflags.h:124 [inline]
 rcu_irq_enter_irqson+0x40/0x78 kernel/rcu/tree.c:1078
 trace_preempt_disable_rcuidle include/trace/events/preemptirq.h:51 [inline]
 trace_preempt_off+0x108/0x1f8 kernel/trace/trace_preemptirq.c:130
 preempt_latency_start kernel/sched/core.c:4164 [inline]
 preempt_latency_start kernel/sched/core.c:4157 [inline]
 preempt_schedule_notrace+0x170/0x1c0 kernel/sched/core.c:4747
 __ftrace_ops_list_func kernel/trace/ftrace.c:6956 [inline]
 ftrace_ops_list_func+0x108/0x230 kernel/trace/ftrace.c:6977
 ftrace_graph_call+0x0/0x4
 preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
 schedule+0x44/0x100 kernel/sched/core.c:4599
 schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
 rcu_gp_fqs_loop kernel/rcu/tree.c:1942 [inline]
 rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2115
 kthread+0x13c/0x188 kernel/kthread.c:292
 ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
OK

  reply	other threads:[~2020-11-18 22:56 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-10 13:53 [PATCH] kfence: Avoid stalling work queue task without allocations Marco Elver
2020-11-10 14:25 ` Dmitry Vyukov
2020-11-10 14:53   ` Marco Elver
2020-11-10 23:23 ` Anders Roxell
2020-11-11  8:29   ` Marco Elver
2020-11-11 13:38     ` Marco Elver
2020-11-11 18:05       ` Steven Rostedt
2020-11-11 18:23         ` Paul E. McKenney
2020-11-11 18:34           ` Marco Elver
2020-11-11 19:21             ` Paul E. McKenney
2020-11-11 20:21               ` Marco Elver
2020-11-12  0:11                 ` Paul E. McKenney
2020-11-12 12:49                   ` Marco Elver
2020-11-12 16:14                     ` Marco Elver
2020-11-12 17:54                       ` Paul E. McKenney
2020-11-12 18:12                         ` Marco Elver
2020-11-12 20:00                           ` Paul E. McKenney
2020-11-13 11:06                             ` Marco Elver
2020-11-13 17:20                               ` Paul E. McKenney
2020-11-13 17:57                         ` Paul E. McKenney
2020-11-17 10:52                           ` Marco Elver
2020-11-17 18:29                             ` Paul E. McKenney
2020-11-18 22:56                               ` Marco Elver [this message]
2020-11-18 23:38                                 ` Paul E. McKenney
2020-11-19 12:53                                   ` Marco Elver
2020-11-19 15:14                                     ` Paul E. McKenney
2020-11-19 17:02                                       ` Marco Elver
2020-11-19 18:48                                         ` Paul E. McKenney
2020-11-19 19:38                                           ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 19:38                                             ` Marco Elver
2020-11-19 21:35                                             ` Paul E. McKenney
2020-11-19 21:35                                               ` Paul E. McKenney
2020-11-19 22:53                                               ` Will Deacon
2020-11-19 22:53                                                 ` Will Deacon
2020-11-20 10:30                                                 ` Mark Rutland
2020-11-20 10:30                                                   ` Mark Rutland
2020-11-20 14:03                                                   ` Marco Elver
2020-11-20 14:03                                                     ` Marco Elver
2020-11-23 19:32                                                     ` Mark Rutland
2020-11-23 19:32                                                       ` Mark Rutland
2020-11-24 14:03                                                       ` Marco Elver
2020-11-24 14:03                                                         ` Marco Elver
2020-11-24 15:01                                                         ` Paul E. McKenney
2020-11-24 15:01                                                           ` Paul E. McKenney
2020-11-24 19:43                                                           ` Mark Rutland
2020-11-24 19:43                                                             ` Mark Rutland
2020-11-24 20:32                                                             ` Steven Rostedt
2020-11-24 20:32                                                               ` Steven Rostedt
2020-11-24 19:30                                                         ` Mark Rutland
2020-11-24 19:30                                                           ` Mark Rutland
2020-11-25  9:45                                                           ` Marco Elver
2020-11-25  9:45                                                             ` Marco Elver
2020-11-25 10:28                                                             ` Mark Rutland
2020-11-25 10:28                                                               ` Mark Rutland
2020-11-20 14:19                                               ` Marco Elver
2020-11-20 14:19                                                 ` Marco Elver
2020-11-20 14:39                                                 ` Paul E. McKenney
2020-11-20 14:39                                                   ` Paul E. McKenney
2020-11-20 15:22                                                   ` Mark Rutland
2020-11-20 15:22                                                     ` Mark Rutland
2020-11-20 17:38                                                     ` Paul E. McKenney
2020-11-20 17:38                                                       ` Paul E. McKenney
2020-11-20 18:02                                                       ` Mark Rutland
2020-11-20 18:02                                                         ` Mark Rutland
2020-11-20 18:57                                                         ` Paul E. McKenney
2020-11-20 18:57                                                           ` Paul E. McKenney
2020-11-20 15:26                                                 ` Steven Rostedt
2020-11-20 15:26                                                   ` Steven Rostedt
2020-11-20 18:17                                                   ` Marco Elver
2020-11-20 18:17                                                     ` Marco Elver
2020-11-20 18:57                                                     ` Steven Rostedt
2020-11-20 18:57                                                       ` Steven Rostedt
2020-11-20 19:16                                                     ` Steven Rostedt
2020-11-20 19:16                                                       ` Steven Rostedt
2020-11-20 19:22                                                       ` Marco Elver
2020-11-20 19:22                                                         ` Marco Elver
2020-11-20 19:27                                     ` [PATCH] kfence: Avoid stalling work queue task without allocations Steven Rostedt
2020-11-23 15:27                                       ` Marco Elver
2020-11-23 16:28                                         ` Steven Rostedt
2020-11-23 16:36                                           ` Steven Rostedt
2020-11-23 18:53                                             ` Marco Elver
2020-11-23 18:42                                           ` Steven Rostedt
2020-11-24  2:59                                             ` Boqun Feng
2020-11-24  3:44                                               ` Paul E. McKenney
2020-11-11 18:21       ` Paul E. McKenney
2020-11-11 15:01     ` Anders Roxell
2020-11-11 15:22       ` Marco Elver

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201118225621.GA1770130@elver.google.com \
    --to=elver@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=anders.roxell@linaro.org \
    --cc=dvyukov@google.com \
    --cc=glider@google.com \
    --cc=jannh@google.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kasan-dev@googlegroups.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mark.rutland@arm.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.