* linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
[not found] ` <20201119184854.GY1437@paulmck-ThinkPad-P72>
@ 2020-11-19 19:38 ` Marco Elver
2020-11-19 21:35 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Marco Elver @ 2020-11-19 19:38 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Mark Rutland, Anders Roxell, Jann Horn, Peter Zijlstra,
Lai Jiangshan, Linux Kernel Mailing List, Steven Rostedt, rcu,
Linux-MM, Alexander Potapenko, kasan-dev, Tejun Heo,
Andrew Morton, Dmitry Vyukov, linux-arm-kernel
[-- Attachment #1: Type: text/plain, Size: 14251 bytes --]
On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 07:14AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 01:53:57PM +0100, Marco Elver wrote:
> > > > On Wed, Nov 18, 2020 at 03:38PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Nov 18, 2020 at 11:56:21PM +0100, Marco Elver wrote:
> > > > > > [...]
> > > > > > I think I figured out one piece of the puzzle. Bisection keeps pointing
> > > > > > me at some -rcu merge commit, which kept throwing me off. Nor did it
> > > > > > help that reproduction is a bit flaky. However, I think there are 2
> > > > > > independent problems, but the manifestation of 1 problem triggers the
> > > > > > 2nd problem:
> > > > > >
> > > > > > 1. problem: slowed forward progress (workqueue lockup / RCU stall reports)
> > > > > >
> > > > > > 2. problem: DEADLOCK which causes complete system lockup
> > > > > >
> > > > > > | ...
> > > > > > | CPU0
> > > > > > | ----
> > > > > > | lock(rcu_node_0);
> > > > > > | <Interrupt>
> > > > > > | lock(rcu_node_0);
> > > > > > |
> > > > > > | *** DEADLOCK ***
> > > > > > |
> > > > > > | 1 lock held by event_benchmark/105:
> > > > > > | #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:493 [inline]
> > > > > > | #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:652 [inline]
> > > > > > | #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3752 [inline]
> > > > > > | #0: ffffbb6e0b804458 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x428/0xd40 kernel/rcu/tree.c:2581
> > > > > > | ...
> > > > > >
> > > > > > Problem 2 can with reasonable confidence (5 trials) be fixed by reverting:
> > > > > >
> > > > > > rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled
> > > > > >
> > > > > > At which point the system always boots to user space -- albeit with a
> > > > > > bunch of warnings still (attached). The supposed "good" version doesn't
> > > > > > end up with all those warnings deterministically, so I couldn't say if
> > > > > > the warnings are expected due to recent changes or not (Arm64 QEMU
> > > > > > emulation, 1 CPU, and lots of debugging tools on).
> > > > > >
> > > > > > Does any of that make sense?
> > > > >
> > > > > Marco, it makes all too much sense! :-/
> > > > >
> > > > > Does the patch below help?
> > > > >
> > > > > Thanx, Paul
> > > > >
> > > > > ------------------------------------------------------------------------
> > > > >
> > > > > commit 444ef3bbd0f243b912fdfd51f326704f8ee872bf
> > > > > Author: Peter Zijlstra <peterz@infradead.org>
> > > > > Date: Sat Aug 29 10:22:24 2020 -0700
> > > > >
> > > > > sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
> > > >
> > > > My assumption is that this is a replacement for "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled", right?
> > >
> > > Hmmm... It was actually intended to be in addition.
> > >
> > > > That seems to have the same result (same test setup) as only reverting
> > > > "rcu: Don't invoke..." does: still results in a bunch of workqueue
> > > > lockup warnings and RCU stall warnings, but boots to user space. I
> > > > attached a log. If the warnings are expected (are they?), then it looks
> > > > fine to me.
> > >
> > > No, they are not at all expected, but might be a different symptom
> > > of the original problem. Please see below.
> > >
> > > > (And just in case: with "rcu: Don't invoke..." and "sched/core:
> > > > Allow..." both applied I still get DEADLOCKs -- but that's probably
> > > > expected.)
> > >
> > > As noted earlier, it is a surprise. Could you please send me the
> > > console output?
> >
> > I've attached the output of a run with both commits applied.
>
> Got it, thank you!
>
> > > > Testing all events: OK
> > > > hrtimer: interrupt took 17120368 ns
> > > > Running tests again, along with the function tracer
> > > > Running tests on all trace events:
> > > > Testing all events:
> > > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!
> > > > Showing busy workqueues and worker pools:
> > > > workqueue events: flags=0x0
> > > > pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > > pending: vmstat_shepherd
> > > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 17s!
> > > > Showing busy workqueues and worker pools:
> > > > workqueue events: flags=0x0
> > > > pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
> > > > pending: vmstat_shepherd
> > > > workqueue events_power_efficient: flags=0x82
> > > > pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
> > > > pending: neigh_periodic_work
> > > > ------------[ cut here ]------------
> > > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > > WARNING: CPU: 0 PID: 1 at kernel/rcu/tree_stall.h:758 rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > >
> > > I have different line numbering,
> >
> > This is still using next-20201110. I'll rerun with latest -next as well.
>
> No problem, as it looks like next-20201105 is a reasonable approximation.
>
> > > but the only warning that I see in this
> > > function is the one complaining that RCU has been ignoring a request to
> > > start a grace period for too long. This usually happens because the RCU
> > > grace-period kthread (named "rcu_preempt" in your case, but can also be
> > > named "rcu_sched") is being prevented from running, but can be caused
> > > by other things as well.
> > >
> > > > Modules linked in:
> > > > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.10.0-rc3-next-20201110-00003-g920304642405-dirty #30
> > > > Hardware name: linux,dummy-virt (DT)
> > > > pstate: 20000085 (nzCv daIf -PAN -UAO -TCO BTYPE=--)
> > > > pc : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > > pc : rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > > lr : __xchg_mb arch/arm64/include/asm/cmpxchg.h:88 [inline]
> > > > lr : atomic_xchg include/asm-generic/atomic-instrumented.h:615 [inline]
> > > > lr : rcu_check_gp_start_stall kernel/rcu/tree_stall.h:751 [inline]
> > > > lr : rcu_check_gp_start_stall.isra.0+0x148/0x210 kernel/rcu/tree_stall.h:711
> > >
> > > Two program counters and four link registers? Awesome! ;-)
> >
> > Ah I'm using syzkaller's symbolizer, which duplicates lines if there was
> > an inline function (remove all the "[inline]" and it should make sense,
> > but the "[inline]" tell you the actual line). Obviously for things like
> > this it's a bit unintuitive. :-)
>
> Very useful, though, and a big THANK YOU to those who made it happen!
>
> > > > sp : ffff800010003d20
> > > > x29: ffff800010003d20 x28: ffff274ac3a10000
> > > > x27: 0000000000000000 x26: ffff274b3dbe72d8
> > > > x25: ffffbcb867722000 x24: 0000000000000000
> > > > x23: 0000000000000000 x22: ffffbcb8681d1260
> > > > x21: ffffbcb86735b000 x20: ffffbcb867404440
> > > > x19: ffffbcb867404440 x18: 0000000000000123
> > > > x17: ffffbcb865d400f0 x16: 0000000000000002
> > > > x15: 0000000000000002 x14: 0000000000000000
> > > > x13: 003d090000000000 x12: 00001e8480000000
> > > > x11: ffffbcb867958980 x10: ffff800010003cf0
> > > > x9 : ffffbcb864f4b7c8 x8 : 0000000000000080
> > > > x7 : 0000000000000026 x6 : ffffbcb86774e4c0
> > > > x5 : 0000000000000000 x4 : 00000000d4001f4b
> > > > x3 : 0000000000000000 x2 : 0000000000000000
> > > > x1 : 0000000000000001 x0 : 0000000000000000
> > > > Call trace:
> > > > rcu_check_gp_start_stall kernel/rcu/tree_stall.h:750 [inline]
> > > > rcu_check_gp_start_stall.isra.0+0x14c/0x210 kernel/rcu/tree_stall.h:711
> > > > rcu_core+0x168/0x9e0 kernel/rcu/tree.c:2719
> > > > rcu_core_si+0x18/0x28 kernel/rcu/tree.c:2737
> > >
> > > The RCU_SOFTIRQ handler is causing this checking to occur, for whatever
> > > that is worth.
> > >
> > > > __do_softirq+0x188/0x6b4 kernel/softirq.c:298
> > > > do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > > > invoke_softirq kernel/softirq.c:393 [inline]
> > > > __irq_exit_rcu kernel/softirq.c:423 [inline]
> > > > irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > > > __handle_domain_irq+0xb4/0x130 kernel/irq/irqdesc.c:690
> > > > handle_domain_irq include/linux/irqdesc.h:170 [inline]
> > > > gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
> > > > el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
> > > > arch_local_irq_restore+0x8/0x10 arch/arm64/include/asm/irqflags.h:124
> > > > release_probes kernel/tracepoint.c:113 [inline]
> > > > tracepoint_remove_func kernel/tracepoint.c:315 [inline]
> > > > tracepoint_probe_unregister+0x220/0x378 kernel/tracepoint.c:382
> > > > trace_event_reg+0x58/0x150 kernel/trace/trace_events.c:298
> > > > __ftrace_event_enable_disable+0x424/0x608 kernel/trace/trace_events.c:412
> > > > ftrace_event_enable_disable kernel/trace/trace_events.c:495 [inline]
> > > > __ftrace_set_clr_event_nolock+0x120/0x180 kernel/trace/trace_events.c:811
> > > > __ftrace_set_clr_event+0x60/0x90 kernel/trace/trace_events.c:833
> > > > event_trace_self_tests+0xd4/0x114 kernel/trace/trace_events.c:3661
> > > > event_trace_self_test_with_function kernel/trace/trace_events.c:3734 [inline]
> > > > event_trace_self_tests_init+0x88/0xa8 kernel/trace/trace_events.c:3747
> > > > do_one_initcall+0xa4/0x500 init/main.c:1212
> > > > do_initcall_level init/main.c:1285 [inline]
> > > > do_initcalls init/main.c:1301 [inline]
> > > > do_basic_setup init/main.c:1321 [inline]
> > > > kernel_init_freeable+0x344/0x3c4 init/main.c:1521
> > > > kernel_init+0x20/0x16c init/main.c:1410
> > > > ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:961
> > > > irq event stamp: 3274113
> > > > hardirqs last enabled at (3274112): [<ffffbcb864f8aee4>] rcu_core+0x974/0x9e0 kernel/rcu/tree.c:2716
> > > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:108 [inline]
> > > > hardirqs last disabled at (3274113): [<ffffbcb866233bf0>] _raw_spin_lock_irqsave+0xb8/0x14c kernel/locking/spinlock.c:159
> > > > softirqs last enabled at (3272576): [<ffffbcb864e10b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] invoke_softirq kernel/softirq.c:393 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] __irq_exit_rcu kernel/softirq.c:423 [inline]
> > > > softirqs last disabled at (3274101): [<ffffbcb864ec6c54>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
> > > > ---[ end trace 902768efebf5a607 ]---
> > > > rcu: rcu_preempt: wait state: RCU_GP_WAIT_GPS(1) ->state: 0x0 delta ->gp_activity 4452 ->gp_req_activity 3848 ->gp_wake_time 3848 ->gp_wake_seq 2696 ->gp_seq 2696 ->gp_seq_needed 2700 ->gp_flags 0x1
> > >
> > > The last thing that RCU's grace-period kthread did was to go to sleep
> > > waiting for a grace-period request (RCU_GP_WAIT_GPS).
> > >
> > > > rcu: rcu_node 0:0 ->gp_seq 2696 ->gp_seq_needed 2700
> > > > rcu: RCU callbacks invoked since boot: 2583
> > > > rcu_tasks: RTGS_WAIT_CBS(11) since 567120 g:1 i:0/0 k.
> > > > rcu_tasks_rude: RTGS_WAIT_CBS(11) since 567155 g:1 i:0/1 k.
> > > > rcu_tasks_trace: RTGS_INIT(0) since 4295464549 g:0 i:0/0 k. N0 h:0/0/0
> > > > rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> > > > (detected by 0, t=3752 jiffies, g=2705, q=8)
> > > > rcu: All QSes seen, last rcu_preempt kthread activity 557 (4295471128-4295470571), jiffies_till_next_fqs=1, root ->qsmask 0x0
> > > > rcu: rcu_preempt kthread starved for 557 jiffies! g2705 f0x2 RCU_GP_CLEANUP(7) ->state=0x0 ->cpu=0
> > >
> > > And here we see that RCU's grace-period kthread has in fact been starved.
> > >
> > > This kthread is now in RCU_GP_CLEANUP, perhaps because of the wakeup that is
> > > sent in rcu_check_gp_kthread_starvation().
> > >
> > > My current guess is that this is a consequence of the earlier failures,
> > > but who knows?
> >
> > I can try bisection again, or reverting some commits that might be
> > suspicious? But we'd need some selection of suspicious commits.
>
> The report claims that one of the rcu_node ->lock fields is held
> with interrupts enabled, which would indeed be bad. Except that all
> of the stack traces that it shows have these locks held within the
> scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> try_invoke_on_locked_down_task() with irqs disabled" but without the
> "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> commit, I understand why. With both, I don't see how this happens.
I'm at a loss, but happy to keep bisecting and trying patches. I'm also
considering:
Is it the compiler? Probably not, I tried 2 versions of GCC.
Can we trust lockdep to precisely know IRQ state? I know there's
been some recent work around this, but hopefully we're not
affected here?
Is QEMU buggy?
> At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> calls at various points in that code, as shown in the patch below.
>
> At this point, I would guess that your first priority would be the
> initial bug rather than this following issue, but you never know, this
> might well help diagnose the initial bug.
I don't mind either way. I'm worried deadlocking the whole system might
be worse.
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit ccedf00693ef60f7c06d23490fc41bb60dd43dc3
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Thu Nov 19 10:13:06 2020 -0800
>
> rcu: Add lockdep_assert_irqs_disabled() to rcu_sched_clock_irq() and callees
Applied that. See new log attached -- I don't see any of them have
fired.
I also upgraded to next-20201119 -- nothing has changed though, and the
warnings and deadlocks still appear.
Thanks,
-- Marco
[-- Attachment #2: log --]
[-- Type: text/plain, Size: 13429 bytes --]
Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events:
hrtimer: interrupt took 11213232 ns
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 11s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 13s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 11s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
pending: neigh_periodic_work
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 21s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
in-flight: 15:vmstat_shepherd
workqueue events_power_efficient: flags=0x82
pwq 2: cpus=0 flags=0x4 nice=0 active=1/256 refcnt=3
pending: neigh_periodic_work
pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=8s workers=2 idle: 5
BUG: workqueue lockup - pool cpus=0 flags=0x4 nice=0 stuck for 34s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
pending: vmstat_shepherd
workqueue events_power_efficient: flags=0x82
pwq 2: cpus=0 flags=0x4 nice=0 active=2/256 refcnt=4
pending: neigh_periodic_work, do_cache_clean
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
(detected by 0, t=3752 jiffies, g=2365, q=2)
rcu: All QSes seen, last rcu_preempt kthread activity 2599 (4295010564-4295007965), jiffies_till_next_fqs=1, root ->qsmask 0x0
rcu: rcu_preempt kthread starved for 2599 jiffies! g2365 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt state:R running task stack: 0 pid: 10 ppid: 2 flags:0x00000428
Call trace:
__switch_to+0x148/0x1f0 arch/arm64/kernel/process.c:577
context_switch kernel/sched/core.c:4268 [inline]
__schedule+0x2dc/0x9a8 kernel/sched/core.c:5018
preempt_schedule_notrace+0x70/0x1c0 kernel/sched/core.c:5251
__ftrace_ops_list_func kernel/trace/ftrace.c:6955 [inline]
ftrace_ops_list_func+0x10c/0x218 kernel/trace/ftrace.c:6976
ftrace_graph_call+0x0/0x4
preempt_count_add+0x8/0x1a0 arch/arm64/include/asm/atomic.h:65
schedule+0x44/0x100 kernel/sched/core.c:5096
schedule_timeout+0x240/0x538 kernel/time/timer.c:1871
rcu_gp_fqs_loop kernel/rcu/tree.c:1946 [inline]
rcu_gp_kthread+0x618/0x1bd8 kernel/rcu/tree.c:2119
kthread+0x13c/0x188 kernel/kthread.c:292
ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
rcu: Stack dump where RCU GP kthread last ran:
Task dump for CPU 0:
task:khungtaskd state:R running task stack: 0 pid: 23 ppid: 2 flags:0x0000042a
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
sched_show_task kernel/sched/core.c:6947 [inline]
sched_show_task+0x208/0x230 kernel/sched/core.c:6921
dump_cpu_task+0x4c/0x5c kernel/sched/core.c:8985
rcu_check_gp_kthread_starvation+0x240/0x388 kernel/rcu/tree_stall.h:482
print_other_cpu_stall kernel/rcu/tree_stall.h:556 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
rcu_pending kernel/rcu/tree.c:3764 [inline]
rcu_sched_clock_irq+0x131c/0x1408 kernel/rcu/tree.c:2588
update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
__run_hrtimer kernel/time/hrtimer.c:1519 [inline]
__hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
__handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
handle_domain_irq include/linux/irqdesc.h:170 [inline]
gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
_raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
finish_lock_switch kernel/sched/core.c:4046 [inline]
finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
context_switch kernel/sched/core.c:4271 [inline]
__schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
kthread+0x13c/0x188 kernel/kthread.c:292
ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
================================
WARNING: inconsistent lock state
5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4 Not tainted
--------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
khungtaskd/23 [HC0[0]:SC0[0]:HE0:SE1] takes:
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3764 [inline]
ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588
{IN-HARDIRQ-W} state was registered at:
mark_lock kernel/locking/lockdep.c:4373 [inline]
mark_usage kernel/locking/lockdep.c:4301 [inline]
__lock_acquire+0xae8/0x1ac8 kernel/locking/lockdep.c:4784
lock_acquire kernel/locking/lockdep.c:5435 [inline]
lock_acquire+0x268/0x508 kernel/locking/lockdep.c:5400
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x78/0x14c kernel/locking/spinlock.c:159
print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
rcu_pending kernel/rcu/tree.c:3764 [inline]
rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588
update_process_times+0x6c/0xb8 kernel/time/timer.c:1709
tick_sched_handle.isra.0+0x58/0x88 kernel/time/tick-sched.c:176
tick_sched_timer+0x68/0xe0 kernel/time/tick-sched.c:1328
__run_hrtimer kernel/time/hrtimer.c:1519 [inline]
__hrtimer_run_queues+0x288/0x730 kernel/time/hrtimer.c:1583
hrtimer_interrupt+0x114/0x288 kernel/time/hrtimer.c:1645
timer_handler drivers/clocksource/arm_arch_timer.c:647 [inline]
arch_timer_handler_virt+0x50/0x70 drivers/clocksource/arm_arch_timer.c:658
handle_percpu_devid_irq+0x104/0x4c0 kernel/irq/chip.c:930
generic_handle_irq_desc include/linux/irqdesc.h:152 [inline]
generic_handle_irq+0x54/0x78 kernel/irq/irqdesc.c:650
__handle_domain_irq+0xac/0x130 kernel/irq/irqdesc.c:687
handle_domain_irq include/linux/irqdesc.h:170 [inline]
gic_handle_irq+0x70/0x108 drivers/irqchip/irq-gic.c:370
el1_irq+0xc0/0x180 arch/arm64/kernel/entry.S:651
arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
_raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
finish_lock_switch kernel/sched/core.c:4046 [inline]
finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
context_switch kernel/sched/core.c:4271 [inline]
__schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
kthread+0x13c/0x188 kernel/kthread.c:292
ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
irq event stamp: 15198
hardirqs last enabled at (15197): [<ffffd1f05041241c>] el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
hardirqs last disabled at (15198): [<ffffd1f0504123c0>] el1_irq+0x80/0x180 arch/arm64/kernel/entry.S:648
softirqs last enabled at (14846): [<ffffd1f050410b80>] __do_softirq+0x630/0x6b4 kernel/softirq.c:325
softirqs last disabled at (14843): [<ffffd1f0504c623c>] do_softirq_own_stack include/linux/interrupt.h:568 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] invoke_softirq kernel/softirq.c:393 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] __irq_exit_rcu kernel/softirq.c:423 [inline]
softirqs last disabled at (14843): [<ffffd1f0504c623c>] irq_exit+0x1cc/0x1e0 kernel/softirq.c:447
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(rcu_node_0);
<Interrupt>
lock(rcu_node_0);
*** DEADLOCK ***
2 locks held by khungtaskd/23:
#0: ffffd1f0529f1420 (rcu_read_lock){....}-{1:2}, at: check_hung_uninterruptible_tasks kernel/hung_task.c:226 [inline]
#0: ffffd1f0529f1420 (rcu_read_lock){....}-{1:2}, at: watchdog+0x15c/0xbe8 kernel/hung_task.c:338
#1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: print_other_cpu_stall kernel/rcu/tree_stall.h:516 [inline]
#1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: check_cpu_stall kernel/rcu/tree_stall.h:679 [inline]
#1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_pending kernel/rcu/tree.c:3764 [inline]
#1: ffffd1f0529f47d8 (rcu_node_0){?.-.}-{2:2}, at: rcu_sched_clock_irq+0x7c8/0x1408 kernel/rcu/tree.c:2588
stack backtrace:
CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x140/0x1bc lib/dump_stack.c:120
print_usage_bug kernel/locking/lockdep.c:3738 [inline]
print_usage_bug+0x2a0/0x2f0 kernel/locking/lockdep.c:3705
valid_state kernel/locking/lockdep.c:3749 [inline]
mark_lock_irq kernel/locking/lockdep.c:3952 [inline]
mark_lock.part.0+0x438/0x4e8 kernel/locking/lockdep.c:4409
mark_lock kernel/locking/lockdep.c:4007 [inline]
mark_held_locks+0x54/0x90 kernel/locking/lockdep.c:4010
__trace_hardirqs_on_caller kernel/locking/lockdep.c:4028 [inline]
lockdep_hardirqs_on_prepare+0xe0/0x290 kernel/locking/lockdep.c:4096
trace_hardirqs_on+0x90/0x370 kernel/trace/trace_preemptirq.c:49
el1_irq+0xdc/0x180 arch/arm64/kernel/entry.S:685
arch_local_irq_enable arch/arm64/include/asm/irqflags.h:37 [inline]
__raw_spin_unlock_irq include/linux/spinlock_api_smp.h:168 [inline]
_raw_spin_unlock_irq+0x50/0x98 kernel/locking/spinlock.c:199
finish_lock_switch kernel/sched/core.c:4046 [inline]
finish_task_switch+0xb4/0x398 kernel/sched/core.c:4146
context_switch kernel/sched/core.c:4271 [inline]
__schedule+0x2e0/0x9a8 kernel/sched/core.c:5018
preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
kthread+0x13c/0x188 kernel/kthread.c:292
ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
BUG: scheduling while atomic: khungtaskd/23/0x00000002
INFO: lockdep is turned off.
Modules linked in:
Preemption disabled at:
[<ffffd1f05181ce7c>] preempt_schedule_irq+0x3c/0xa0 kernel/sched/core.c:5278
CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00003-gb88cbc8d31ee #4
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x140/0x1bc lib/dump_stack.c:120
__schedule_bug+0xcc/0xe0 kernel/sched/core.c:4757
schedule_debug kernel/sched/core.c:4784 [inline]
__schedule+0x888/0x9a8 kernel/sched/core.c:4912
preempt_schedule_irq+0x4c/0xa0 kernel/sched/core.c:5280
arm64_preempt_schedule_irq+0xd0/0x118 arch/arm64/kernel/process.c:725
el1_irq+0xd8/0x180 arch/arm64/kernel/entry.S:664
check_hung_uninterruptible_tasks kernel/hung_task.c:231 [inline]
watchdog+0x5d8/0xbe8 kernel/hung_task.c:338
kthread+0x13c/0x188 kernel/kthread.c:292
ret_from_fork+0x10/0x34 arch/arm64/kernel/entry.S:963
[-- Attachment #3: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-19 19:38 ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
@ 2020-11-19 21:35 ` Paul E. McKenney
2020-11-19 22:53 ` Will Deacon
2020-11-20 14:19 ` Marco Elver
0 siblings, 2 replies; 24+ messages in thread
From: Paul E. McKenney @ 2020-11-19 21:35 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Jann Horn, Peter Zijlstra,
Lai Jiangshan, Linux Kernel Mailing List, Steven Rostedt, rcu,
Linux-MM, Alexander Potapenko, kasan-dev, Tejun Heo,
Andrew Morton, Dmitry Vyukov, linux-arm-kernel
On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
[ . . . ]
> > > I can try bisection again, or reverting some commits that might be
> > > suspicious? But we'd need some selection of suspicious commits.
> >
> > The report claims that one of the rcu_node ->lock fields is held
> > with interrupts enabled, which would indeed be bad. Except that all
> > of the stack traces that it shows have these locks held within the
> > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > commit, I understand why. With both, I don't see how this happens.
>
> I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> considering:
>
> Is it the compiler? Probably not, I tried 2 versions of GCC.
>
> Can we trust lockdep to precisely know IRQ state? I know there's
> been some recent work around this, but hopefully we're not
> affected here?
>
> Is QEMU buggy?
>
> > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > calls at various points in that code, as shown in the patch below.
> >
> > At this point, I would guess that your first priority would be the
> > initial bug rather than this following issue, but you never know, this
> > might well help diagnose the initial bug.
>
> I don't mind either way. I'm worried deadlocking the whole system might
> be worse.
Here is another set of lockdep_assert_irqs_disabled() calls on the
off-chance that they actually find something.
Thanx, Paul
------------------------------------------------------------------------
commit bcca5277df3f24db15e15ccc8b05ecf346d05169
Author: Paul E. McKenney <paulmck@kernel.org>
Date: Thu Nov 19 13:30:33 2020 -0800
rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros
This commit adds a lockdep_assert_irqs_disabled() call to the
helper macros that release the rcu_node structure's ->lock, namely
to raw_spin_unlock_rcu_node(), raw_spin_unlock_irq_rcu_node() and
raw_spin_unlock_irqrestore_rcu_node(). The point of this is to help track
down a situation where lockdep appears to be insisting that interrupts
are enabled while holding an rcu_node structure's ->lock.
Link: https://lore.kernel.org/lkml/20201111133813.GA81547@elver.google.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 59ef1ae..bf0827d 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -378,7 +378,11 @@ do { \
smp_mb__after_unlock_lock(); \
} while (0)
-#define raw_spin_unlock_rcu_node(p) raw_spin_unlock(&ACCESS_PRIVATE(p, lock))
+#define raw_spin_unlock_rcu_node(p) \
+do { \
+ lockdep_assert_irqs_disabled(); \
+ raw_spin_unlock(&ACCESS_PRIVATE(p, lock)); \
+} while (0)
#define raw_spin_lock_irq_rcu_node(p) \
do { \
@@ -387,7 +391,10 @@ do { \
} while (0)
#define raw_spin_unlock_irq_rcu_node(p) \
- raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock))
+do { \
+ lockdep_assert_irqs_disabled(); \
+ raw_spin_unlock_irq(&ACCESS_PRIVATE(p, lock)); \
+} while (0)
#define raw_spin_lock_irqsave_rcu_node(p, flags) \
do { \
@@ -396,7 +403,10 @@ do { \
} while (0)
#define raw_spin_unlock_irqrestore_rcu_node(p, flags) \
- raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags)
+do { \
+ lockdep_assert_irqs_disabled(); \
+ raw_spin_unlock_irqrestore(&ACCESS_PRIVATE(p, lock), flags); \
+} while (0)
#define raw_spin_trylock_rcu_node(p) \
({ \
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-19 21:35 ` Paul E. McKenney
@ 2020-11-19 22:53 ` Will Deacon
2020-11-20 10:30 ` Mark Rutland
2020-11-20 14:19 ` Marco Elver
1 sibling, 1 reply; 24+ messages in thread
From: Will Deacon @ 2020-11-19 22:53 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Mark Rutland, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Dmitry Vyukov, linux-arm-kernel
On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
>
> [ . . . ]
>
> > > > I can try bisection again, or reverting some commits that might be
> > > > suspicious? But we'd need some selection of suspicious commits.
> > >
> > > The report claims that one of the rcu_node ->lock fields is held
> > > with interrupts enabled, which would indeed be bad. Except that all
> > > of the stack traces that it shows have these locks held within the
> > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > commit, I understand why. With both, I don't see how this happens.
> >
> > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > considering:
> >
> > Is it the compiler? Probably not, I tried 2 versions of GCC.
> >
> > Can we trust lockdep to precisely know IRQ state? I know there's
> > been some recent work around this, but hopefully we're not
> > affected here?
> >
> > Is QEMU buggy?
> >
> > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > calls at various points in that code, as shown in the patch below.
> > >
> > > At this point, I would guess that your first priority would be the
> > > initial bug rather than this following issue, but you never know, this
> > > might well help diagnose the initial bug.
> >
> > I don't mind either way. I'm worried deadlocking the whole system might
> > be worse.
>
> Here is another set of lockdep_assert_irqs_disabled() calls on the
> off-chance that they actually find something.
FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
looking at that and I think he is close to having something workable.
Mark -- is there anything Marco and Paul can try out?
Will
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-19 22:53 ` Will Deacon
@ 2020-11-20 10:30 ` Mark Rutland
2020-11-20 14:03 ` Marco Elver
0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-20 10:30 UTC (permalink / raw)
To: Will Deacon
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Paul E. McKenney,
Jann Horn, Peter Zijlstra, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Dmitry Vyukov
On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> >
> > [ . . . ]
> >
> > > > > I can try bisection again, or reverting some commits that might be
> > > > > suspicious? But we'd need some selection of suspicious commits.
> > > >
> > > > The report claims that one of the rcu_node ->lock fields is held
> > > > with interrupts enabled, which would indeed be bad. Except that all
> > > > of the stack traces that it shows have these locks held within the
> > > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > commit, I understand why. With both, I don't see how this happens.
> > >
> > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > considering:
> > >
> > > Is it the compiler? Probably not, I tried 2 versions of GCC.
> > >
> > > Can we trust lockdep to precisely know IRQ state? I know there's
> > > been some recent work around this, but hopefully we're not
> > > affected here?
> > >
> > > Is QEMU buggy?
> > >
> > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > calls at various points in that code, as shown in the patch below.
> > > >
> > > > At this point, I would guess that your first priority would be the
> > > > initial bug rather than this following issue, but you never know, this
> > > > might well help diagnose the initial bug.
> > >
> > > I don't mind either way. I'm worried deadlocking the whole system might
> > > be worse.
> >
> > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > off-chance that they actually find something.
>
> FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> looking at that and I think he is close to having something workable.
>
> Mark -- is there anything Marco and Paul can try out?
I initially traced some issues back to commit:
044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
... and that change of semantic could cause us to miss edges in some
cases, but IIUC mostly where we haven't done the right thing in
exception entry/return.
I don't think my patches address this case yet, but my WIP (currently
just fixing user<->kernel transitions) is at:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
I'm looking into the kernel<->kernel transitions now, and I know that we
mess up RCU management for a small window around arch_cpu_idle, but it's
not immediately clear to me if either of those cases could cause this
report.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 10:30 ` Mark Rutland
@ 2020-11-20 14:03 ` Marco Elver
2020-11-23 19:32 ` Mark Rutland
0 siblings, 1 reply; 24+ messages in thread
From: Marco Elver @ 2020-11-20 14:03 UTC (permalink / raw)
To: Mark Rutland
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Will Deacon, Dmitry Vyukov
On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > On Thu, Nov 19, 2020 at 01:35:12PM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> > >
> > > [ . . . ]
> > >
> > > > > > I can try bisection again, or reverting some commits that might be
> > > > > > suspicious? But we'd need some selection of suspicious commits.
> > > > >
> > > > > The report claims that one of the rcu_node ->lock fields is held
> > > > > with interrupts enabled, which would indeed be bad. Except that all
> > > > > of the stack traces that it shows have these locks held within the
> > > > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > > commit, I understand why. With both, I don't see how this happens.
> > > >
> > > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > > considering:
> > > >
> > > > Is it the compiler? Probably not, I tried 2 versions of GCC.
> > > >
> > > > Can we trust lockdep to precisely know IRQ state? I know there's
> > > > been some recent work around this, but hopefully we're not
> > > > affected here?
> > > >
> > > > Is QEMU buggy?
> > > >
> > > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > > calls at various points in that code, as shown in the patch below.
> > > > >
> > > > > At this point, I would guess that your first priority would be the
> > > > > initial bug rather than this following issue, but you never know, this
> > > > > might well help diagnose the initial bug.
> > > >
> > > > I don't mind either way. I'm worried deadlocking the whole system might
> > > > be worse.
> > >
> > > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > > off-chance that they actually find something.
> >
> > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > looking at that and I think he is close to having something workable.
> >
> > Mark -- is there anything Marco and Paul can try out?
>
> I initially traced some issues back to commit:
>
> 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
>
> ... and that change of semantic could cause us to miss edges in some
> cases, but IIUC mostly where we haven't done the right thing in
> exception entry/return.
>
> I don't think my patches address this case yet, but my WIP (currently
> just fixing user<->kernel transitions) is at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
>
> I'm looking into the kernel<->kernel transitions now, and I know that we
> mess up RCU management for a small window around arch_cpu_idle, but it's
> not immediately clear to me if either of those cases could cause this
> report.
Thank you -- I tried your irq-fixes, however that didn't seem to fix the
problem (still get warnings and then a panic). :-/
| [ 118.375217] Testing all events: OK
| [ 174.878839] Running tests again, along with the function tracer
| [ 174.894781] Running tests on all trace events:
| [ 174.906734] Testing all events:
| [ 176.204533] hrtimer: interrupt took 9035008 ns
| [ 286.788330] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 10s!
| [ 286.883216] Showing busy workqueues and worker pools:
| [ 286.899647] workqueue events: flags=0x0
| [ 286.920606] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
| [ 286.933010] pending: vmstat_shepherd
| [ 644.323445] INFO: task kworker/u2:1:107 blocked for more than 12 seconds.
| [ 649.448126] Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #17
| [ 656.619598] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
| [ 660.623500] task:kworker/u2:1 state:R stack: 0 pid: 107 ppid: 2 flags:0x00000428
| [ 671.587980] Call trace:
| [ 674.885884] __switch_to+0x148/0x1f0
| [ 675.267490] __schedule+0x2dc/0x9a8
| [ 677.748050] schedule+0x4c/0x100
| [ 679.223880] worker_thread+0xe8/0x510
| [ 680.663844] kthread+0x13c/0x188
| [ 681.663992] ret_from_fork+0x10/0x34
| [ 684.493389]
| [ 684.493389] Showing all locks held in the system:
| [ 688.554449] 4 locks held by swapper/0/1:
| [ 691.747079] 1 lock held by khungtaskd/23:
| [ 692.525727] #0: ffffa1ebd7ff1420 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x34/0x198
| [ 704.403177]
| [ 704.630928] =============================================
| [ 704.630928]
| [ 706.168072] Kernel panic - not syncing: hung_task: blocked tasks
| [ 706.172894] CPU: 0 PID: 23 Comm: khungtaskd Not tainted 5.10.0-rc4-next-20201119-00004-g77838ee21ff6-dirty #17
| [ 706.178507] Hardware name: linux,dummy-virt (DT)
| [ 706.182658] Call trace:
| [ 706.186231] dump_backtrace+0x0/0x240
| [ 706.190124] show_stack+0x34/0x88
| [ 706.193917] dump_stack+0x140/0x1bc
| [ 706.197728] panic+0x1e4/0x494
| [ 706.201440] watchdog+0x668/0xbe8
| [ 706.205238] kthread+0x13c/0x188
| [ 706.208991] ret_from_fork+0x10/0x34
| [ 706.214532] Kernel Offset: 0x21ebc5a00000 from 0xffff800010000000
| [ 706.219014] PHYS_OFFSET: 0xffffad8a80000000
| [ 706.223148] CPU features: 0x0240022,61806082
| [ 706.227149] Memory Limit: none
| [ 706.233359] ---[ end Kernel panic - not syncing: hung_task: blocked tasks ]---
Thanks,
-- Marco
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-19 21:35 ` Paul E. McKenney
2020-11-19 22:53 ` Will Deacon
@ 2020-11-20 14:19 ` Marco Elver
2020-11-20 14:39 ` Paul E. McKenney
2020-11-20 15:26 ` Steven Rostedt
1 sibling, 2 replies; 24+ messages in thread
From: Marco Elver @ 2020-11-20 14:19 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Mark Rutland, Anders Roxell, Jann Horn, Peter Zijlstra,
Lai Jiangshan, Linux Kernel Mailing List, Steven Rostedt, rcu,
Linux-MM, Alexander Potapenko, kasan-dev, Tejun Heo,
Andrew Morton, Dmitry Vyukov, linux-arm-kernel
On Thu, Nov 19, 2020 at 01:35PM -0800, Paul E. McKenney wrote:
> On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
>
> [ . . . ]
>
> > > > I can try bisection again, or reverting some commits that might be
> > > > suspicious? But we'd need some selection of suspicious commits.
> > >
> > > The report claims that one of the rcu_node ->lock fields is held
> > > with interrupts enabled, which would indeed be bad. Except that all
> > > of the stack traces that it shows have these locks held within the
> > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > commit, I understand why. With both, I don't see how this happens.
> >
> > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > considering:
> >
> > Is it the compiler? Probably not, I tried 2 versions of GCC.
> >
> > Can we trust lockdep to precisely know IRQ state? I know there's
> > been some recent work around this, but hopefully we're not
> > affected here?
> >
> > Is QEMU buggy?
> >
> > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > calls at various points in that code, as shown in the patch below.
> > >
> > > At this point, I would guess that your first priority would be the
> > > initial bug rather than this following issue, but you never know, this
> > > might well help diagnose the initial bug.
> >
> > I don't mind either way. I'm worried deadlocking the whole system might
> > be worse.
>
> Here is another set of lockdep_assert_irqs_disabled() calls on the
> off-chance that they actually find something.
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit bcca5277df3f24db15e15ccc8b05ecf346d05169
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date: Thu Nov 19 13:30:33 2020 -0800
>
> rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros
None of those triggered either.
I found that disabling ftrace for some of kernel/rcu (see below) solved
the stalls (and any mention of deadlocks as a side-effect I assume),
resulting in successful boot.
Does that provide any additional clues? I tried to narrow it down to 1-2
files, but that doesn't seem to work.
Thanks,
-- Marco
------ >8 ------
diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
index 0cfb009a99b9..678b4b094f94 100644
--- a/kernel/rcu/Makefile
+++ b/kernel/rcu/Makefile
@@ -3,6 +3,13 @@
# and is generally not a function of system call inputs.
KCOV_INSTRUMENT := n
+ifdef CONFIG_FUNCTION_TRACER
+CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
+CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
+endif
+
ifeq ($(CONFIG_KCSAN),y)
KBUILD_CFLAGS += -g -fno-omit-frame-pointer
endif
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 14:19 ` Marco Elver
@ 2020-11-20 14:39 ` Paul E. McKenney
2020-11-20 15:22 ` Mark Rutland
2020-11-20 15:26 ` Steven Rostedt
1 sibling, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2020-11-20 14:39 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Jann Horn, Peter Zijlstra,
Lai Jiangshan, Linux Kernel Mailing List, Steven Rostedt, rcu,
Linux-MM, Alexander Potapenko, kasan-dev, Tejun Heo,
Andrew Morton, Dmitry Vyukov, linux-arm-kernel
On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> On Thu, Nov 19, 2020 at 01:35PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 19, 2020 at 08:38:19PM +0100, Marco Elver wrote:
> > > On Thu, Nov 19, 2020 at 10:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Nov 19, 2020 at 06:02:59PM +0100, Marco Elver wrote:
> >
> > [ . . . ]
> >
> > > > > I can try bisection again, or reverting some commits that might be
> > > > > suspicious? But we'd need some selection of suspicious commits.
> > > >
> > > > The report claims that one of the rcu_node ->lock fields is held
> > > > with interrupts enabled, which would indeed be bad. Except that all
> > > > of the stack traces that it shows have these locks held within the
> > > > scheduling-clock interrupt handler. Now with the "rcu: Don't invoke
> > > > try_invoke_on_locked_down_task() with irqs disabled" but without the
> > > > "sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled"
> > > > commit, I understand why. With both, I don't see how this happens.
> > >
> > > I'm at a loss, but happy to keep bisecting and trying patches. I'm also
> > > considering:
> > >
> > > Is it the compiler? Probably not, I tried 2 versions of GCC.
> > >
> > > Can we trust lockdep to precisely know IRQ state? I know there's
> > > been some recent work around this, but hopefully we're not
> > > affected here?
> > >
> > > Is QEMU buggy?
> > >
> > > > At this point, I am reduced to adding lockdep_assert_irqs_disabled()
> > > > calls at various points in that code, as shown in the patch below.
> > > >
> > > > At this point, I would guess that your first priority would be the
> > > > initial bug rather than this following issue, but you never know, this
> > > > might well help diagnose the initial bug.
> > >
> > > I don't mind either way. I'm worried deadlocking the whole system might
> > > be worse.
> >
> > Here is another set of lockdep_assert_irqs_disabled() calls on the
> > off-chance that they actually find something.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit bcca5277df3f24db15e15ccc8b05ecf346d05169
> > Author: Paul E. McKenney <paulmck@kernel.org>
> > Date: Thu Nov 19 13:30:33 2020 -0800
> >
> > rcu: Add lockdep_assert_irqs_disabled() to raw_spin_unlock_rcu_node() macros
>
> None of those triggered either.
>
> I found that disabling ftrace for some of kernel/rcu (see below) solved
> the stalls (and any mention of deadlocks as a side-effect I assume),
> resulting in successful boot.
>
> Does that provide any additional clues? I tried to narrow it down to 1-2
> files, but that doesn't seem to work.
There were similar issues during the x86/entry work. Are the ARM guys
doing arm64/entry work now?
Thanx, Paul
> Thanks,
> -- Marco
>
> ------ >8 ------
>
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 0cfb009a99b9..678b4b094f94 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -3,6 +3,13 @@
> # and is generally not a function of system call inputs.
> KCOV_INSTRUMENT := n
>
> +ifdef CONFIG_FUNCTION_TRACER
> +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> +endif
> +
> ifeq ($(CONFIG_KCSAN),y)
> KBUILD_CFLAGS += -g -fno-omit-frame-pointer
> endif
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 14:39 ` Paul E. McKenney
@ 2020-11-20 15:22 ` Mark Rutland
2020-11-20 17:38 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-20 15:22 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > the stalls (and any mention of deadlocks as a side-effect I assume),
> > resulting in successful boot.
> >
> > Does that provide any additional clues? I tried to narrow it down to 1-2
> > files, but that doesn't seem to work.
>
> There were similar issues during the x86/entry work. Are the ARM guys
> doing arm64/entry work now?
I'm currently looking at it. I had been trying to shift things to C for
a while, and right now I'm trying to fix the lockdep state tracking,
which is requiring untangling lockdep/rcu/tracing.
The main issue I see remaining atm is that we don't save/restore the
lockdep state over exceptions taken from kernel to kernel. That could
result in lockdep thinking IRQs are disabled when they're actually
enabled (because code in the nested context might do a save/restore
while IRQs are disabled, then return to a context where IRQs are
enabled), but AFAICT shouldn't result in the inverse in most cases since
the non-NMI handlers all call lockdep_hardirqs_disabled().
I'm at a loss to explaim the rcu vs ftrace bits, so if you have any
pointers to the issuies ween with the x86 rework that'd be quite handy.
Thanks,
Mark.
>
> Thanx, Paul
>
> > Thanks,
> > -- Marco
> >
> > ------ >8 ------
> >
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 0cfb009a99b9..678b4b094f94 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -3,6 +3,13 @@
> > # and is generally not a function of system call inputs.
> > KCOV_INSTRUMENT := n
> >
> > +ifdef CONFIG_FUNCTION_TRACER
> > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > +endif
> > +
> > ifeq ($(CONFIG_KCSAN),y)
> > KBUILD_CFLAGS += -g -fno-omit-frame-pointer
> > endif
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 14:19 ` Marco Elver
2020-11-20 14:39 ` Paul E. McKenney
@ 2020-11-20 15:26 ` Steven Rostedt
2020-11-20 18:17 ` Marco Elver
1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2020-11-20 15:26 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
kasan-dev, rcu, Linux-MM, Alexander Potapenko, linux-arm-kernel,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, 20 Nov 2020 15:19:28 +0100
Marco Elver <elver@google.com> wrote:
> None of those triggered either.
>
> I found that disabling ftrace for some of kernel/rcu (see below) solved
> the stalls (and any mention of deadlocks as a side-effect I assume),
> resulting in successful boot.
>
> Does that provide any additional clues? I tried to narrow it down to 1-2
> files, but that doesn't seem to work.
>
> Thanks,
> -- Marco
>
> ------ >8 ------
>
> diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> index 0cfb009a99b9..678b4b094f94 100644
> --- a/kernel/rcu/Makefile
> +++ b/kernel/rcu/Makefile
> @@ -3,6 +3,13 @@
> # and is generally not a function of system call inputs.
> KCOV_INSTRUMENT := n
>
> +ifdef CONFIG_FUNCTION_TRACER
> +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> +endif
> +
Can you narrow it down further? That is, do you really need all of the
above to stop the stalls?
Also, since you are using linux-next, you have ftrace recursion debugging.
Please enable:
CONFIG_FTRACE_RECORD_RECURSION=y
CONFIG_RING_BUFFER_RECORD_RECURSION=y
when enabling any of the above. If you can get to a successful boot, you
can then:
# cat /sys/kernel/tracing/recursed_functions
Which would let me know if there's an recursion issue in RCU somewhere.
-- Steve
-- Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 15:22 ` Mark Rutland
@ 2020-11-20 17:38 ` Paul E. McKenney
2020-11-20 18:02 ` Mark Rutland
0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2020-11-20 17:38 UTC (permalink / raw)
To: Mark Rutland
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > resulting in successful boot.
> > >
> > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > files, but that doesn't seem to work.
> >
> > There were similar issues during the x86/entry work. Are the ARM guys
> > doing arm64/entry work now?
>
> I'm currently looking at it. I had been trying to shift things to C for
> a while, and right now I'm trying to fix the lockdep state tracking,
> which is requiring untangling lockdep/rcu/tracing.
>
> The main issue I see remaining atm is that we don't save/restore the
> lockdep state over exceptions taken from kernel to kernel. That could
> result in lockdep thinking IRQs are disabled when they're actually
> enabled (because code in the nested context might do a save/restore
> while IRQs are disabled, then return to a context where IRQs are
> enabled), but AFAICT shouldn't result in the inverse in most cases since
> the non-NMI handlers all call lockdep_hardirqs_disabled().
>
> I'm at a loss to explaim the rcu vs ftrace bits, so if you have any
> pointers to the issuies ween with the x86 rework that'd be quite handy.
There were several over a number of months. I especially recall issues
with the direct-from-idle execution of smp_call_function*() handlers,
and also with some of the special cases in the entry code, for example,
reentering the kernel from the kernel. This latter could cause RCU to
not be watching when it should have been or vice versa.
I would of course be most aware of the issues that impinged on RCU
and that were located by rcutorture. This is actually not hard to run,
especially if the ARM bits in the scripting have managed to avoid bitrot.
The "modprobe rcutorture" approach has fewer dependencies. Either way:
https://paulmck.livejournal.com/57769.html and later posts.
Thanx, Paul
> Thanks,
> Mark.
>
> >
> > Thanx, Paul
> >
> > > Thanks,
> > > -- Marco
> > >
> > > ------ >8 ------
> > >
> > > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > > index 0cfb009a99b9..678b4b094f94 100644
> > > --- a/kernel/rcu/Makefile
> > > +++ b/kernel/rcu/Makefile
> > > @@ -3,6 +3,13 @@
> > > # and is generally not a function of system call inputs.
> > > KCOV_INSTRUMENT := n
> > >
> > > +ifdef CONFIG_FUNCTION_TRACER
> > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > +endif
> > > +
> > > ifeq ($(CONFIG_KCSAN),y)
> > > KBUILD_CFLAGS += -g -fno-omit-frame-pointer
> > > endif
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 17:38 ` Paul E. McKenney
@ 2020-11-20 18:02 ` Mark Rutland
2020-11-20 18:57 ` Paul E. McKenney
0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-20 18:02 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, Nov 20, 2020 at 09:38:24AM -0800, Paul E. McKenney wrote:
> On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > > resulting in successful boot.
> > > >
> > > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > > files, but that doesn't seem to work.
> > >
> > > There were similar issues during the x86/entry work. Are the ARM guys
> > > doing arm64/entry work now?
> >
> > I'm currently looking at it. I had been trying to shift things to C for
> > a while, and right now I'm trying to fix the lockdep state tracking,
> > which is requiring untangling lockdep/rcu/tracing.
> >
> > The main issue I see remaining atm is that we don't save/restore the
> > lockdep state over exceptions taken from kernel to kernel. That could
> > result in lockdep thinking IRQs are disabled when they're actually
> > enabled (because code in the nested context might do a save/restore
> > while IRQs are disabled, then return to a context where IRQs are
> > enabled), but AFAICT shouldn't result in the inverse in most cases since
> > the non-NMI handlers all call lockdep_hardirqs_disabled().
> >
> > I'm at a loss to explaim the rcu vs ftrace bits, so if you have any
> > pointers to the issuies ween with the x86 rework that'd be quite handy.
>
> There were several over a number of months. I especially recall issues
> with the direct-from-idle execution of smp_call_function*() handlers,
> and also with some of the special cases in the entry code, for example,
> reentering the kernel from the kernel. This latter could cause RCU to
> not be watching when it should have been or vice versa.
Ah; those are precisely the cases I'm currently fixing, so if we're
lucky this is an indirect result of one of those rather than a novel
source of pain...
> I would of course be most aware of the issues that impinged on RCU
> and that were located by rcutorture. This is actually not hard to run,
> especially if the ARM bits in the scripting have managed to avoid bitrot.
> The "modprobe rcutorture" approach has fewer dependencies. Either way:
> https://paulmck.livejournal.com/57769.html and later posts.
That is a very good idea. I'd been relying on Syzkaller to tickle the
issue, but the torture infrastructure is a much better fit for this
problem. I hadn't realise how comprehensive the scripting was, thanks
for this!
I'll see about giving that a go once I have the irq-from-idle cases
sorted, as those are very obviously broken if you hack
trace_hardirqs_{on,off}() to check that RCU is watching.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 15:26 ` Steven Rostedt
@ 2020-11-20 18:17 ` Marco Elver
2020-11-20 18:57 ` Steven Rostedt
2020-11-20 19:16 ` Steven Rostedt
0 siblings, 2 replies; 24+ messages in thread
From: Marco Elver @ 2020-11-20 18:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mark Rutland, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
kasan-dev, rcu, Linux-MM, Alexander Potapenko, linux-arm-kernel,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, Nov 20, 2020 at 10:26AM -0500, Steven Rostedt wrote:
> On Fri, 20 Nov 2020 15:19:28 +0100
> Marco Elver <elver@google.com> wrote:
>
> > None of those triggered either.
> >
> > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > the stalls (and any mention of deadlocks as a side-effect I assume),
> > resulting in successful boot.
> >
> > Does that provide any additional clues? I tried to narrow it down to 1-2
> > files, but that doesn't seem to work.
> >
> > Thanks,
> > -- Marco
> >
> > ------ >8 ------
> >
> > diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile
> > index 0cfb009a99b9..678b4b094f94 100644
> > --- a/kernel/rcu/Makefile
> > +++ b/kernel/rcu/Makefile
> > @@ -3,6 +3,13 @@
> > # and is generally not a function of system call inputs.
> > KCOV_INSTRUMENT := n
> >
> > +ifdef CONFIG_FUNCTION_TRACER
> > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > +endif
> > +
>
> Can you narrow it down further? That is, do you really need all of the
> above to stop the stalls?
I tried to reduce it to 1 or combinations of 2 files only, but that
didn't work.
> Also, since you are using linux-next, you have ftrace recursion debugging.
> Please enable:
>
> CONFIG_FTRACE_RECORD_RECURSION=y
> CONFIG_RING_BUFFER_RECORD_RECURSION=y
>
> when enabling any of the above. If you can get to a successful boot, you
> can then:
>
> # cat /sys/kernel/tracing/recursed_functions
>
> Which would let me know if there's an recursion issue in RCU somewhere.
To get the system to boot in the first place (as mentioned in other
emails) I again needed to revert
"rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled",
as otherwise would run into the deadlock. That used to still result in
stall warnings, except when ftrace's recursion detection is on it seems.
With that, this is what I get:
| # cat /sys/kernel/tracing/recursed_functions
| trace_selftest_test_recursion_func+0x34/0x48: trace_selftest_dynamic_test_func+0x4/0x28
| el1_irq+0xc0/0x180: gic_handle_irq+0x4/0x108
| gic_handle_irq+0x70/0x108: __handle_domain_irq+0x4/0x130
| __handle_domain_irq+0x7c/0x130: irq_enter+0x4/0x28
| trace_rcu_dyntick+0x168/0x190: rcu_read_lock_sched_held+0x4/0x98
| rcu_read_lock_sched_held+0x30/0x98: rcu_read_lock_held_common+0x4/0x88
| rcu_read_lock_held_common+0x50/0x88: rcu_lockdep_current_cpu_online+0x4/0xd0
| irq_enter+0x1c/0x28: irq_enter_rcu+0x4/0xa8
| irq_enter_rcu+0x3c/0xa8: irqtime_account_irq+0x4/0x198
| irq_enter_rcu+0x44/0xa8: preempt_count_add+0x4/0x1a0
| trace_hardirqs_off+0x254/0x2d8: __srcu_read_lock+0x4/0xa0
| trace_hardirqs_off+0x25c/0x2d8: rcu_irq_enter_irqson+0x4/0x78
| trace_rcu_dyntick+0xd8/0x190: __traceiter_rcu_dyntick+0x4/0x80
| trace_hardirqs_off+0x294/0x2d8: rcu_irq_exit_irqson+0x4/0x78
| trace_hardirqs_off+0x2a0/0x2d8: __srcu_read_unlock+0x4/0x88
Thanks,
-- Marco
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 18:17 ` Marco Elver
@ 2020-11-20 18:57 ` Steven Rostedt
2020-11-20 19:16 ` Steven Rostedt
1 sibling, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2020-11-20 18:57 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
kasan-dev, rcu, Linux-MM, Alexander Potapenko, linux-arm-kernel,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, 20 Nov 2020 19:17:37 +0100
Marco Elver <elver@google.com> wrote:
> | # cat /sys/kernel/tracing/recursed_functions
> | trace_selftest_test_recursion_func+0x34/0x48: trace_selftest_dynamic_test_func+0x4/0x28
> | el1_irq+0xc0/0x180: gic_handle_irq+0x4/0x108
> | gic_handle_irq+0x70/0x108: __handle_domain_irq+0x4/0x130
> | __handle_domain_irq+0x7c/0x130: irq_enter+0x4/0x28
> | trace_rcu_dyntick+0x168/0x190: rcu_read_lock_sched_held+0x4/0x98
> | rcu_read_lock_sched_held+0x30/0x98: rcu_read_lock_held_common+0x4/0x88
> | rcu_read_lock_held_common+0x50/0x88: rcu_lockdep_current_cpu_online+0x4/0xd0
> | irq_enter+0x1c/0x28: irq_enter_rcu+0x4/0xa8
> | irq_enter_rcu+0x3c/0xa8: irqtime_account_irq+0x4/0x198
> | irq_enter_rcu+0x44/0xa8: preempt_count_add+0x4/0x1a0
> | trace_hardirqs_off+0x254/0x2d8: __srcu_read_lock+0x4/0xa0
> | trace_hardirqs_off+0x25c/0x2d8: rcu_irq_enter_irqson+0x4/0x78
> | trace_rcu_dyntick+0xd8/0x190: __traceiter_rcu_dyntick+0x4/0x80
> | trace_hardirqs_off+0x294/0x2d8: rcu_irq_exit_irqson+0x4/0x78
> | trace_hardirqs_off+0x2a0/0x2d8: __srcu_read_unlock+0x4/0x88
These look normal. They happen when an interrupt occurs while tracing
something with interrupts enabled, and the interrupt traces a function
before it sets the "preempt_count" to reflect that its in a new context.
That is:
normal_context:
func_A();
trace_function();
<interrupt>
irq_enter();
trace_function()
if (int_interrupt())
[returns false]
set_preempt_count (in interrupt)
And the recursion detection is tricked into thinking it recursed in the
same context. The lastest code handles this by allowing one level of
recursion:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b02414c8f045ab3b9afc816c3735bc98c5c3d262
-- Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 18:02 ` Mark Rutland
@ 2020-11-20 18:57 ` Paul E. McKenney
0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2020-11-20 18:57 UTC (permalink / raw)
To: Mark Rutland
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, Nov 20, 2020 at 06:02:06PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 09:38:24AM -0800, Paul E. McKenney wrote:
> > On Fri, Nov 20, 2020 at 03:22:00PM +0000, Mark Rutland wrote:
> > > On Fri, Nov 20, 2020 at 06:39:28AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Nov 20, 2020 at 03:19:28PM +0100, Marco Elver wrote:
> > > > > I found that disabling ftrace for some of kernel/rcu (see below) solved
> > > > > the stalls (and any mention of deadlocks as a side-effect I assume),
> > > > > resulting in successful boot.
> > > > >
> > > > > Does that provide any additional clues? I tried to narrow it down to 1-2
> > > > > files, but that doesn't seem to work.
> > > >
> > > > There were similar issues during the x86/entry work. Are the ARM guys
> > > > doing arm64/entry work now?
> > >
> > > I'm currently looking at it. I had been trying to shift things to C for
> > > a while, and right now I'm trying to fix the lockdep state tracking,
> > > which is requiring untangling lockdep/rcu/tracing.
> > >
> > > The main issue I see remaining atm is that we don't save/restore the
> > > lockdep state over exceptions taken from kernel to kernel. That could
> > > result in lockdep thinking IRQs are disabled when they're actually
> > > enabled (because code in the nested context might do a save/restore
> > > while IRQs are disabled, then return to a context where IRQs are
> > > enabled), but AFAICT shouldn't result in the inverse in most cases since
> > > the non-NMI handlers all call lockdep_hardirqs_disabled().
> > >
> > > I'm at a loss to explaim the rcu vs ftrace bits, so if you have any
> > > pointers to the issuies ween with the x86 rework that'd be quite handy.
> >
> > There were several over a number of months. I especially recall issues
> > with the direct-from-idle execution of smp_call_function*() handlers,
> > and also with some of the special cases in the entry code, for example,
> > reentering the kernel from the kernel. This latter could cause RCU to
> > not be watching when it should have been or vice versa.
>
> Ah; those are precisely the cases I'm currently fixing, so if we're
> lucky this is an indirect result of one of those rather than a novel
> source of pain...
Here is hoping!
> > I would of course be most aware of the issues that impinged on RCU
> > and that were located by rcutorture. This is actually not hard to run,
> > especially if the ARM bits in the scripting have managed to avoid bitrot.
> > The "modprobe rcutorture" approach has fewer dependencies. Either way:
> > https://paulmck.livejournal.com/57769.html and later posts.
>
> That is a very good idea. I'd been relying on Syzkaller to tickle the
> issue, but the torture infrastructure is a much better fit for this
> problem. I hadn't realise how comprehensive the scripting was, thanks
> for this!
But why not both rcutorture and Syzkaller? ;-)
> I'll see about giving that a go once I have the irq-from-idle cases
> sorted, as those are very obviously broken if you hack
> trace_hardirqs_{on,off}() to check that RCU is watching.
Sounds good!
Thanx, Paul
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 18:17 ` Marco Elver
2020-11-20 18:57 ` Steven Rostedt
@ 2020-11-20 19:16 ` Steven Rostedt
2020-11-20 19:22 ` Marco Elver
1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2020-11-20 19:16 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
kasan-dev, rcu, Linux-MM, Alexander Potapenko, linux-arm-kernel,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, 20 Nov 2020 19:17:37 +0100
Marco Elver <elver@google.com> wrote:
> > > +++ b/kernel/rcu/Makefile
> > > @@ -3,6 +3,13 @@
> > > # and is generally not a function of system call inputs.
> > > KCOV_INSTRUMENT := n
> > >
> > > +ifdef CONFIG_FUNCTION_TRACER
> > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > +endif
> > > +
> >
> > Can you narrow it down further? That is, do you really need all of the
> > above to stop the stalls?
>
> I tried to reduce it to 1 or combinations of 2 files only, but that
> didn't work.
I'm curious if this would help at all?
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a52f42f64b6..d020ecefd151 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1094,7 +1094,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
* if the current CPU is not in its idle loop or is in an interrupt or
* NMI handler, return true.
*/
-bool rcu_is_watching(void)
+notrace bool rcu_is_watching(void)
{
bool ret;
Although I don't see it in the recursion list.
-- Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 19:16 ` Steven Rostedt
@ 2020-11-20 19:22 ` Marco Elver
0 siblings, 0 replies; 24+ messages in thread
From: Marco Elver @ 2020-11-20 19:22 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mark Rutland, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
kasan-dev, rcu, Linux-MM, Alexander Potapenko, Linux ARM,
Tejun Heo, Andrew Morton, Dmitry Vyukov
On Fri, 20 Nov 2020 at 20:16, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Fri, 20 Nov 2020 19:17:37 +0100
> Marco Elver <elver@google.com> wrote:
>
> > > > +++ b/kernel/rcu/Makefile
> > > > @@ -3,6 +3,13 @@
> > > > # and is generally not a function of system call inputs.
> > > > KCOV_INSTRUMENT := n
> > > >
> > > > +ifdef CONFIG_FUNCTION_TRACER
> > > > +CFLAGS_REMOVE_update.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_sync.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_srcutree.o = $(CC_FLAGS_FTRACE)
> > > > +CFLAGS_REMOVE_tree.o = $(CC_FLAGS_FTRACE)
> > > > +endif
> > > > +
> > >
> > > Can you narrow it down further? That is, do you really need all of the
> > > above to stop the stalls?
> >
> > I tried to reduce it to 1 or combinations of 2 files only, but that
> > didn't work.
>
> I'm curious if this would help at all?
>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2a52f42f64b6..d020ecefd151 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1094,7 +1094,7 @@ static void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> * if the current CPU is not in its idle loop or is in an interrupt or
> * NMI handler, return true.
> */
> -bool rcu_is_watching(void)
> +notrace bool rcu_is_watching(void)
> {
> bool ret;
>
> Although I don't see it in the recursion list.
It seems a patch to that effect is already in -next ("rcu,ftrace: Fix
ftrace recursion"), and my experiments so far have all been with it.
Thanks,
-- Marco
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-20 14:03 ` Marco Elver
@ 2020-11-23 19:32 ` Mark Rutland
2020-11-24 14:03 ` Marco Elver
0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-23 19:32 UTC (permalink / raw)
To: Marco Elver
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Will Deacon, Dmitry Vyukov
On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > looking at that and I think he is close to having something workable.
> > >
> > > Mark -- is there anything Marco and Paul can try out?
> >
> > I initially traced some issues back to commit:
> >
> > 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> >
> > ... and that change of semantic could cause us to miss edges in some
> > cases, but IIUC mostly where we haven't done the right thing in
> > exception entry/return.
> >
> > I don't think my patches address this case yet, but my WIP (currently
> > just fixing user<->kernel transitions) is at:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> >
> > I'm looking into the kernel<->kernel transitions now, and I know that we
> > mess up RCU management for a small window around arch_cpu_idle, but it's
> > not immediately clear to me if either of those cases could cause this
> > report.
>
> Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> problem (still get warnings and then a panic). :-/
I've just updated that branch with a new version which I hope covers
kernel<->kernel transitions too. If you get a chance, would you mind
giving that a spin?
The HEAD commit should be:
a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
Otherwise, I intend to clean that up and post it tomorrow (without the
additional debug hacks). I've thrown my local Syzkaller instance at it
in the mean time (and if I get the chance tomrrow I'll try to get
rcutorture setup), and the only report I'm seeing so far looks genuine:
| BUG: sleeping function called from invalid context in sta_info_move_state
... as that was reported on x86 too, per:
https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-23 19:32 ` Mark Rutland
@ 2020-11-24 14:03 ` Marco Elver
2020-11-24 15:01 ` Paul E. McKenney
2020-11-24 19:30 ` Mark Rutland
0 siblings, 2 replies; 24+ messages in thread
From: Marco Elver @ 2020-11-24 14:03 UTC (permalink / raw)
To: Mark Rutland
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
[-- Attachment #1: Type: text/plain, Size: 2521 bytes --]
On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > looking at that and I think he is close to having something workable.
> > > >
> > > > Mark -- is there anything Marco and Paul can try out?
> > >
> > > I initially traced some issues back to commit:
> > >
> > > 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > >
> > > ... and that change of semantic could cause us to miss edges in some
> > > cases, but IIUC mostly where we haven't done the right thing in
> > > exception entry/return.
> > >
> > > I don't think my patches address this case yet, but my WIP (currently
> > > just fixing user<->kernel transitions) is at:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > >
> > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > not immediately clear to me if either of those cases could cause this
> > > report.
> >
> > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > problem (still get warnings and then a panic). :-/
>
> I've just updated that branch with a new version which I hope covers
> kernel<->kernel transitions too. If you get a chance, would you mind
> giving that a spin?
>
> The HEAD commit should be:
>
> a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
Thank you! Your series appears to work and fixes the stalls and
deadlocks (3 trials)! I noticed there are a bunch of warnings in the log
that might be relevant (see attached).
Note, I also reverted
sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
and that still works.
Thanks,
-- Marco
> Otherwise, I intend to clean that up and post it tomorrow (without the
> additional debug hacks). I've thrown my local Syzkaller instance at it
> in the mean time (and if I get the chance tomrrow I'll try to get
> rcutorture setup), and the only report I'm seeing so far looks genuine:
>
> | BUG: sleeping function called from invalid context in sta_info_move_state
>
> ... as that was reported on x86 too, per:
>
> https://syzkaller.appspot.com/bug?id=6c7899acf008be2ddcddb46a2567c2153193632a
>
> Thanks,
> Mark.
[-- Attachment #2: vm.log --]
[-- Type: text/plain, Size: 29420 bytes --]
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd070]
[ 0.000000] Linux version 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 (elver@elver.muc.corp.google.com) (aarch64-none-linux-gnu-gcc (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 9.2.1 20191025, GNU ld (GNU Toolchain for the A-profile Architecture 9.2-2019.12 (arm-9.10)) 2.33.1.20191209) #25 SMP PREEMPT Tue Nov 24 13:47:30 CET 2020
[ 0.000000] Machine model: linux,dummy-virt
[ 0.000000] efi: UEFI not found.
[ 0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[ 0.000000] printk: bootconsole [pl11] enabled
[ 0.000000] NUMA: No NUMA configuration found
[ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0xbdbf6000-0xbdbf7fff]
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000040000000-0x000000007fffffff]
[ 0.000000] DMA32 [mem 0x0000000080000000-0x00000000bfffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[ 0.000000] On node 0 totalpages: 524288
[ 0.000000] DMA zone: 4096 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 262144 pages, LIFO batch:63
[ 0.000000] DMA32 zone: 4096 pages used for memmap
[ 0.000000] DMA32 zone: 262144 pages, LIFO batch:63
[ 0.000000] psci: probing for conduit method from DT.
[ 0.000000] psci: PSCIv0.2 detected in firmware.
[ 0.000000] psci: Using standard PSCI v0.2 function IDs
[ 0.000000] psci: Trusted OS migration not required
[ 0.000000] percpu: Embedded 49 pages/cpu s162704 r8192 d29808 u200704
[ 0.000000] pcpu-alloc: s162704 r8192 d29808 u200704 alloc=49*4096
[ 0.000000] pcpu-alloc: [0] 0
[ 0.000000] Detected PIPT I-cache on CPU0
[ 0.000000] CPU features: detected: ARM erratum 832075
[ 0.000000] CPU features: detected: ARM erratum 834220
[ 0.000000] CPU features: detected: EL2 vector hardening
[ 0.000000] CPU features: kernel page table isolation forced ON by KASLR
[ 0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[ 0.000000] CPU features: detected: Spectre-v2
[ 0.000000] CPU features: detected: Spectre-v4
[ 0.000000] CPU features: detected: ARM errata 1165522, 1319367, or 1530923
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 516096
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=ttyAMA0 root=/dev/sda debug earlycon earlyprintk=serial slub_debug=UZ slub_debug=- workqueue.watchdog_thresh=10
[ 0.000000] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[ 0.000000] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear)
[ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[ 0.000000] software IO TLB: mapped [mem 0x000000007bfff000-0x000000007ffff000] (64MB)
[ 0.000000] Memory: 1903696K/2097152K available (20800K kernel code, 4024K rwdata, 8508K rodata, 8896K init, 11238K bss, 160688K reserved, 32768K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] ftrace: allocating 56178 entries in 220 pages
[ 0.000000] ftrace: allocated 220 pages with 5 groups
[ 0.000000] Running RCU self tests
[ 0.000000] rcu: Preemptible hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] rcu: RCU lockdep checking is enabled.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=1.
[ 0.000000] Trampoline variant of Tasks RCU enabled.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[ 0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[ 0.000000] random: get_random_bytes called from start_kernel+0x468/0x670 with crng_init=0
[ 0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[ 0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
[ 0.000291] sched_clock: 56 bits at 62MHz, resolution 16ns, wraps every 4398046511096ns
[ 0.014280] Console: colour dummy device 80x25
[ 0.015796] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[ 0.016131] ... MAX_LOCKDEP_SUBCLASSES: 8
[ 0.016378] ... MAX_LOCK_DEPTH: 48
[ 0.016626] ... MAX_LOCKDEP_KEYS: 8192
[ 0.016887] ... CLASSHASH_SIZE: 4096
[ 0.017138] ... MAX_LOCKDEP_ENTRIES: 32768
[ 0.017385] ... MAX_LOCKDEP_CHAINS: 65536
[ 0.017633] ... CHAINHASH_SIZE: 32768
[ 0.017884] memory used by lock dependency info: 6365 kB
[ 0.018144] memory used for stack traces: 4224 kB
[ 0.018400] per task-struct memory footprint: 1920 bytes
[ 0.021247] Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=250000)
[ 0.021914] pid_max: default: 32768 minimum: 301
[ 0.025132] LSM: Security Framework initializing
[ 0.027343] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[ 0.028580] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[ 0.115959] rcu: Hierarchical SRCU implementation.
[ 0.135096] EFI services will not be available.
[ 0.139828] smp: Bringing up secondary CPUs ...
[ 0.140267] smp: Brought up 1 node, 1 CPU
[ 0.140587] SMP: Total of 1 processors activated.
[ 0.141037] CPU features: detected: 32-bit EL0 Support
[ 0.141769] CPU features: detected: CRC32 instructions
[ 0.142144] CPU features: detected: 32-bit EL1 Support
[ 0.547379] CPU: All CPU(s) started at EL1
[ 0.548418] alternatives: patching kernel code
[ 0.606936] devtmpfs: initialized
[ 0.679006] KASLR enabled
[ 0.707823] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.708867] futex hash table entries: 256 (order: 3, 32768 bytes, linear)
[ 0.710674] Running postponed tracer tests:
[ 0.714449] Testing tracer function: PASSED
[ 8.629279] Testing dynamic ftrace: PASSED
[ 9.591794] Testing dynamic ftrace ops #1:
[ 12.266300] (1 0 1 0 0)
[ 12.266797] (1 1 2 0 0)
[ 20.579408] (2 1 3 0 1141111)
[ 20.582238] (2 2 4 0 1141539) PASSED
[ 24.613636] Testing dynamic ftrace ops #2:
[ 34.951883] (1 0 1 1116153 0)
[ 34.954384] (1 1 2 1116600 0)
[ 35.023400] (2 1 3 1 3074)
[ 35.024655] (2 2 4 200 3273) PASSED
[ 38.154182] Testing ftrace recursion: PASSED
[ 38.921470] Testing ftrace recursion safe: PASSED
[ 39.687688] Testing ftrace regs: PASSED
[ 40.462198] Testing tracer nop: PASSED
[ 40.466389] Testing tracer irqsoff: PASSED
[ 48.407206] Testing tracer preemptoff: PASSED
[ 56.413646] Testing tracer preemptirqsoff: PASSED
[ 64.506770] Testing tracer wakeup: PASSED
[ 72.330196] Testing tracer wakeup_rt: PASSED
[ 80.131803] Testing tracer wakeup_dl: PASSED
[ 87.972631] Testing tracer function_graph:
[ 91.178997]
[ 91.184432] =============================
[ 91.188301] WARNING: suspicious RCU usage
[ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
[ 91.197536] -----------------------------
[ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
[ 91.206546]
[ 91.206546] other info that might help us debug this:
[ 91.206546]
[ 91.211790]
[ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
[ 91.216454] RCU used illegally from extended quiescent state!
[ 91.220890] no locks held by swapper/0/0.
[ 91.224712]
[ 91.224712] stack backtrace:
[ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[ 91.234877] Hardware name: linux,dummy-virt (DT)
[ 91.239032] Call trace:
[ 91.242587] dump_backtrace+0x0/0x240
[ 91.246500] show_stack+0x34/0x88
[ 91.250295] dump_stack+0x140/0x1bc
[ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
[ 91.258332] trace_hardirqs_off+0x214/0x330
[ 91.262462] trace_graph_return+0x1ac/0x1d8
[ 91.266564] ftrace_return_to_handler+0xa4/0x170
[ 91.270809] return_to_handler+0x1c/0x38
[ 91.274826] default_idle_call+0x94/0x38c
[ 91.278869] do_idle+0x240/0x290
[ 91.282633] rest_init+0x1e8/0x2dc
[ 91.286529] arch_call_rest_init+0x1c/0x28
[ 91.290585] start_kernel+0x638/0x670
[ 91.295524] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8
[ 91.296302] Modules linked in:
[ 91.297644] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[ 91.298317] Hardware name: linux,dummy-virt (DT)
[ 91.298975] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
[ 91.299648] pc : check_flags.part.0+0x1d4/0x1f8
[ 91.300303] lr : check_flags.part.0+0x1d4/0x1f8
[ 91.300960] sp : ffffdb60f8d73a50
[ 91.301628] x29: ffffdb60f8d73a50 x28: ffffdb60f8d84000
[ 91.303527] x27: ffffdb60f869cbb8 x26: ffffdb60f6835930
[ 91.305431] x25: 0000000000000000 x24: 0000000000000000
[ 91.307343] x23: ffffdb60f8daf360 x22: 0000000000000001
[ 91.309242] x21: 0000000000000000 x20: 0000000000000001
[ 91.311145] x19: ffffdb60f9bcf000 x18: 00000000749f6e65
[ 91.313044] x17: 00000000dcd3f761 x16: 0000000000000005
[ 91.314954] x15: 0000000000000000 x14: 0000000000000028
[ 91.316854] x13: 000000000000067a x12: 0000000000000028
[ 91.318753] x11: 0101010101010101 x10: ffffdb60f8d73820
[ 91.320657] x9 : ffffdb60f6960ff8 x8 : 4e5241575f534b43
[ 91.322582] x7 : 4f4c5f4755424544 x6 : ffff4454fdbd3667
[ 91.324486] x5 : 00000000ffffffc8 x4 : ffff4454fdbd2c60
[ 91.326413] x3 : ffffdb60f6800000 x2 : ffffdb60f7c60000
[ 91.328308] x1 : 1c0af7741e0f0c00 x0 : 0000000000000000
[ 91.330227] Call trace:
[ 91.330880] check_flags.part.0+0x1d4/0x1f8
[ 91.331547] lock_acquire+0x208/0x508
[ 91.332200] _raw_spin_lock+0x5c/0x80
[ 91.332849] vprintk_emit+0xb4/0x380
[ 91.333528] vprintk_default+0x4c/0x60
[ 91.334189] vprintk_func+0x120/0x330
[ 91.334863] printk+0x78/0x9c
[ 91.335523] lockdep_rcu_suspicious+0x2c/0xf8
[ 91.336195] trace_hardirqs_off+0x214/0x330
[ 91.336854] trace_graph_return+0x1ac/0x1d8
[ 91.337518] ftrace_return_to_handler+0xa4/0x170
[ 91.338190] return_to_handler+0x1c/0x38
[ 91.338841] default_idle_call+0x94/0x38c
[ 91.339512] do_idle+0x240/0x290
[ 91.340166] rest_init+0x1e8/0x2dc
[ 91.340840] arch_call_rest_init+0x1c/0x28
[ 91.341499] start_kernel+0x638/0x670
[ 91.342147] irq event stamp: 1727
[ 91.342832] hardirqs last enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
[ 91.343502] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
[ 91.344193] softirqs last enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
[ 91.344866] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
[ 91.345546] ---[ end trace e131d25144579308 ]---
[ 91.346413]
[ 91.347055]
[ 91.347707] =============================
[ 91.348363] WARNING: suspicious RCU usage
[ 91.349058] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
[ 91.349714] -----------------------------
[ 91.350398] include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!
[ 91.351025]
[ 91.351702] other info that might help us debug this:
[ 91.352324]
[ 91.352949]
[ 91.353626] rcu_scheduler_active = 2, debug_locks = 1
[ 91.354291] RCU used illegally from extended quiescent state!
[ 91.354965] no locks held by swapper/0/0.
[ 91.355590]
[ 91.356257] stack backtrace:
[ 91.356937] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[ 91.357611] Hardware name: linux,dummy-virt (DT)
[ 91.358270] Call trace:
[ 91.358925] dump_backtrace+0x0/0x240
[ 91.359597] show_stack+0x34/0x88
[ 91.360255] dump_stack+0x140/0x1bc
[ 91.360932] lockdep_rcu_suspicious+0xe4/0xf8
[ 91.361594] smp_cross_call+0x220/0x228
[ 91.362264] arch_irq_work_raise+0x40/0x50
[ 91.362925] __irq_work_queue_local+0xb0/0xe8
[ 91.363580] irq_work_queue+0x5c/0xb8
[ 91.364252] printk_safe_log_store+0x1a0/0x1d8
[ 91.364906] vprintk_func+0x23c/0x330
[ 91.365566] printk+0x78/0x9c
[ 91.366215] lockdep_rcu_suspicious+0x2c/0xf8
[ 91.366883] lock_acquire+0x340/0x508
[ 91.367538] _raw_spin_lock+0x5c/0x80
[ 91.368192] vprintk_emit+0xb4/0x380
[ 91.368866] vprintk_default+0x4c/0x60
[ 91.369521] vprintk_func+0x120/0x330
[ 91.370191] printk+0x78/0x9c
[ 91.370845] lockdep_rcu_suspicious+0x2c/0xf8
[ 91.371519] trace_hardirqs_off+0x214/0x330
[ 91.372172] trace_graph_return+0x1ac/0x1d8
[ 91.372842] ftrace_return_to_handler+0xa4/0x170
[ 91.373503] return_to_handler+0x1c/0x38
[ 91.374158] default_idle_call+0x94/0x38c
[ 91.374825] do_idle+0x240/0x290
[ 91.375479] rest_init+0x1e8/0x2dc
[ 91.376152] arch_call_rest_init+0x1c/0x28
[ 91.376804] start_kernel+0x638/0x670
[ 91.377475] =============================
[ 91.378132] WARNING: suspicious RCU usage
[ 91.378793] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
[ 91.379466] -----------------------------
[ 91.380129] include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
[ 91.380769]
[ 91.381432] other info that might help us debug this:
[ 91.382066]
[ 91.382690]
[ 91.383349] rcu_scheduler_active = 2, debug_locks = 1
[ 91.384023] RCU used illegally from extended quiescent state!
[ 91.384675] no locks held by swapper/0/0.
[ 91.385316]
[ 91.385962] stack backtrace:
[ 91.386647] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
[ 91.387307] Hardware name: linux,dummy-virt (DT)
[ 91.387956] Call trace:
[ 91.388626] dump_backtrace+0x0/0x240
[ 91.389283] show_stack+0x34/0x88
[ 91.389952] dump_stack+0x140/0x1bc
[ 91.390602] lockdep_rcu_suspicious+0xe4/0xf8
[ 91.391270] lock_acquire+0x340/0x508
[ 91.391926] _raw_spin_lock+0x5c/0x80
[ 91.392580] vprintk_emit+0xb4/0x380
[ 91.393259] vprintk_default+0x4c/0x60
[ 91.393913] vprintk_func+0x120/0x330
[ 91.394581] printk+0x78/0x9c
[ 91.395230] lockdep_rcu_suspicious+0x2c/0xf8
[ 91.395899] trace_hardirqs_off+0x214/0x330
[ 91.396555] trace_graph_return+0x1ac/0x1d8
[ 91.397243] ftrace_return_to_handler+0xa4/0x170
[ 91.397901] return_to_handler+0x1c/0x38
[ 91.398556] default_idle_call+0x94/0x38c
[ 91.399231] do_idle+0x240/0x290
[ 91.399883] rest_init+0x1e8/0x2dc
[ 91.400562] arch_call_rest_init+0x1c/0x28
[ 91.401227] start_kernel+0x638/0x670
[ 91.401909] ------------[ cut here ]------------
[ 91.402584] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())possible reason: unannotated irqs-off.
[ 91.403248] irq event stamp: 1727
[ 91.403939] hardirqs last enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
[ 91.404614] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
[ 91.405305] softirqs last enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
[ 91.405976] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
[ 94.124678] PASSED
[ 94.137271] pinctrl core: initialized pinctrl subsystem
[ 94.166010] DMI not present or invalid.
[ 94.181220] NET: Registered protocol family 16
[ 94.221500] DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations
[ 94.222319] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 94.224040] DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ 94.225379] audit: initializing netlink subsys (disabled)
[ 94.232973] audit: type=2000 audit(82.136:1): state=initialized audit_enabled=0 res=1
[ 94.251959] thermal_sys: Registered thermal governor 'step_wise'
[ 94.252110] thermal_sys: Registered thermal governor 'power_allocator'
[ 94.254016] cpuidle: using governor menu
[ 94.259177] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[ 94.260165] ASID allocator initialised with 32768 entries
[ 94.280847] Serial: AMBA PL011 UART driver
[ 94.897305] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 47, base_baud = 0) is a PL011 rev1
[ 94.899357] printk: console [ttyAMA0] enabled
[ 94.899357] printk: console [ttyAMA0] enabled
[ 94.899994] printk: bootconsole [pl11] disabled
[ 94.899994] printk: bootconsole [pl11] disabled
[ 95.136385] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 95.136940] HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
[ 95.137256] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 95.137555] HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
[ 95.172131] cryptd: max_cpu_qlen set to 1000
[ 95.238749] ACPI: Interpreter disabled.
[ 95.280884] iommu: Default domain type: Translated
[ 95.286381] vgaarb: loaded
[ 95.292824] SCSI subsystem initialized
[ 95.295892] libata version 3.00 loaded.
[ 95.301200] usbcore: registered new interface driver usbfs
[ 95.302249] usbcore: registered new interface driver hub
[ 95.303362] usbcore: registered new device driver usb
[ 95.319691] pps_core: LinuxPPS API ver. 1 registered
[ 95.319974] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 95.320522] PTP clock support registered
[ 95.324309] EDAC MC: Ver: 3.0.0
[ 95.356513] FPGA manager framework
[ 95.358764] Advanced Linux Sound Architecture Driver Initialized.
[ 95.379906] clocksource: Switched to clocksource arch_sys_counter
[ 109.397497] VFS: Disk quotas dquot_6.6.0
[ 109.398763] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 109.404469] pnp: PnP ACPI: disabled
[ 109.525997] NET: Registered protocol family 2
[ 109.539351] tcp_listen_portaddr_hash hash table entries: 1024 (order: 4, 81920 bytes, linear)
[ 109.541579] TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 109.546171] TCP bind hash table entries: 16384 (order: 8, 1179648 bytes, linear)
[ 109.550508] TCP: Hash tables configured (established 16384 bind 16384)
[ 109.553087] UDP hash table entries: 1024 (order: 5, 163840 bytes, linear)
[ 109.554809] UDP-Lite hash table entries: 1024 (order: 5, 163840 bytes, linear)
[ 109.558939] NET: Registered protocol family 1
[ 109.569986] RPC: Registered named UNIX socket transport module.
[ 109.570338] RPC: Registered udp transport module.
[ 109.570584] RPC: Registered tcp transport module.
[ 109.570826] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 109.571261] PCI: CLS 0 bytes, default 64
[ 109.591131] hw perfevents: enabled with armv8_pmuv3 PMU driver, 5 counters available
[ 109.592826] kvm [1]: HYP mode not available
[ 109.700178] Initialise system trusted keyrings
[ 109.703933] workingset: timestamp_bits=44 max_order=19 bucket_order=0
[ 109.829691] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 109.839361] NFS: Registering the id_resolver key type
[ 109.840697] Key type id_resolver registered
[ 109.841002] Key type id_legacy registered
[ 109.842732] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[ 109.846885] 9p: Installing v9fs 9p2000 file system support
[ 109.923870] Key type asymmetric registered
[ 109.924235] Asymmetric key parser 'x509' registered
[ 109.925108] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
[ 109.925557] io scheduler mq-deadline registered
[ 109.925920] io scheduler kyber registered
[ 110.181262] pl061_gpio 9030000.pl061: PL061 GPIO chip registered
[ 110.220260] pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
[ 110.221906] pci-host-generic 4010000000.pcie: IO 0x003eff0000..0x003effffff -> 0x0000000000
[ 110.223253] pci-host-generic 4010000000.pcie: MEM 0x0010000000..0x003efeffff -> 0x0010000000
[ 110.224080] pci-host-generic 4010000000.pcie: MEM 0x8000000000..0xffffffffff -> 0x8000000000
[ 110.226250] pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
[ 110.229496] pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
[ 110.230080] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 110.230473] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 110.230784] pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
[ 110.231097] pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
[ 110.233438] pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
[ 110.243007] pci 0000:00:01.0: [1af4:1009] type 00 class 0x000200
[ 110.244685] pci 0000:00:01.0: reg 0x10: [io 0x0000-0x003f]
[ 110.246053] pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 110.250653] pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[ 110.260991] pci 0000:00:02.0: [1af4:1009] type 00 class 0x000200
[ 110.262072] pci 0000:00:02.0: reg 0x10: [io 0x0000-0x003f]
[ 110.262820] pci 0000:00:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 110.264321] pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[ 110.269991] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
[ 110.270750] pci 0000:00:03.0: reg 0x10: [io 0x0000-0x001f]
[ 110.271274] pci 0000:00:03.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 110.272464] pci 0000:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[ 110.273043] pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 110.278520] pci 0000:00:04.0: [1af4:1004] type 00 class 0x010000
[ 110.279261] pci 0000:00:04.0: reg 0x10: [io 0x0000-0x003f]
[ 110.280022] pci 0000:00:04.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 110.280997] pci 0000:00:04.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
[ 110.293126] pci 0000:00:03.0: BAR 6: assigned [mem 0x10000000-0x1003ffff pref]
[ 110.293948] pci 0000:00:01.0: BAR 4: assigned [mem 0x8000000000-0x8000003fff 64bit pref]
[ 110.294793] pci 0000:00:02.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
[ 110.295597] pci 0000:00:03.0: BAR 4: assigned [mem 0x8000008000-0x800000bfff 64bit pref]
[ 110.296522] pci 0000:00:04.0: BAR 4: assigned [mem 0x800000c000-0x800000ffff 64bit pref]
[ 110.297205] pci 0000:00:01.0: BAR 1: assigned [mem 0x10040000-0x10040fff]
[ 110.297717] pci 0000:00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff]
[ 110.298201] pci 0000:00:03.0: BAR 1: assigned [mem 0x10042000-0x10042fff]
[ 110.298701] pci 0000:00:04.0: BAR 1: assigned [mem 0x10043000-0x10043fff]
[ 110.299193] pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x103f]
[ 110.299921] pci 0000:00:02.0: BAR 0: assigned [io 0x1040-0x107f]
[ 110.300409] pci 0000:00:04.0: BAR 0: assigned [io 0x1080-0x10bf]
[ 110.300883] pci 0000:00:03.0: BAR 0: assigned [io 0x10c0-0x10df]
[ 110.359603] EINJ: ACPI disabled.
[ 110.756520] virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
[ 110.766562] virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
[ 110.774880] virtio-pci 0000:00:03.0: enabling device (0000 -> 0003)
[ 110.781745] virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
[ 110.894006] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 110.949875] SuperH (H)SCI(F) driver initialized
[ 110.962431] msm_serial: driver initialized
[ 110.995190] cacheinfo: Unable to detect cache hierarchy for CPU 0
[ 111.135917] loop: module loaded
[ 111.155979] megasas: 07.714.04.00-rc1
[ 111.179577] scsi host0: Virtio SCSI HBA
[ 111.201653] scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
[ 112.559093] random: fast init done
[ 112.595812] sd 0:0:0:0: Power-on or device reset occurred
[ 112.611860] sd 0:0:0:0: [sda] 524288 512-byte logical blocks: (268 MB/256 MiB)
[ 112.613211] sd 0:0:0:0: [sda] Write Protect is off
[ 112.613728] sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
[ 112.620764] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 112.623873] sda: detected capacity change from 0 to 268435456
[ 112.697968] physmap-flash 0.flash: physmap platform flash device: [mem 0x00000000-0x03ffffff]
[ 112.704568] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[ 112.705537] Intel/Sharp Extended Query Table at 0x0031
[ 112.706704] Using buffer write method
[ 112.707437] erase region 0: offset=0x0,size=0x40000,blocks=256
[ 112.714453] physmap-flash 0.flash: physmap platform flash device: [mem 0x04000000-0x07ffffff]
[ 112.715449] 0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
[ 112.716117] Intel/Sharp Extended Query Table at 0x0031
[ 112.717014] Using buffer write method
[ 112.717344] erase region 0: offset=0x0,size=0x40000,blocks=256
[ 112.717696] Concatenating MTD devices:
[ 112.717962] (0): "0.flash"
[ 112.718183] (1): "0.flash"
[ 112.718396] into device "0.flash"
[ 112.733693] sda: detected capacity change from 0 to 268435456
[ 112.734221] sd 0:0:0:0: [sda] Attached SCSI disk
[ 112.868897] libphy: Fixed MDIO Bus: probed
[ 112.896853] tun: Universal TUN/TAP device driver, 1.6
[ 112.932681] thunder_xcv, ver 1.0
[ 112.933749] thunder_bgx, ver 1.0
[ 112.934733] nicpf, ver 1.0
[ 112.968658] hclge is initializing
[ 112.969507] hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
[ 112.969832] hns3: Copyright (c) 2017 Huawei Corporation.
[ 112.971113] e1000: Intel(R) PRO/1000 Network Driver
[ 112.971392] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ 112.972667] e1000e: Intel(R) PRO/1000 Network Driver
[ 112.972954] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[ 112.974131] igb: Intel(R) Gigabit Ethernet Network Driver
[ 112.974412] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 112.975297] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[ 112.975775] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[ 112.985686] sky2: driver version 1.30
[ 113.009304] VFIO - User Level meta-driver version: 0.3
[ 113.048551] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 113.048946] ehci-pci: EHCI PCI platform driver
[ 113.049956] ehci-platform: EHCI generic platform driver
[ 113.053241] ehci-orion: EHCI orion driver
[ 113.056368] ehci-exynos: EHCI Exynos driver
[ 113.059235] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 113.060113] ohci-pci: OHCI PCI platform driver
[ 113.061205] ohci-platform: OHCI generic platform driver
[ 113.064420] ohci-exynos: OHCI Exynos driver
[ 113.075583] usbcore: registered new interface driver usb-storage
[ 113.146880] rtc-pl031 9010000.pl031: registered as rtc0
[ 113.148308] rtc-pl031 9010000.pl031: setting system clock to 2020-11-24T13:08:31 UTC (1606223311)
[ 113.164111] i2c /dev entries driver
[ 113.325456] sdhci: Secure Digital Host Controller Interface driver
[ 113.325754] sdhci: Copyright(c) Pierre Ossman
[ 113.336779] Synopsys Designware Multimedia Card Interface Driver
[ 113.363215] sdhci-pltfm: SDHCI platform and OF driver helper
[ 113.400526] ledtrig-cpu: registered to indicate activity on CPUs
[ 113.441824] usbcore: registered new interface driver usbhid
[ 113.442164] usbhid: USB HID core driver
[ 113.560589] drop_monitor: Initializing network drop monitor service
[ 113.562809] NET: Registered protocol family 17
[ 113.566122] 9pnet: Installing 9P2000 support
[ 113.577170] Key type dns_resolver registered
[ 113.581865] registered taskstats version 1
[ 113.582937] Running tests on all trace events:
[ 113.583197] Testing all events: OK
[ 136.678315] Running tests again, along with the function tracer
[ 136.683603] Running tests on all trace events:
[ 136.689620] Testing all events: OK
[ 349.011031] Testing ftrace filter: OK
[ 349.030607] Loading compiled-in X.509 certificates
[ 349.062241] input: gpio-keys as /devices/platform/gpio-keys/input/input0
[ 349.080633] ALSA device list:
[ 349.081036] No soundcards found.
[ 349.081340] TAP version 14
[ 349.081565] 1..0
[ 349.088742] uart-pl011 9000000.pl011: no DMA platform data
[ 349.105332] EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
[ 349.195485] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
[ 349.197107] VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
[ 349.201501] devtmpfs: mounted
[ 349.369972] Freeing unused kernel memory: 8896K
[ 349.371955] Run /sbin/init as init process
[ 349.372238] with arguments:
[ 349.372474] /sbin/init
[ 349.372689] with environment:
[ 349.372927] HOME=/
[ 349.373135] TERM=linux
[ 349.373345] earlyprintk=serial
[ 349.838705] EXT4-fs (sda): re-mounted. Opts: (null)
[ 349.840571] ext3 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
[-- Attachment #3: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-24 14:03 ` Marco Elver
@ 2020-11-24 15:01 ` Paul E. McKenney
2020-11-24 19:43 ` Mark Rutland
2020-11-24 19:30 ` Mark Rutland
1 sibling, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2020-11-24 15:01 UTC (permalink / raw)
To: Marco Elver
Cc: Mark Rutland, Anders Roxell, Jann Horn, Peter Zijlstra,
boqun.feng, Lai Jiangshan, Linux Kernel Mailing List,
Steven Rostedt, rcu, Linux-MM, Alexander Potapenko, kasan-dev,
Tejun Heo, Andrew Morton, Will Deacon, tglx, Dmitry Vyukov,
linux-arm-kernel
On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > > looking at that and I think he is close to having something workable.
> > > > >
> > > > > Mark -- is there anything Marco and Paul can try out?
> > > >
> > > > I initially traced some issues back to commit:
> > > >
> > > > 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > >
> > > > ... and that change of semantic could cause us to miss edges in some
> > > > cases, but IIUC mostly where we haven't done the right thing in
> > > > exception entry/return.
> > > >
> > > > I don't think my patches address this case yet, but my WIP (currently
> > > > just fixing user<->kernel transitions) is at:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > >
> > > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > > not immediately clear to me if either of those cases could cause this
> > > > report.
> > >
> > > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > > problem (still get warnings and then a panic). :-/
> >
> > I've just updated that branch with a new version which I hope covers
> > kernel<->kernel transitions too. If you get a chance, would you mind
> > giving that a spin?
> >
> > The HEAD commit should be:
> >
> > a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
>
> Thank you! Your series appears to work and fixes the stalls and
> deadlocks (3 trials)! I noticed there are a bunch of warnings in the log
> that might be relevant (see attached).
>
> Note, I also reverted
>
> sched/core: Allow try_invoke_on_locked_down_task() with irqs disabled
>
> and that still works.
This is expected behavior given that there were no RCU CPU stall
warnings. As to the warnings...
[ . . . ]
> [ 91.184432] =============================
> [ 91.188301] WARNING: suspicious RCU usage
> [ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
> [ 91.197536] -----------------------------
> [ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> [ 91.206546]
> [ 91.206546] other info that might help us debug this:
> [ 91.206546]
> [ 91.211790]
> [ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
> [ 91.216454] RCU used illegally from extended quiescent state!
> [ 91.220890] no locks held by swapper/0/0.
> [ 91.224712]
> [ 91.224712] stack backtrace:
> [ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [ 91.234877] Hardware name: linux,dummy-virt (DT)
> [ 91.239032] Call trace:
> [ 91.242587] dump_backtrace+0x0/0x240
> [ 91.246500] show_stack+0x34/0x88
> [ 91.250295] dump_stack+0x140/0x1bc
> [ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
> [ 91.258332] trace_hardirqs_off+0x214/0x330
> [ 91.262462] trace_graph_return+0x1ac/0x1d8
> [ 91.266564] ftrace_return_to_handler+0xa4/0x170
> [ 91.270809] return_to_handler+0x1c/0x38
> [ 91.274826] default_idle_call+0x94/0x38c
> [ 91.278869] do_idle+0x240/0x290
> [ 91.282633] rest_init+0x1e8/0x2dc
> [ 91.286529] arch_call_rest_init+0x1c/0x28
> [ 91.290585] start_kernel+0x638/0x670
> [ 91.295524] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8
> [ 91.296302] Modules linked in:
> [ 91.297644] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [ 91.298317] Hardware name: linux,dummy-virt (DT)
> [ 91.298975] pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
> [ 91.299648] pc : check_flags.part.0+0x1d4/0x1f8
> [ 91.300303] lr : check_flags.part.0+0x1d4/0x1f8
> [ 91.300960] sp : ffffdb60f8d73a50
> [ 91.301628] x29: ffffdb60f8d73a50 x28: ffffdb60f8d84000
> [ 91.303527] x27: ffffdb60f869cbb8 x26: ffffdb60f6835930
> [ 91.305431] x25: 0000000000000000 x24: 0000000000000000
> [ 91.307343] x23: ffffdb60f8daf360 x22: 0000000000000001
> [ 91.309242] x21: 0000000000000000 x20: 0000000000000001
> [ 91.311145] x19: ffffdb60f9bcf000 x18: 00000000749f6e65
> [ 91.313044] x17: 00000000dcd3f761 x16: 0000000000000005
> [ 91.314954] x15: 0000000000000000 x14: 0000000000000028
> [ 91.316854] x13: 000000000000067a x12: 0000000000000028
> [ 91.318753] x11: 0101010101010101 x10: ffffdb60f8d73820
> [ 91.320657] x9 : ffffdb60f6960ff8 x8 : 4e5241575f534b43
> [ 91.322582] x7 : 4f4c5f4755424544 x6 : ffff4454fdbd3667
> [ 91.324486] x5 : 00000000ffffffc8 x4 : ffff4454fdbd2c60
> [ 91.326413] x3 : ffffdb60f6800000 x2 : ffffdb60f7c60000
> [ 91.328308] x1 : 1c0af7741e0f0c00 x0 : 0000000000000000
> [ 91.330227] Call trace:
> [ 91.330880] check_flags.part.0+0x1d4/0x1f8
> [ 91.331547] lock_acquire+0x208/0x508
> [ 91.332200] _raw_spin_lock+0x5c/0x80
> [ 91.332849] vprintk_emit+0xb4/0x380
> [ 91.333528] vprintk_default+0x4c/0x60
> [ 91.334189] vprintk_func+0x120/0x330
> [ 91.334863] printk+0x78/0x9c
> [ 91.335523] lockdep_rcu_suspicious+0x2c/0xf8
> [ 91.336195] trace_hardirqs_off+0x214/0x330
> [ 91.336854] trace_graph_return+0x1ac/0x1d8
> [ 91.337518] ftrace_return_to_handler+0xa4/0x170
> [ 91.338190] return_to_handler+0x1c/0x38
This looks like tracing in the idle loop in a place where RCU is not
watching. Historically, this has been addressed by using _rcuidle()
trace events, but the portion of the idle loop that RCU is watching has
recently increased. Last I checked, there were still a few holdouts (that
would splat like this) in x86, though perhaps those have since been fixed.
> [ 91.338841] default_idle_call+0x94/0x38c
> [ 91.339512] do_idle+0x240/0x290
> [ 91.340166] rest_init+0x1e8/0x2dc
> [ 91.340840] arch_call_rest_init+0x1c/0x28
> [ 91.341499] start_kernel+0x638/0x670
> [ 91.342147] irq event stamp: 1727
> [ 91.342832] hardirqs last enabled at (1727): [<ffffdb60f7c33094>] exit_el1_irq_or_nmi+0x24/0x50
> [ 91.343502] hardirqs last disabled at (1724): [<ffffdb60f7c33060>] enter_el1_irq_or_nmi+0x20/0x30
> [ 91.344193] softirqs last enabled at (1726): [<ffffdb60f6835930>] return_to_handler+0x0/0x38
> [ 91.344866] softirqs last disabled at (1725): [<ffffdb60f68c6880>] irq_enter_rcu+0x88/0xa8
> [ 91.345546] ---[ end trace e131d25144579308 ]---
The other warning looked similar.
Thanx, Paul
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-24 14:03 ` Marco Elver
2020-11-24 15:01 ` Paul E. McKenney
@ 2020-11-24 19:30 ` Mark Rutland
2020-11-25 9:45 ` Marco Elver
1 sibling, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-24 19:30 UTC (permalink / raw)
To: Marco Elver
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> On Mon, Nov 23, 2020 at 07:32PM +0000, Mark Rutland wrote:
> > On Fri, Nov 20, 2020 at 03:03:32PM +0100, Marco Elver wrote:
> > > On Fri, Nov 20, 2020 at 10:30AM +0000, Mark Rutland wrote:
> > > > On Thu, Nov 19, 2020 at 10:53:53PM +0000, Will Deacon wrote:
> > > > > FWIW, arm64 is known broken wrt lockdep and irq tracing atm. Mark has been
> > > > > looking at that and I think he is close to having something workable.
> > > > >
> > > > > Mark -- is there anything Marco and Paul can try out?
> > > >
> > > > I initially traced some issues back to commit:
> > > >
> > > > 044d0d6de9f50192 ("lockdep: Only trace IRQ edges")
> > > >
> > > > ... and that change of semantic could cause us to miss edges in some
> > > > cases, but IIUC mostly where we haven't done the right thing in
> > > > exception entry/return.
> > > >
> > > > I don't think my patches address this case yet, but my WIP (currently
> > > > just fixing user<->kernel transitions) is at:
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/irq-fixes
> > > >
> > > > I'm looking into the kernel<->kernel transitions now, and I know that we
> > > > mess up RCU management for a small window around arch_cpu_idle, but it's
> > > > not immediately clear to me if either of those cases could cause this
> > > > report.
> > >
> > > Thank you -- I tried your irq-fixes, however that didn't seem to fix the
> > > problem (still get warnings and then a panic). :-/
> >
> > I've just updated that branch with a new version which I hope covers
> > kernel<->kernel transitions too. If you get a chance, would you mind
> > giving that a spin?
> >
> > The HEAD commit should be:
> >
> > a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
>
> Thank you! Your series appears to work and fixes the stalls and
> deadlocks (3 trials)!
Thanks for testing! I'm glad that appears to work, as it suggests
there's not another massive problem lurking in this area.
While cleaning/splitting that up today, I spotted a couple of new
problems I introduced, and I'm part-way through sorting that out, but
it's not quite ready today after all. :/
Fingers crossed for tomorrow...
> I noticed there are a bunch of warnings in the log
> that might be relevant (see attached).
> [ 91.184432] =============================
> [ 91.188301] WARNING: suspicious RCU usage
> [ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
> [ 91.197536] -----------------------------
> [ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> [ 91.206546]
> [ 91.206546] other info that might help us debug this:
> [ 91.206546]
> [ 91.211790]
> [ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
> [ 91.216454] RCU used illegally from extended quiescent state!
> [ 91.220890] no locks held by swapper/0/0.
> [ 91.224712]
> [ 91.224712] stack backtrace:
> [ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> [ 91.234877] Hardware name: linux,dummy-virt (DT)
> [ 91.239032] Call trace:
> [ 91.242587] dump_backtrace+0x0/0x240
> [ 91.246500] show_stack+0x34/0x88
> [ 91.250295] dump_stack+0x140/0x1bc
> [ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
> [ 91.258332] trace_hardirqs_off+0x214/0x330
> [ 91.262462] trace_graph_return+0x1ac/0x1d8
> [ 91.266564] ftrace_return_to_handler+0xa4/0x170
> [ 91.270809] return_to_handler+0x1c/0x38
> [ 91.274826] default_idle_call+0x94/0x38c
> [ 91.278869] do_idle+0x240/0x290
> [ 91.282633] rest_init+0x1e8/0x2dc
> [ 91.286529] arch_call_rest_init+0x1c/0x28
> [ 91.290585] start_kernel+0x638/0x670
Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
we have to mark that and its callees as noinstr, since it doesn't seem
sane to have ftrace check whether RCU is watching for every function
call. Maybe Paul or Steve can correct me. ;)
If you still have the binary lying around, can you check whether
default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
If you could dump the asm around that, along with whatever faddr2line
tells you, that'd be a great help.
This looks like it should be reproducible, so I'll enable the boot-time
self tests in my kernel and check whether I see the above too.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-24 15:01 ` Paul E. McKenney
@ 2020-11-24 19:43 ` Mark Rutland
2020-11-24 20:32 ` Steven Rostedt
0 siblings, 1 reply; 24+ messages in thread
From: Mark Rutland @ 2020-11-24 19:43 UTC (permalink / raw)
To: Paul E. McKenney
Cc: linux-arm-kernel, Marco Elver, Anders Roxell, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
On Tue, Nov 24, 2020 at 07:01:46AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 24, 2020 at 03:03:10PM +0100, Marco Elver wrote:
> > [ 91.184432] =============================
> > [ 91.188301] WARNING: suspicious RCU usage
> > [ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
> > [ 91.197536] -----------------------------
> > [ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > [ 91.206546]
> > [ 91.206546] other info that might help us debug this:
> > [ 91.206546]
> > [ 91.211790]
> > [ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > [ 91.216454] RCU used illegally from extended quiescent state!
> > [ 91.220890] no locks held by swapper/0/0.
> > [ 91.224712]
> > [ 91.224712] stack backtrace:
> > [ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > [ 91.234877] Hardware name: linux,dummy-virt (DT)
> > [ 91.239032] Call trace:
> > [ 91.242587] dump_backtrace+0x0/0x240
> > [ 91.246500] show_stack+0x34/0x88
> > [ 91.250295] dump_stack+0x140/0x1bc
> > [ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
> > [ 91.258332] trace_hardirqs_off+0x214/0x330
> > [ 91.262462] trace_graph_return+0x1ac/0x1d8
> > [ 91.266564] ftrace_return_to_handler+0xa4/0x170
> > [ 91.270809] return_to_handler+0x1c/0x38
> > [ 91.274826] default_idle_call+0x94/0x38c
> > [ 91.278869] do_idle+0x240/0x290
> > [ 91.282633] rest_init+0x1e8/0x2dc
> > [ 91.286529] arch_call_rest_init+0x1c/0x28
> > [ 91.290585] start_kernel+0x638/0x670
> This looks like tracing in the idle loop in a place where RCU is not
> watching. Historically, this has been addressed by using _rcuidle()
> trace events, but the portion of the idle loop that RCU is watching has
> recently increased. Last I checked, there were still a few holdouts (that
> would splat like this) in x86, though perhaps those have since been fixed.
Yup! I think this is a latent issue my debug hacks revealed (in addition
to a couple of other issues in the idle path), and still affects x86 and
others. It's only noticeable if you hack trace_hardirqs_{on,off}() to
check rcu_is_watching(), which I had at the tip of my tree.
AFAICT, the issue is that arch_cpu_idle() can be dynamically traced with
ftrace, and hence the tracing code can unexpectedly run without RCU
watching. Since that's dynamic tracing, we can avoid it by marking
arch_cpu_idle() and friends as noinstr.
I'll see about getting this fixed before we upstream the debug hack.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-24 19:43 ` Mark Rutland
@ 2020-11-24 20:32 ` Steven Rostedt
0 siblings, 0 replies; 24+ messages in thread
From: Steven Rostedt @ 2020-11-24 20:32 UTC (permalink / raw)
To: Mark Rutland
Cc: Marco Elver, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, kasan-dev, rcu, Linux-MM,
Alexander Potapenko, linux-arm-kernel, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
On Tue, 24 Nov 2020 19:43:08 +0000
Mark Rutland <mark.rutland@arm.com> wrote:
> AFAICT, the issue is that arch_cpu_idle() can be dynamically traced with
> ftrace, and hence the tracing code can unexpectedly run without RCU
> watching. Since that's dynamic tracing, we can avoid it by marking
> arch_cpu_idle() and friends as noinstr.
Technically, ftrace doesn't care if RCU is watching or not, but the
callbacks might, and they need to do the rcu_is_watching() check if
they do.
Although, there's work to keep those areas from being traced, but to do
so, they really need to be minimal, where you don't ever want to trace
them.
-- Steve
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-24 19:30 ` Mark Rutland
@ 2020-11-25 9:45 ` Marco Elver
2020-11-25 10:28 ` Mark Rutland
0 siblings, 1 reply; 24+ messages in thread
From: Marco Elver @ 2020-11-25 9:45 UTC (permalink / raw)
To: Mark Rutland
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
[-- Attachment #1: Type: text/plain, Size: 3102 bytes --]
On Tue, Nov 24, 2020 at 07:30PM +0000, Mark Rutland wrote:
[...]
> > > I've just updated that branch with a new version which I hope covers
> > > kernel<->kernel transitions too. If you get a chance, would you mind
> > > giving that a spin?
> > >
> > > The HEAD commit should be:
> > >
> > > a51334f033f8ee88 ("HACK: check IRQ tracing has RCU watching")
> >
> > Thank you! Your series appears to work and fixes the stalls and
> > deadlocks (3 trials)!
>
> Thanks for testing! I'm glad that appears to work, as it suggests
> there's not another massive problem lurking in this area.
>
> While cleaning/splitting that up today, I spotted a couple of new
> problems I introduced, and I'm part-way through sorting that out, but
> it's not quite ready today after all. :/
>
> Fingers crossed for tomorrow...
>
> > I noticed there are a bunch of warnings in the log
> > that might be relevant (see attached).
>
> > [ 91.184432] =============================
> > [ 91.188301] WARNING: suspicious RCU usage
> > [ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
> > [ 91.197536] -----------------------------
> > [ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > [ 91.206546]
> > [ 91.206546] other info that might help us debug this:
> > [ 91.206546]
> > [ 91.211790]
> > [ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > [ 91.216454] RCU used illegally from extended quiescent state!
> > [ 91.220890] no locks held by swapper/0/0.
> > [ 91.224712]
> > [ 91.224712] stack backtrace:
> > [ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > [ 91.234877] Hardware name: linux,dummy-virt (DT)
> > [ 91.239032] Call trace:
> > [ 91.242587] dump_backtrace+0x0/0x240
> > [ 91.246500] show_stack+0x34/0x88
> > [ 91.250295] dump_stack+0x140/0x1bc
> > [ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
> > [ 91.258332] trace_hardirqs_off+0x214/0x330
> > [ 91.262462] trace_graph_return+0x1ac/0x1d8
> > [ 91.266564] ftrace_return_to_handler+0xa4/0x170
> > [ 91.270809] return_to_handler+0x1c/0x38
> > [ 91.274826] default_idle_call+0x94/0x38c
> > [ 91.278869] do_idle+0x240/0x290
> > [ 91.282633] rest_init+0x1e8/0x2dc
> > [ 91.286529] arch_call_rest_init+0x1c/0x28
> > [ 91.290585] start_kernel+0x638/0x670
>
> Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
> we have to mark that and its callees as noinstr, since it doesn't seem
> sane to have ftrace check whether RCU is watching for every function
> call. Maybe Paul or Steve can correct me. ;)
Yes, it's arch_cpu_idle().
> If you still have the binary lying around, can you check whether
> default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
> If you could dump the asm around that, along with whatever faddr2line
> tells you, that'd be a great help.
I reran to be sure, with similar results. I've attached a
syz-symbolize'd version of the warnings.
Thanks,
-- Marco
[-- Attachment #2: dmesg-symbolized --]
[-- Type: text/plain, Size: 19737 bytes --]
sched: DL replenish lagged too much
PASSED
Testing tracer wakeup_dl: PASSED
Testing tracer function_graph:
=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
-----------------------------
kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 0
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x140/0x1bc lib/dump_stack.c:120
lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
default_idle_call+0x94/0x38c kernel/sched/idle.c:112
cpuidle_idle_call kernel/sched/idle.c:194 [inline]
do_idle+0x240/0x290 kernel/sched/idle.c:299
rest_init+0x1e8/0x2dc init/main.c:722
arch_call_rest_init+0x1c/0x28
start_kernel+0x638/0x670 init/main.c:1066
WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:5279 check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO -TCO BTYPE=--)
pc : check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
lr : check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
sp : ffffa97d6cd73a50
x29: ffffa97d6cd73a50 x28: ffffa97d6cd84000
x27: ffffa97d6c69cbb8 x26: ffffa97d6a835930
x25: 0000000000000000 x24: 0000000000000000
x23: ffffa97d6cdaf360 x22: 0000000000000001
x21: 0000000000000000 x20: 0000000000000001
x19: ffffa97d6dbcf000 x18: 00000000d8c875f1
x17: 00000000bf9a23d2 x16: 0000000000000005
x15: 0000000000000000 x14: 0000000000000028
x13: 000000000000067d x12: 0000000000000028
x11: 0101010101010101 x10: ffffa97d6cd73820
x9 : ffffa97d6a960ff8 x8 : 4e5241575f534b43
x7 : 4f4c5f4755424544 x6 : ffff35ce3dbd3667
x5 : 00000000ffffffc8 x4 : ffff35ce3dbd2c60
x3 : ffffa97d6a800000 x2 : ffffa97d6bc60000
x1 : ba5fc2cb163a2c00 x0 : 0000000000000000
Call trace:
check_flags.part.0+0x1d4/0x1f8 kernel/locking/lockdep.c:5279
check_flags kernel/locking/lockdep.c:5275 [inline]
lock_acquire kernel/locking/lockdep.c:5432 [inline]
lock_acquire+0x208/0x508 kernel/locking/lockdep.c:5400
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
printk+0x78/0x9c kernel/printk/printk.c:2076
lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
default_idle_call+0x94/0x38c kernel/sched/idle.c:112
cpuidle_idle_call kernel/sched/idle.c:194 [inline]
do_idle+0x240/0x290 kernel/sched/idle.c:299
rest_init+0x1e8/0x2dc init/main.c:722
arch_call_rest_init+0x1c/0x28
start_kernel+0x638/0x670 init/main.c:1066
irq event stamp: 1719
hardirqs last enabled at (1719): [<ffffa97d6bc33094>] exit_el1_irq_or_nmi+0x24/0x50 arch/arm64/kernel/entry-common.c:101
hardirqs last disabled at (1716): [<ffffa97d6bc33060>] enter_el1_irq_or_nmi+0x20/0x30 arch/arm64/kernel/entry-common.c:93
softirqs last enabled at (1718): [<ffffa97d6a835930>] return_to_handler+0x0/0x38 arch/arm64/kernel/entry-ftrace.S:314
softirqs last disabled at (1717): [<ffffa97d6a8c6880>] irq_enter_rcu+0x88/0xa8 kernel/softirq.c:363
---[ end trace 06e986ee87545489 ]---
=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
-----------------------------
include/trace/events/ipi.h:19 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x140/0x1bc lib/dump_stack.c:120
lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
trace_ipi_raise include/trace/events/ipi.h:19 [inline]
smp_cross_call+0x220/0x228 arch/arm64/kernel/smp.c:953
arch_irq_work_raise+0x40/0x50 arch/arm64/kernel/smp.c:840
__irq_work_queue_local kernel/irq_work.c:59 [inline]
__irq_work_queue_local+0xb0/0xe8 kernel/irq_work.c:53
irq_work_queue kernel/irq_work.c:75 [inline]
irq_work_queue+0x5c/0xb8 kernel/irq_work.c:67
queue_flush_work kernel/printk/printk_safe.c:56 [inline]
printk_safe_log_store+0x1a0/0x1d8 kernel/printk/printk_safe.c:107
vprintk_safe kernel/printk/printk_safe.c:347 [inline]
vprintk_func+0x23c/0x330 kernel/printk/printk_safe.c:390
printk+0x78/0x9c kernel/printk/printk.c:2076
lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
trace_lock_acquire include/trace/events/lock.h:13 [inline]
lock_acquire+0x340/0x508 kernel/locking/lockdep.c:5406
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
printk+0x78/0x9c kernel/printk/printk.c:2076
lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
default_idle_call+0x94/0x38c kernel/sched/idle.c:112
cpuidle_idle_call kernel/sched/idle.c:194 [inline]
do_idle+0x240/0x290 kernel/sched/idle.c:299
rest_init+0x1e8/0x2dc init/main.c:722
arch_call_rest_init+0x1c/0x28
start_kernel+0x638/0x670 init/main.c:1066
=============================
WARNING: suspicious RCU usage
5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Not tainted
-----------------------------
include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
RCU used illegally from extended quiescent state!
no locks held by swapper/0/0.
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x0/0x240 arch/arm64/kernel/stacktrace.c:100
show_stack+0x34/0x88 arch/arm64/kernel/stacktrace.c:196
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x140/0x1bc lib/dump_stack.c:120
lockdep_rcu_suspicious+0xe4/0xf8 kernel/locking/lockdep.c:6353
trace_lock_acquire include/trace/events/lock.h:13 [inline]
lock_acquire+0x340/0x508 kernel/locking/lockdep.c:5406
__raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
_raw_spin_lock+0x5c/0x80 kernel/locking/spinlock.c:151
vprintk_emit+0xb4/0x380 kernel/printk/printk.c:2010
vprintk_default+0x4c/0x60 kernel/printk/printk.c:2045
vprintk_func+0x120/0x330 kernel/printk/printk_safe.c:393
printk+0x78/0x9c kernel/printk/printk.c:2076
lockdep_rcu_suspicious+0x2c/0xf8 kernel/locking/lockdep.c:6317
trace_hardirqs_off+0x214/0x330 kernel/trace/trace_preemptirq.c:78
trace_graph_return+0x1ac/0x1d8 kernel/trace/trace_functions_graph.c:253
ftrace_return_to_handler+0xa4/0x170 kernel/trace/fgraph.c:239
return_to_handler+0x1c/0x38 arch/arm64/kernel/entry-ftrace.S:333
default_idle_call+0x94/0x38c kernel/sched/idle.c:112
cpuidle_idle_call kernel/sched/idle.c:194 [inline]
do_idle+0x240/0x290 kernel/sched/idle.c:299
rest_init+0x1e8/0x2dc init/main.c:722
arch_call_rest_init+0x1c/0x28
start_kernel+0x638/0x670 init/main.c:1066
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled())possible reason: unannotated irqs-off.
irq event stamp: 1719
hardirqs last enabled at (1719): [<ffffa97d6bc33094>] exit_el1_irq_or_nmi+0x24/0x50 arch/arm64/kernel/entry-common.c:101
hardirqs last disabled at (1716): [<ffffa97d6bc33060>] enter_el1_irq_or_nmi+0x20/0x30 arch/arm64/kernel/entry-common.c:93
softirqs last enabled at (1718): [<ffffa97d6a835930>] return_to_handler+0x0/0x38 arch/arm64/kernel/entry-ftrace.S:314
softirqs last disabled at (1717): [<ffffa97d6a8c6880>] irq_enter_rcu+0x88/0xa8 kernel/softirq.c:363
PASSED
pinctrl core: initialized pinctrl subsystem
DMI not present or invalid.
NET: Registered protocol family 16
DMA: preallocated 256 KiB GFP_KERNEL pool for atomic allocations
DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
DMA: preallocated 256 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
audit: initializing netlink subsys (disabled)
audit: type=2000 audit(81.200:1): state=initialized audit_enabled=0 res=1
thermal_sys: Registered thermal governor 'step_wise'
thermal_sys: Registered thermal governor 'power_allocator'
cpuidle: using governor menu
hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
ASID allocator initialised with 32768 entries
Serial: AMBA PL011 UART driver
9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 47, base_baud = 0) is a PL011 rev1
printk: console [ttyAMA0] enabled
printk: console [ttyAMA0] enabled
printk: bootconsole [pl11] disabled
printk: bootconsole [pl11] disabled
HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
HugeTLB registered 32.0 MiB page size, pre-allocated 0 pages
HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
HugeTLB registered 64.0 KiB page size, pre-allocated 0 pages
cryptd: max_cpu_qlen set to 1000
ACPI: Interpreter disabled.
iommu: Default domain type: Translated
vgaarb: loaded
SCSI subsystem initialized
libata version 3.00 loaded.
usbcore: registered new interface driver usbfs
usbcore: registered new interface driver hub
usbcore: registered new device driver usb
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
EDAC MC: Ver: 3.0.0
FPGA manager framework
Advanced Linux Sound Architecture Driver Initialized.
clocksource: Switched to clocksource arch_sys_counter
VFS: Disk quotas dquot_6.6.0
VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
pnp: PnP ACPI: disabled
NET: Registered protocol family 2
tcp_listen_portaddr_hash hash table entries: 1024 (order: 4, 81920 bytes, linear)
TCP established hash table entries: 16384 (order: 5, 131072 bytes, linear)
TCP bind hash table entries: 16384 (order: 8, 1179648 bytes, linear)
TCP: Hash tables configured (established 16384 bind 16384)
UDP hash table entries: 1024 (order: 5, 163840 bytes, linear)
UDP-Lite hash table entries: 1024 (order: 5, 163840 bytes, linear)
NET: Registered protocol family 1
RPC: Registered named UNIX socket transport module.
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
RPC: Registered tcp NFSv4.1 backchannel transport module.
PCI: CLS 0 bytes, default 64
hw perfevents: enabled with armv8_pmuv3 PMU driver, 5 counters available
kvm [1]: HYP mode not available
Initialise system trusted keyrings
workingset: timestamp_bits=44 max_order=19 bucket_order=0
squashfs: version 4.0 (2009/01/31) Phillip Lougher
NFS: Registering the id_resolver key type
Key type id_resolver registered
Key type id_legacy registered
nfs4filelayout_init: NFSv4 File Layout Driver Registering...
9p: Installing v9fs 9p2000 file system support
Key type asymmetric registered
Asymmetric key parser 'x509' registered
Block layer SCSI generic (bsg) driver version 0.4 loaded (major 245)
io scheduler mq-deadline registered
io scheduler kyber registered
pl061_gpio 9030000.pl061: PL061 GPIO chip registered
pci-host-generic 4010000000.pcie: host bridge /pcie@10000000 ranges:
pci-host-generic 4010000000.pcie: IO 0x003eff0000..0x003effffff -> 0x0000000000
pci-host-generic 4010000000.pcie: MEM 0x0010000000..0x003efeffff -> 0x0010000000
pci-host-generic 4010000000.pcie: MEM 0x8000000000..0xffffffffff -> 0x8000000000
pci-host-generic 4010000000.pcie: ECAM at [mem 0x4010000000-0x401fffffff] for [bus 00-ff]
pci-host-generic 4010000000.pcie: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [bus 00-ff]
pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
pci_bus 0000:00: root bus resource [mem 0x10000000-0x3efeffff]
pci_bus 0000:00: root bus resource [mem 0x8000000000-0xffffffffff]
pci 0000:00:00.0: [1b36:0008] type 00 class 0x060000
pci 0000:00:01.0: [1af4:1009] type 00 class 0x000200
pci 0000:00:01.0: reg 0x10: [io 0x0000-0x003f]
pci 0000:00:01.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:01.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:02.0: [1af4:1009] type 00 class 0x000200
pci 0000:00:02.0: reg 0x10: [io 0x0000-0x003f]
pci 0000:00:02.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:02.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000
pci 0000:00:03.0: reg 0x10: [io 0x0000-0x001f]
pci 0000:00:03.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:03.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0000:00:04.0: [1af4:1004] type 00 class 0x010000
pci 0000:00:04.0: reg 0x10: [io 0x0000-0x003f]
pci 0000:00:04.0: reg 0x14: [mem 0x00000000-0x00000fff]
pci 0000:00:04.0: reg 0x20: [mem 0x00000000-0x00003fff 64bit pref]
pci 0000:00:03.0: BAR 6: assigned [mem 0x10000000-0x1003ffff pref]
pci 0000:00:01.0: BAR 4: assigned [mem 0x8000000000-0x8000003fff 64bit pref]
pci 0000:00:02.0: BAR 4: assigned [mem 0x8000004000-0x8000007fff 64bit pref]
pci 0000:00:03.0: BAR 4: assigned [mem 0x8000008000-0x800000bfff 64bit pref]
pci 0000:00:04.0: BAR 4: assigned [mem 0x800000c000-0x800000ffff 64bit pref]
pci 0000:00:01.0: BAR 1: assigned [mem 0x10040000-0x10040fff]
pci 0000:00:02.0: BAR 1: assigned [mem 0x10041000-0x10041fff]
pci 0000:00:03.0: BAR 1: assigned [mem 0x10042000-0x10042fff]
pci 0000:00:04.0: BAR 1: assigned [mem 0x10043000-0x10043fff]
pci 0000:00:01.0: BAR 0: assigned [io 0x1000-0x103f]
pci 0000:00:02.0: BAR 0: assigned [io 0x1040-0x107f]
pci 0000:00:04.0: BAR 0: assigned [io 0x1080-0x10bf]
pci 0000:00:03.0: BAR 0: assigned [io 0x10c0-0x10df]
EINJ: ACPI disabled.
virtio-pci 0000:00:01.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:02.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:03.0: enabling device (0000 -> 0003)
virtio-pci 0000:00:04.0: enabling device (0000 -> 0003)
Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
SuperH (H)SCI(F) driver initialized
msm_serial: driver initialized
cacheinfo: Unable to detect cache hierarchy for CPU 0
loop: module loaded
megasas: 07.714.04.00-rc1
scsi host0: Virtio SCSI HBA
scsi 0:0:0:0: Direct-Access QEMU QEMU HARDDISK 2.5+ PQ: 0 ANSI: 5
random: fast init done
sd 0:0:0:0: Power-on or device reset occurred
sd 0:0:0:0: [sda] 524288 512-byte logical blocks: (268 MB/256 MiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 63 00 00 08
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sda: detected capacity change from 0 to 268435456
physmap-flash 0.flash: physmap platform flash device: [mem 0x00000000-0x03ffffff]
0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
erase region 0: offset=0x0,size=0x40000,blocks=256
physmap-flash 0.flash: physmap platform flash device: [mem 0x04000000-0x07ffffff]
0.flash: Found 2 x16 devices at 0x0 in 32-bit bank. Manufacturer ID 0x000000 Chip ID 0x000000
Intel/Sharp Extended Query Table at 0x0031
Using buffer write method
erase region 0: offset=0x0,size=0x40000,blocks=256
Concatenating MTD devices:
(0): "0.flash"
(1): "0.flash"
into device "0.flash"
sda: detected capacity change from 0 to 268435456
sd 0:0:0:0: [sda] Attached SCSI disk
libphy: Fixed MDIO Bus: probed
tun: Universal TUN/TAP device driver, 1.6
thunder_xcv, ver 1.0
thunder_bgx, ver 1.0
nicpf, ver 1.0
hclge is initializing
hns3: Hisilicon Ethernet Network Driver for Hip08 Family - version
hns3: Copyright (c) 2017 Huawei Corporation.
e1000: Intel(R) PRO/1000 Network Driver
e1000: Copyright (c) 1999-2006 Intel Corporation.
e1000e: Intel(R) PRO/1000 Network Driver
e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
igb: Intel(R) Gigabit Ethernet Network Driver
igb: Copyright (c) 2007-2014 Intel Corporation.
igbvf: Intel(R) Gigabit Virtual Function Network Driver
igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
sky2: driver version 1.30
VFIO - User Level meta-driver version: 0.3
ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
ehci-pci: EHCI PCI platform driver
ehci-platform: EHCI generic platform driver
ehci-orion: EHCI orion driver
ehci-exynos: EHCI Exynos driver
ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
ohci-pci: OHCI PCI platform driver
ohci-platform: OHCI generic platform driver
ohci-exynos: OHCI Exynos driver
usbcore: registered new interface driver usb-storage
rtc-pl031 9010000.pl031: registered as rtc0
rtc-pl031 9010000.pl031: setting system clock to 2020-11-25T09:39:22 UTC (1606297162)
i2c /dev entries driver
sdhci: Secure Digital Host Controller Interface driver
sdhci: Copyright(c) Pierre Ossman
Synopsys Designware Multimedia Card Interface Driver
sdhci-pltfm: SDHCI platform and OF driver helper
ledtrig-cpu: registered to indicate activity on CPUs
usbcore: registered new interface driver usbhid
usbhid: USB HID core driver
drop_monitor: Initializing network drop monitor service
NET: Registered protocol family 17
9pnet: Installing 9P2000 support
Key type dns_resolver registered
registered taskstats version 1
Running tests on all trace events:
Testing all events: OK
Running tests again, along with the function tracer
Running tests on all trace events:
Testing all events:
hrtimer: interrupt took 11560624 ns
OK
Testing ftrace filter: OK
Loading compiled-in X.509 certificates
input: gpio-keys as /devices/platform/gpio-keys/input/input0
ALSA device list:
No soundcards found.
TAP version 14
1..0
uart-pl011 9000000.pl011: no DMA platform data
EXT4-fs (sda): mounting ext3 file system using the ext4 subsystem
EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: (null)
VFS: Mounted root (ext3 filesystem) readonly on device 8:0.
devtmpfs: mounted
Freeing unused kernel memory: 8896K
Run /sbin/init as init process
with arguments:
/sbin/init
with environment:
HOME=/
TERM=linux
earlyprintk=serial
EXT4-fs (sda): re-mounted. Opts: (null)
ext3 filesystem being remounted at / supports timestamps until 2038 (0x7fffffff)
random: crng init done
[-- Attachment #3: Type: text/plain, Size: 176 bytes --]
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...)
2020-11-25 9:45 ` Marco Elver
@ 2020-11-25 10:28 ` Mark Rutland
0 siblings, 0 replies; 24+ messages in thread
From: Mark Rutland @ 2020-11-25 10:28 UTC (permalink / raw)
To: Marco Elver
Cc: linux-arm-kernel, Anders Roxell, Paul E. McKenney, Jann Horn,
Peter Zijlstra, boqun.feng, Lai Jiangshan,
Linux Kernel Mailing List, Steven Rostedt, rcu, Linux-MM,
Alexander Potapenko, kasan-dev, Tejun Heo, Andrew Morton,
Will Deacon, tglx, Dmitry Vyukov
On Wed, Nov 25, 2020 at 10:45:17AM +0100, Marco Elver wrote:
> On Tue, Nov 24, 2020 at 07:30PM +0000, Mark Rutland wrote:
[...]
> > > I noticed there are a bunch of warnings in the log
> > > that might be relevant (see attached).
> >
> > > [ 91.184432] =============================
> > > [ 91.188301] WARNING: suspicious RCU usage
> > > [ 91.192316] 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25 Tainted: G W
> > > [ 91.197536] -----------------------------
> > > [ 91.201431] kernel/trace/trace_preemptirq.c:78 RCU not watching trace_hardirqs_off()!
> > > [ 91.206546]
> > > [ 91.206546] other info that might help us debug this:
> > > [ 91.206546]
> > > [ 91.211790]
> > > [ 91.211790] rcu_scheduler_active = 2, debug_locks = 0
> > > [ 91.216454] RCU used illegally from extended quiescent state!
> > > [ 91.220890] no locks held by swapper/0/0.
> > > [ 91.224712]
> > > [ 91.224712] stack backtrace:
> > > [ 91.228794] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.10.0-rc4-next-20201119-00002-g51c2bf0ac853 #25
> > > [ 91.234877] Hardware name: linux,dummy-virt (DT)
> > > [ 91.239032] Call trace:
> > > [ 91.242587] dump_backtrace+0x0/0x240
> > > [ 91.246500] show_stack+0x34/0x88
> > > [ 91.250295] dump_stack+0x140/0x1bc
> > > [ 91.254159] lockdep_rcu_suspicious+0xe4/0xf8
> > > [ 91.258332] trace_hardirqs_off+0x214/0x330
> > > [ 91.262462] trace_graph_return+0x1ac/0x1d8
> > > [ 91.266564] ftrace_return_to_handler+0xa4/0x170
> > > [ 91.270809] return_to_handler+0x1c/0x38
> > > [ 91.274826] default_idle_call+0x94/0x38c
> > > [ 91.278869] do_idle+0x240/0x290
> > > [ 91.282633] rest_init+0x1e8/0x2dc
> > > [ 91.286529] arch_call_rest_init+0x1c/0x28
> > > [ 91.290585] start_kernel+0x638/0x670
> >
> > Hmm... I suspect that arch_cpu_idle() is being traced here, and I reckon
> > we have to mark that and its callees as noinstr, since it doesn't seem
> > sane to have ftrace check whether RCU is watching for every function
> > call. Maybe Paul or Steve can correct me. ;)
>
> Yes, it's arch_cpu_idle().
>
> > If you still have the binary lying around, can you check whether
> > default_idle_call+0x94/0x38c is just after the call to arch_cpu_idle()?
> > If you could dump the asm around that, along with whatever faddr2line
> > tells you, that'd be a great help.
>
> I reran to be sure, with similar results. I've attached a
> syz-symbolize'd version of the warnings.
Thanks for confirming, and for the symbolized report.
I'll see about getting this fixed ASAP.
Thanks,
Mark.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2020-11-25 10:30 UTC | newest]
Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20201112175406.GF3249@paulmck-ThinkPad-P72>
[not found] ` <20201113175754.GA6273@paulmck-ThinkPad-P72>
[not found] ` <20201117105236.GA1964407@elver.google.com>
[not found] ` <20201117182915.GM1437@paulmck-ThinkPad-P72>
[not found] ` <20201118225621.GA1770130@elver.google.com>
[not found] ` <20201118233841.GS1437@paulmck-ThinkPad-P72>
[not found] ` <20201119125357.GA2084963@elver.google.com>
[not found] ` <20201119151409.GU1437@paulmck-ThinkPad-P72>
[not found] ` <20201119170259.GA2134472@elver.google.com>
[not found] ` <20201119184854.GY1437@paulmck-ThinkPad-P72>
2020-11-19 19:38 ` linux-next: stall warnings and deadlock on Arm64 (was: [PATCH] kfence: Avoid stalling...) Marco Elver
2020-11-19 21:35 ` Paul E. McKenney
2020-11-19 22:53 ` Will Deacon
2020-11-20 10:30 ` Mark Rutland
2020-11-20 14:03 ` Marco Elver
2020-11-23 19:32 ` Mark Rutland
2020-11-24 14:03 ` Marco Elver
2020-11-24 15:01 ` Paul E. McKenney
2020-11-24 19:43 ` Mark Rutland
2020-11-24 20:32 ` Steven Rostedt
2020-11-24 19:30 ` Mark Rutland
2020-11-25 9:45 ` Marco Elver
2020-11-25 10:28 ` Mark Rutland
2020-11-20 14:19 ` Marco Elver
2020-11-20 14:39 ` Paul E. McKenney
2020-11-20 15:22 ` Mark Rutland
2020-11-20 17:38 ` Paul E. McKenney
2020-11-20 18:02 ` Mark Rutland
2020-11-20 18:57 ` Paul E. McKenney
2020-11-20 15:26 ` Steven Rostedt
2020-11-20 18:17 ` Marco Elver
2020-11-20 18:57 ` Steven Rostedt
2020-11-20 19:16 ` Steven Rostedt
2020-11-20 19:22 ` Marco Elver
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).