Subject: [linus:master] [torture] 5d248bb39f: BUG:workqueue_lockup-pool
From: kernel test robot
Date: 2023-10-10 07:54 UTC
To: Dietmar Eggemann
Cc: oe-lkp, lkp, linux-kernel, Paul E. McKenney, Josh Triplett,
	Joel Fernandes, Juri Lelli, Valentin Schneider, John Stultz,
	Davidlohr Bueso, linux-doc, oliver.sang


Hi Dietmar Eggemann,

We don't have enough knowledge to explain the issue below; we are just
reporting what we observed in our tests, FYI.

Since the failure is random, we ran the jobs for both 5d248bb39f and its parent
999 times:

67d5404d27437689 5d248bb39fe1388943acb6510f8
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :999          6%          56:999   dmesg.BUG:workqueue_lockup-pool   <--------
           :999          0%           1:999   dmesg.INFO:rcu_preempt_detected_stalls_on_CPUs/tasks
           :999          0%           1:999   dmesg.INFO:rcu_tasks_detected_stalls_on_tasks
          2:999          2%          19:999   dmesg.INFO:task_blocked_for_more_than#seconds
          8:999         -1%            :999   dmesg.IP-Config:Auto-configuration_of_network_failed
          2:999          2%          19:999   dmesg.Kernel_panic-not_syncing:hung_task:blocked_tasks
           :999          0%           2:999   dmesg.RIP:__might_resched
           :999          0%           1:999   dmesg.RIP:debug_lockdep_rcu_enabled
           :999          0%           2:999   dmesg.RIP:ftrace_likely_update
          1:999         -0%            :999   dmesg.RIP:kasan_check_range
           :999          0%           1:999   dmesg.RIP:kfree_rcu_monitor
           :999          0%           1:999   dmesg.RIP:lock_acquire
           :999          0%           2:999   dmesg.RIP:lock_is_held_type
           :999          0%           1:999   dmesg.RIP:process_one_work
          1:999         -0%            :999   dmesg.WARNING:inconsistent_lock_state
          1:999         -0%            :999   dmesg.calltrace:irq_exit_rcu
          1:999         -0%            :999   dmesg.inconsistent{IN-HARDIRQ-W}->{HARDIRQ-ON-W}usage

As shown above, both commits hit various random issues. However, we observed
dmesg.BUG:workqueue_lockup-pool only on 5d248bb39f, at a low rate (56 out of
999 runs); it never happened on the parent.
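
(For illustration only, the A/B tallying amounts to roughly the sketch below.
run_one_boot is a placeholder for a single qemu boot of the given kernel, not a
real lkp command - a concrete single-boot invocation is sketched further down,
after the machine description - and the grep pattern is taken from the dmesg
lines quoted later in this report.)

  for kernel in bzImage-67d5404d2743 bzImage-5d248bb39fe1; do
      hits=0
      for i in $(seq 1 999); do
          # boot this kernel once and capture its console output
          run_one_boot "$kernel" > "dmesg-$kernel-$i.log" 2>&1
          grep -q "BUG: workqueue lockup" "dmesg-$kernel-$i.log" && hits=$((hits + 1))
      done
      echo "$kernel: $hits/999 runs hit the workqueue lockup"
  done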

More details are below.

Hello,

kernel test robot noticed "BUG:workqueue_lockup-pool" on:

commit: 5d248bb39fe1388943acb6510f8f48fa5570e0ec ("torture: Add lock_torture writer_fifo module parameter")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 6465e260f48790807eef06b583b38ca9789b6072]
[test failed on linux-next/master 4ae73bba62a367f2314f6ce69e3085a941983d8b]
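
For context, the commit under test adds a writer_fifo module parameter to
lock_torture (per its subject line). On a kernel command line that parameter
would presumably be passed as something like the line below; this is only to
illustrate what the commit introduces, the torture_type value is an assumption,
and our boot testcase does not necessarily set either option.

  locktorture.torture_type=mutex_lock locktorture.writer_fifo=1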

in testcase: boot

compiler: gcc-9
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to the attached dmesg/kmsg for the full log/backtrace)
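
(For reference, a single boot run of this configuration would look roughly like
the sketch below; the kernel image, initrd, timeout, and console options are
assumptions, since the exact invocation is generated by lkp and is not
reproduced in this message.)

  timeout 1200 qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G \
      -kernel bzImage -initrd initrd.cgz \
      -append "console=ttyS0 earlyprintk=serial panic=-1" \
      -display none -serial stdio -no-reboot > dmesg.log 2>&1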



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202310101456.53664ae-oliver.sang@intel.com


[  456.238517][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 34s!
[  456.263857][    C0] Showing busy workqueues and worker pools:
[  456.283499][    C0] workqueue events_power_efficient: flags=0x80
[  456.299402][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  456.302891][    C0]     pending: check_lifetime
[  456.331425][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  464.399159][    C0] hrtimer: interrupt took 9163921 ns
[  486.958477][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
[  486.974447][    C0] Showing busy workqueues and worker pools:
[  486.984953][    C0] workqueue events_power_efficient: flags=0x80
[  486.996029][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  486.996193][    C0]     pending: check_lifetime
[  486.996710][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  517.678504][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 96s!
[  517.694928][    C0] Showing busy workqueues and worker pools:
[  517.706364][    C0] workqueue events_power_efficient: flags=0x80
[  517.716928][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  517.717091][    C0]     pending: check_lifetime
[  517.717615][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  548.398534][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 126s!
[  548.415903][    C0] Showing busy workqueues and worker pools:
[  548.445877][    C0] workqueue events_power_efficient: flags=0x80
[  548.472451][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  548.472645][    C0]     pending: check_lifetime
[  548.473106][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  579.118467][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 157s!
[  579.227632][    C0] Showing busy workqueues and worker pools:
[  579.238527][    C0] workqueue events_power_efficient: flags=0x80
[  579.249190][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  579.249353][    C0]     pending: check_lifetime
[  579.249896][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  609.838492][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 188s!
[  609.854400][    C0] Showing busy workqueues and worker pools:
[  609.869467][    C0] workqueue events_power_efficient: flags=0x80
[  609.884664][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  609.884855][    C0]     pending: check_lifetime
[  609.885412][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  640.558322][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 219s!
[  640.573488][    C0] Showing busy workqueues and worker pools:
[  640.584383][    C0] workqueue events_power_efficient: flags=0x80
[  640.596724][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  640.596891][    C0]     pending: check_lifetime
[  640.597439][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  671.278472][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 249s!
[  671.305078][    C0] Showing busy workqueues and worker pools:
[  671.317258][    C0] workqueue events_power_efficient: flags=0x80
[  671.333460][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  671.333624][    C0]     pending: check_lifetime
[  671.357925][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  701.998584][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 280s!
[  702.019680][    C0] Showing busy workqueues and worker pools:
[  702.030924][    C0] workqueue events_power_efficient: flags=0x80
[  702.042393][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  702.043337][    C0]     pending: check_lifetime
[  702.044403][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  732.718468][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 311s!
[  732.741426][    C0] Showing busy workqueues and worker pools:
[  732.758915][    C0] workqueue events_power_efficient: flags=0x80
[  732.768737][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  732.768923][    C0]     pending: check_lifetime
[  732.806456][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  763.438485][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 342s!
[  763.461942][    C0] Showing busy workqueues and worker pools:
[  763.472996][    C0] workqueue events_power_efficient: flags=0x80
[  763.484138][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  763.484304][    C0]     pending: check_lifetime
[  763.484727][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  794.158571][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 372s!
[  794.174418][    C0] Showing busy workqueues and worker pools:
[  794.188676][    C0] workqueue events_power_efficient: flags=0x80
[  794.199823][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  794.199990][    C0]     pending: check_lifetime
[  794.200477][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  824.878523][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 403s!
[  824.898973][    C0] Showing busy workqueues and worker pools:
[  824.915273][    C0] workqueue events_power_efficient: flags=0x80
[  824.926393][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  824.926581][    C0]     pending: check_lifetime
[  824.927107][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  855.598509][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 434s!
[  855.628235][    C0] Showing busy workqueues and worker pools:
[  855.638889][    C0] workqueue events_power_efficient: flags=0x80
[  855.651558][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  855.651727][    C0]     pending: check_lifetime
[  855.652271][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  886.318451][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 464s!
[  886.343400][    C0] Showing busy workqueues and worker pools:
[  886.353541][    C0] workqueue events_power_efficient: flags=0x80
[  886.372345][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  886.372510][    C0]     pending: check_lifetime
[  886.373057][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  917.038624][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 495s!
[  917.061144][    C0] Showing busy workqueues and worker pools:
[  917.072203][    C0] workqueue events_power_efficient: flags=0x80
[  917.094365][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  917.094591][    C0]     pending: check_lifetime
[  917.095106][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  947.758605][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 526s!
[  947.794471][    C0] Showing busy workqueues and worker pools:
[  947.806830][    C0] workqueue events_power_efficient: flags=0x80
[  947.819558][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  947.819811][    C0]     pending: check_lifetime
[  947.820305][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[  978.478528][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 557s!
[  978.503662][    C0] Showing busy workqueues and worker pools:
[  978.514379][    C0] workqueue events_power_efficient: flags=0x80
[  978.525533][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[  978.525723][    C0]     pending: check_lifetime
[  978.548149][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 1009.198483][    C0] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 587s!
[ 1009.221412][    C0] Showing busy workqueues and worker pools:
[ 1009.237792][    C0] workqueue events_power_efficient: flags=0x80
[ 1009.269282][    C0]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[ 1009.269477][    C0]     pending: check_lifetime
[ 1009.290321][    C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[ 1029.678914][   T30] INFO: task swapper/0:1 blocked for more than 491 seconds.
[ 1029.699579][   T30]       Not tainted 6.5.0-rc1-00004-g5d248bb39fe1 #1
[ 1029.718868][   T30] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1029.740371][   T30] task:swapper/0       state:D stack:0     pid:1     ppid:0      flags:0x00004000
[ 1029.767948][   T30] Call Trace:
[ 1029.773801][   T30]  <TASK>
[ 1029.778983][ T30] __schedule (kernel/sched/core.c:5381 kernel/sched/core.c:6710) 
[ 1029.791567][ T30] ? io_schedule_timeout (kernel/sched/core.c:6592) 
[ 1029.800242][ T30] ? __lock_acquire (kernel/locking/lockdep.c:5144) 
[ 1029.808382][ T30] schedule (kernel/sched/core.c:6787 (discriminator 1)) 
[ 1029.815057][ T30] schedule_timeout (kernel/time/timer.c:2144) 
[ 1029.822775][ T30] ? mark_lock (arch/x86/include/asm/bitops.h:228 (discriminator 3) arch/x86/include/asm/bitops.h:240 (discriminator 3) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 3) kernel/locking/lockdep.c:228 (discriminator 3) kernel/locking/lockdep.c:4663 (discriminator 3)) 
[ 1029.833948][ T30] ? usleep_range_state (kernel/time/timer.c:2129) 
[ 1029.842598][ T30] ? check_preemption_disabled (lib/smp_processor_id.c:16) 
[ 1029.851647][ T30] ? _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:77 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202) 
[ 1029.860296][ T30] ? wait_for_completion (kernel/sched/completion.c:85 kernel/sched/completion.c:106 kernel/sched/completion.c:117 kernel/sched/completion.c:138) 
[ 1029.869302][ T30] wait_for_completion (kernel/sched/completion.c:86 kernel/sched/completion.c:106 kernel/sched/completion.c:117 kernel/sched/completion.c:138) 
[ 1029.877507][ T30] ? wait_for_completion_io (kernel/sched/completion.c:137) 
[ 1029.886807][ T30] __cpuhp_kick_ap (kernel/cpu.c:737) 
[ 1029.894385][ T30] cpuhp_issue_call (kernel/cpu.c:1123 kernel/cpu.c:2324) 
[ 1029.907022][ T30] __cpuhp_setup_state_cpuslocked (kernel/cpu.c:2473) 
[ 1029.917563][ T30] ? rcu_read_lock_held (kernel/rcu/update.c:380) 
[ 1029.926533][ T30] ? xz_dec_bcj_reset (lib/percpu_counter.c:201) 
[ 1029.934476][ T30] __cpuhp_setup_state (include/linux/percpu-rwsem.h:99 kernel/cpu.c:501 kernel/cpu.c:2503) 
[ 1029.942992][ T30] ? libcrc32c_mod_init (lib/percpu_counter.c:258) 
[ 1029.951358][ T30] percpu_counter_startup (lib/percpu_counter.c:263) 
[ 1029.959755][ T30] do_one_initcall (init/main.c:1232) 
[ 1029.967660][ T30] ? trace_event_raw_event_initcall_level (init/main.c:1223) 
[ 1029.978323][ T30] ? parameq (kernel/params.c:171) 
[ 1029.985581][ T30] ? check_preemption_disabled (lib/smp_processor_id.c:16) 
[ 1029.994714][ T30] ? check_preemption_disabled (lib/smp_processor_id.c:16) 
[ 1030.004702][ T30] kernel_init_freeable (init/main.c:1293 init/main.c:1310 init/main.c:1329 init/main.c:1546) 
[ 1030.013031][ T30] ? rest_init (init/main.c:1429) 
[ 1030.020898][ T30] kernel_init (init/main.c:1439) 
[ 1030.027964][ T30] ? rest_init (init/main.c:1429) 
[ 1030.035129][ T30] ret_from_fork (arch/x86/entry/entry_64.S:314) 
[ 1030.047506][   T30]  </TASK>
[ 1030.052818][   T30]
[ 1030.052818][   T30] Showing all locks held in the system:
[ 1030.065360][   T30] 2 locks held by swapper/0/1:
[ 1030.073338][ T30] #0: ffffffff84b35190 (cpu_hotplug_lock){++++}-{0:0}, at: percpu_counter_startup (lib/percpu_counter.c:263) 
[ 1030.091412][ T30] #1: ffffffff84b37848 (cpuhp_state_mutex){+.+.}-{3:3}, at: __cpuhp_setup_state_cpuslocked (kernel/cpu.c:2278 kernel/cpu.c:2449) 
[ 1030.109624][   T30] 1 lock held by rcu_tasks_kthre/11:
[ 1030.117928][ T30] #0: ffffffff84f38e50 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp (kernel/rcu/tasks.h:525) 
[ 1030.134843][   T30] 1 lock held by rcu_tasks_trace/12:
[ 1030.143010][ T30] #0: ffffffff84f38b50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp (kernel/rcu/tasks.h:525) 
[ 1030.163969][   T30] 1 lock held by khungtaskd/30:
[ 1030.180109][ T30] #0: ffffffff84f39aa0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks (kernel/locking/lockdep.c:6615) 
[ 1030.200232][   T30]
[ 1030.204126][   T30] =============================================
[ 1030.204126][   T30]
[ 1030.217675][   T30] Kernel panic - not syncing: hung_task: blocked tasks
[ 1030.221472][   T30] CPU: 0 PID: 30 Comm: khungtaskd Not tainted 6.5.0-rc1-00004-g5d248bb39fe1 #1
[ 1030.221472][   T30] Call Trace:
[ 1030.221472][   T30]  <TASK>
[ 1030.221472][ T30] dump_stack_lvl (lib/dump_stack.c:107) 
[ 1030.221472][ T30] dump_stack (lib/dump_stack.c:114) 
[ 1030.221472][ T30] panic (kernel/panic.c:340) 
[ 1030.221472][ T30] ? panic_smp_self_stop+0x100/0x100 
[ 1030.221472][ T30] watchdog (kernel/hung_task.c:195 kernel/hung_task.c:379) 
[ 1030.221472][ T30] kthread (kernel/kthread.c:389) 
[ 1030.221472][ T30] ? calculate_sigpending (kernel/signal.c:200) 
[ 1030.221472][ T30] ? proc_dohung_task_timeout_secs (kernel/hung_task.c:362) 
[ 1030.221472][ T30] ? kthread_complete_and_exit (kernel/kthread.c:342) 
[ 1030.221472][ T30] ret_from_fork (arch/x86/entry/entry_64.S:314) 
[ 1030.221472][   T30]  </TASK>
[ 1030.221472][   T30] Kernel Offset: disabled



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231010/202310101456.53664ae-oliver.sang@intel.com
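
For completeness, the usual 0-day reproduction flow with the materials from the
archive above looks roughly like this; the job-script and kernel image come
from the download, and the exact steps may differ from what is shown here.

  git clone https://github.com/intel/lkp-tests.git
  cd lkp-tests
  # job-script, <bzImage>, and <modules.cgz> are taken from the archive above
  bin/lkp qemu -k <bzImage> -m <modules.cgz> job-script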



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

