Greetings, 0day kernel testing robot got the below dmesg and the first bad commit is git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master commit b22ce2785d97423846206cceec4efee0c4afd980 Author: Tejun Heo AuthorDate: Wed Aug 28 17:33:37 2013 -0400 Commit: Tejun Heo CommitDate: Thu Aug 29 09:19:28 2013 -0400 workqueue: cond_resched() after processing each work item If !PREEMPT, a kworker running work items back to back can hog CPU. This becomes dangerous when a self-requeueing work item which is waiting for something to happen races against stop_machine. Such self-requeueing work item would requeue itself indefinitely hogging the kworker and CPU it's running on while stop_machine would wait for that CPU to enter stop_machine while preventing anything else from happening on all other CPUs. The two would deadlock. Jamie Liu reports that this deadlock scenario exists around scsi_requeue_run_queue() and libata port multiplier support, where one port may exclude command processing from other ports. With the right timing, scsi_requeue_run_queue() can end up requeueing itself trying to execute an IO which is asked to be retried while another device has an exclusive access, which in turn can't make forward progress due to stop_machine. Fix it by invoking cond_resched() after executing each work item. Signed-off-by: Tejun Heo Reported-by: Jamie Liu References: http://thread.gmane.org/gmane.linux.kernel/1552567 Cc: stable@vger.kernel.org -- kernel/workqueue.c | 9 +++++++++ 1 file changed, 9 insertions(+) +---------------------------------------------------------+------------+------------+------------------+ | | c95389b4cd | b22ce2785d | v3.16-rc4_070708 | +---------------------------------------------------------+------------+------------+------------------+ | boot_successes | 900 | 249 | 79 | | boot_failures | 0 | 51 | 2 | | WARNING:CPU:PID:at_drivers/block/floppy.c:schedule_bh() | 0 | 51 | 2 | | backtrace:user_reset_fdc | 0 | 51 | 2 | | backtrace:floppy_async_init | 0 | 51 | 2 | | backtrace:async_run_entry_fn | 0 | 51 | 2 | +---------------------------------------------------------+------------+------------+------------------+ [ 45.121121] PCI: setting IRQ 11 as level-triggered [ 45.407538] e1000 0000:00:03.0: setting latency timer to 64 [ 45.686040] ------------[ cut here ]------------ [ 45.975060] WARNING: CPU: 0 PID: 76 at drivers/block/floppy.c:968 schedule_bh+0x47/0x80() [ 47.312537] CPU: 0 PID: 76 Comm: kworker/u2:1 Not tainted 3.11.0-rc7-00025-gb22ce27 #8 [ 47.848171] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 [ 48.433222] Workqueue: events_unbound async_run_entry_fn [ 48.725491] 00000000 d25d7e0c c2745b10 d25d7e3c c105472d c2ff0b10 00000000 0000004c [ 49.494639] c30df57c 000003c8 c1a2fb57 c1a2fb57 00000001 c1a33aa0 c39a0ca0 d25d7e4c [ 51.548252] c1054812 00000009 00000000 d25d7e5c c1a2fb57 00000000 00000000 d25d7e84 [ 56.319716] Call Trace: [ 58.164457] [] dump_stack+0x16/0x18 [ 61.355641] [] warn_slowpath_common+0x8d/0xb0 [ 61.822789] [] ? schedule_bh+0x47/0x80 [ 62.103589] [] ? schedule_bh+0x47/0x80 [ 62.538170] [] ? floppy_hardint+0x1a0/0x1a0 [ 62.920432] [] warn_slowpath_null+0x22/0x30 [ 63.411821] [] schedule_bh+0x47/0x80 [ 66.271423] [] wait_til_done+0x15/0x230 [ 71.033237] [] ? cascade+0xa0/0xa0 [ 75.990887] [] user_reset_fdc+0xa7/0x110 [ 78.921153] [] floppy_async_init+0x9ee/0x1263 [ 83.512228] [] async_run_entry_fn+0x44/0x1e0 [ 85.519262] [] ? __list_del_entry+0x202/0x220 [ 87.120223] [] ? pwq_dec_nr_in_flight+0x137/0x140 [ 89.499077] [] process_one_work+0x33b/0x840 [ 90.274017] [] ? ttwu_do_wakeup+0x12/0x90 [ 91.294072] [] ? __list_del_entry+0x202/0x220 [ 91.634105] [] worker_thread+0x414/0x890 [ 91.994817] [] ? process_one_work+0x840/0x840 [ 92.265965] [] kthread+0xa6/0xb0 [ 92.658144] [] ret_from_kernel_thread+0x1b/0x28 [ 92.971880] [] ? __kthread_unpark+0x70/0x70 [ 93.254541] ---[ end trace 857f8343e4eb716b ]--- [ 93.508601] floppy0: no floppy controllers found git bisect start v3.11 v3.10 -- git bisect good 89d0abe3d695103505c025dde6e07b9c3dd772f4 # 13:01 300+ 0 Merge tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 git bisect good 8742f229b635bf1c1c84a3dfe5e47c814c20b5c8 # 13:18 300+ 0 userns: limit the maximum depth of user_namespace->parent chain git bisect good 94fc5d9de5bd757ad46f0d94bc4ebf617c4487f6 # 13:35 300+ 0 proc: return on proc_readdir error git bisect good 5e25e4f304ce53004dcd01725938c0bc45c75d35 # 13:48 300+ 0 Merge tag 'usb-3.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb git bisect good 06a557f7a68e126181f09831cb2fac6f7a7f43c2 # 14:22 300+ 1 Merge tag 'nfs-for-3.11-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs git bisect bad 936dbcc3f2fc7314f3c1645fc39ea51117f6d73b # 14:36 15- 2 Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound git bisect bad 155e3a353943a3e713cd33357cfd58e995900a81 # 14:50 13- 1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input git bisect bad ff497452636f4687e517964817b7e2bd99f4b44b # 15:06 48- 2 Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq git bisect bad b22ce2785d97423846206cceec4efee0c4afd980 # 15:51 13- 2 workqueue: cond_resched() after processing each work item # first bad commit: [b22ce2785d97423846206cceec4efee0c4afd980] workqueue: cond_resched() after processing each work item git bisect good c95389b4cd6a4b52af78bea706a274453e886251 # 16:51 900+ 0 Merge branch 'akpm' (patches from Andrew Morton) git bisect bad 98610962bbb418f6d847c402637ca14e78c38344 # 16:51 0- 2 0day head guard for 'devel-hourly-2014070708' git bisect bad cd3de83f147601356395b57a8673e9c5ff1e59d1 # 16:51 0- 34 Linux 3.16-rc4 git bisect bad d584f2acf5cc6c4a14c330930222d8e70fe024a6 # 18:02 62- 1 Add linux-next specific files for 20140707 This script may reproduce the error. ----------------------------------------------------------------------------- #!/bin/bash kernel=$1 initrd=yocto-minimal-i386.cgz wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/$initrd kvm=( qemu-system-x86_64 -cpu kvm64 -enable-kvm -kernel $kernel -initrd $initrd -smp 2 -m 256M -net nic,vlan=0,macaddr=00:00:00:00:00:00,model=virtio -net user,vlan=0 -net nic,vlan=1,model=e1000 -net user,vlan=1 -boot order=nc -no-reboot -watchdog i6300esb -serial stdio -display none -monitor null ) append=( debug sched_debug apic=debug ignore_loglevel sysrq_always_enabled panic=10 prompt_ramdisk=0 earlyprintk=ttyS0,115200 console=ttyS0,115200 console=tty0 vga=normal root=/dev/ram0 rw ) "${kvm[@]}" --append "${append[*]}" ----------------------------------------------------------------------------- Thanks, Fengguang