* blk-mq + bfq: udevd hang on usb2 storages
@ 2017-12-01 17:04 Alban Browaeys
  2017-12-01 17:29 ` Ming Lei
  2017-12-04 10:57 ` Ming Lei
  0 siblings, 2 replies; 5+ messages in thread
From: Alban Browaeys @ 2017-12-01 17:04 UTC (permalink / raw)
  To: Ming Lei; +Cc: Jens Axboe, linux-block

I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .

I have now bisected this issue to commit a6a252e6491443c1c1 "blk-mq-sched:
decide how to handle flush rq via RQF_FLUSH_SEQ".

This is with a SanDisk Cruzer USB stick (USB Version: 2.10) that I
regressed with.
systemctl restart systemd-udevd restores sanity.

PS: With a USB3 Lexar (USB Version: 3.00) I get a more severe issue (not
bisected) from which I find no way out other than a reboot. My report to
bugzilla has logs from when I was swapping between these keys. The logs
attached there mix what look like two different behaviors.

Thanks,
Alban

Logs with this SanDisk device:

- after plugging, I get (after a few seconds to let detection proceed):

nov. 30 18:04:32 cyclope kernel: systemd-udevd D 0 1154 397 0x00000100
nov. 30 18:04:32 cyclope kernel: Call Trace:
nov. 30 18:04:32 cyclope kernel: ? __schedule+0x7cc/0x1e10
nov. 30 18:04:32 cyclope kernel: ? find_get_pages_range_tag+0x1520/0x1520
nov. 30 18:04:32 cyclope kernel: ? pci_mmcfg_check_reserved+0x100/0x100
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? __lock_acquire.isra.33+0x35a/0x1c70
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? blk_lld_busy+0x40/0x40
nov. 30 18:04:32 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 30 18:04:32 cyclope kernel: schedule+0xe2/0x380
nov. 30 18:04:32 cyclope kernel: ? wait_on_page_bit_common+0x279/0x730
nov. 30 18:04:32 cyclope kernel: ? __schedule+0x1e10/0x1e10
nov. 30 18:04:32 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 30 18:04:32 cyclope kernel: ? lock_release+0x10b0/0x10b0
nov. 30 18:04:32 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 30 18:04:32 cyclope kernel: io_schedule+0x17/0x60
nov. 30 18:04:32 cyclope kernel: wait_on_page_bit_common+0x47d/0x730
nov. 30 18:04:32 cyclope kernel: ? __page_cache_alloc+0x340/0x340
nov. 30 18:04:32 cyclope kernel: ? reuse_swap_page+0x220/0x8f0
nov. 30 18:04:32 cyclope kernel: ? swp_swapcount+0x450/0x450
nov. 30 18:04:32 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? page_cache_tree_insert+0x2d0/0x2d0
nov. 30 18:04:32 cyclope kernel: ? force_page_cache_readahead+0x1bd/0x320
nov. 30 18:04:32 cyclope kernel: ? force_page_cache_readahead+0x1bd/0x320
nov. 30 18:04:32 cyclope kernel: generic_file_read_iter+0x8a7/0x2760
nov. 30 18:04:32 cyclope kernel: ? filemap_range_has_page+0x2e0/0x2e0
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? sched_clock+0x5/0x10
nov. 30 18:04:32 cyclope kernel: ? sched_clock_cpu+0x18/0x170
nov. 30 18:04:32 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 30 18:04:32 cyclope kernel: ? lock_acquire+0x190/0x3e0
nov. 30 18:04:32 cyclope kernel: ? aa_file_perm+0xdb/0xb80
nov. 30 18:04:32 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 30 18:04:32 cyclope kernel: ? lock_release+0x10b0/0x10b0
nov. 30 18:04:32 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 30 18:04:32 cyclope kernel: ? __bpf_prog_run32+0xad/0x110
nov. 30 18:04:32 cyclope kernel: ? __bpf_prog_run64+0x110/0x110
nov. 30 18:04:32 cyclope kernel: ? aa_file_perm+0x1db/0xb80
nov. 30 18:04:32 cyclope kernel: ? aa_path_link+0x480/0x480
nov. 30 18:04:32 cyclope kernel: ? seccomp_run_filters+0x132/0x220
nov. 30 18:04:32 cyclope kernel: ? populate_seccomp_data+0xa90/0xa90
nov. 30 18:04:32 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 30 18:04:32 cyclope kernel: ? __fsnotify_update_child_dentry_flags.part.1+0x250/0x250
nov. 30 18:04:32 cyclope kernel: ? __lock_acquire.isra.33+0x35a/0x1c70
nov. 30 18:04:32 cyclope kernel: ? __seccomp_filter+0x94/0xcf0
nov. 30 18:04:32 cyclope kernel: ? do_seccomp+0x2200/0x2200
nov. 30 18:04:32 cyclope kernel: __vfs_read+0x513/0x7d0
nov. 30 18:04:32 cyclope kernel: ? vfs_copy_file_range+0x980/0x980
nov. 30 18:04:32 cyclope kernel: ? block_llseek+0x60/0xc0
nov. 30 18:04:32 cyclope kernel: vfs_read+0xf4/0x2f0
nov. 30 18:04:32 cyclope kernel: ? up_write+0x62/0x110
nov. 30 18:04:32 cyclope kernel: SyS_read+0xc7/0x1d0
nov. 30 18:04:32 cyclope kernel: ? kernel_write+0x130/0x130
nov. 30 18:04:32 cyclope kernel: ? kernel_write+0x130/0x130
nov. 30 18:04:32 cyclope kernel: do_syscall_64+0x218/0x5f0
nov. 30 18:04:32 cyclope kernel: ? syscall_return_slowpath+0x2d0/0x2d0
nov. 30 18:04:32 cyclope kernel: ? syscall_return_slowpath+0x151/0x2d0
nov. 30 18:04:32 cyclope kernel: ? prepare_exit_to_usermode+0x200/0x200
nov. 30 18:04:32 cyclope kernel: ? prepare_exit_to_usermode+0xcd/0x200
nov. 30 18:04:32 cyclope kernel: ? perf_trace_sys_enter+0x1030/0x1030
nov. 30 18:04:32 cyclope kernel: ? __put_user_4+0x1c/0x30
nov. 30 18:04:32 cyclope kernel: entry_SYSCALL64_slow_path+0x25/0x25
nov. 30 18:04:32 cyclope kernel: RIP: 0033:0x7f1fad232330
nov. 30 18:04:32 cyclope kernel: RSP: 002b:00007ffd1d6eca68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
nov. 30 18:04:32 cyclope kernel: RAX: ffffffffffffffda RBX: 0000560d12ebfdf0 RCX: 00007f1fad232330
nov. 30 18:04:32 cyclope kernel: RDX: 0000000000000040 RSI: 0000560d12ebff48 RDI: 000000000000000f
nov. 30 18:04:32 cyclope kernel: RBP: 0000000773ef0000 R08: 00007f1fad21c308 R09: 0000000000000070
nov. 30 18:04:32 cyclope kernel: R10: 000000000000006e R11: 0000000000000246 R12: 0000000000000040
nov. 30 18:04:32 cyclope kernel: R13: 0000560d12ebff20 R14: 0000560d12ebfe40 R15: 0000560d12ebff38

- when I unplug (from a previous session):

nov. 29 23:28:19 cyclope kernel: sysrq: SysRq : Show Blocked State
nov. 29 23:28:19 cyclope kernel: task PC stack pid father
nov. 29 23:28:19 cyclope kernel: kworker/0:1 D 0 33 2 0x80000000
nov. 29 23:28:19 cyclope kernel: Workqueue: usb_hub_wq hub_event [usbcore]
nov. 29 23:28:19 cyclope kernel: Call Trace:
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x7cc/0x1e10
nov. 29 23:28:19 cyclope kernel: ? pci_mmcfg_check_reserved+0x100/0x100
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: ? prepare_to_wait_event+0x1b2/0x810
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? prepare_to_wait_exclusive+0x380/0x380
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: schedule+0xe2/0x380
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x1e10/0x1e10
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? rq_clock+0xf0/0xf0
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: blk_mq_freeze_queue_wait+0x18f/0x380
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? blk_mq_poll_stats_fn+0x220/0x220
nov. 29 23:28:19 cyclope kernel: ? __mutex_unlock_slowpath+0xe1/0x880
nov. 29 23:28:19 cyclope kernel: ? wait_for_completion+0x710/0x710
nov. 29 23:28:19 cyclope kernel: ? lock_release+0x10a0/0x10a0
nov. 29 23:28:19 cyclope kernel: ? __blk_mq_delay_run_hw_queue+0x12a/0x1a0
nov. 29 23:28:19 cyclope kernel: ? finish_wait+0x360/0x360
nov. 29 23:28:19 cyclope kernel: blk_cleanup_queue+0x25b/0x540
nov. 29 23:28:19 cyclope kernel: __scsi_remove_device+0xda/0x2a0 [scsi_mod]
nov. 29 23:28:19 cyclope kernel: scsi_forget_host+0x58/0x1b0 [scsi_mod]
nov. 29 23:28:19 cyclope kernel: scsi_remove_host+0xd0/0x200 [scsi_mod]
nov. 29 23:28:19 cyclope kernel: usb_stor_disconnect+0xe5/0x200 [usb_storage]
nov. 29 23:28:19 cyclope kernel: usb_unbind_interface+0x1b9/0xa60 [usbcore]
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? rpm_suspend+0x12d0/0x12d0
nov. 29 23:28:19 cyclope kernel: ? usb_autoresume_device+0x40/0x40 [usbcore]
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: ? rq_clock+0xf0/0xf0
nov. 29 23:28:19 cyclope kernel: device_release_driver_internal+0x277/0x530
nov. 29 23:28:19 cyclope kernel: bus_remove_device+0x2b1/0x570
nov. 29 23:28:19 cyclope kernel: device_del+0x517/0x9c0
nov. 29 23:28:19 cyclope kernel: ? kfree+0xea/0x210
nov. 29 23:28:19 cyclope kernel: ? __device_links_no_driver+0x280/0x280
nov. 29 23:28:19 cyclope kernel: ? usb_remove_ep_devs+0x32/0x70 [usbcore]
nov. 29 23:28:19 cyclope kernel: ? remove_intf_ep_devs+0xe3/0x190 [usbcore]
nov. 29 23:28:19 cyclope kernel: usb_disable_device+0x1c6/0x6b0 [usbcore]
nov. 29 23:28:19 cyclope kernel: usb_disconnect+0x221/0x760 [usbcore]
nov. 29 23:28:19 cyclope kernel: hub_event+0x13e2/0x37d0 [usbcore]
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? hub_port_debounce+0x330/0x330 [usbcore]
nov. 29 23:28:19 cyclope kernel: ? __lock_acquire.isra.34+0x35a/0x1c50
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock_cpu+0x18/0x170
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? lock_acquire+0x190/0x3e0
nov. 29 23:28:19 cyclope kernel: ? process_one_work+0x9f8/0x1b20
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? lock_release+0x10a0/0x10a0
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: process_one_work+0xae5/0x1b20
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: ? queue_delayed_work_on+0x40/0x40
nov. 29 23:28:19 cyclope kernel: ? _raw_spin_unlock_irq+0x1f/0x30
nov. 29 23:28:19 cyclope kernel: ? finish_task_switch+0x1d3/0x6f0
nov. 29 23:28:19 cyclope kernel: ? finish_task_switch+0x1aa/0x6f0
nov. 29 23:28:19 cyclope kernel: ? wq_worker_waking_up+0xc0/0xc0
nov. 29 23:28:19 cyclope kernel: ? preempt_notifier_dec+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x7d4/0x1e10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? __lock_acquire.isra.34+0x35a/0x1c50
nov. 29 23:28:19 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? worker_thread+0x426/0x1800
nov. 29 23:28:19 cyclope kernel: ? lock_acquire+0x190/0x3e0
nov. 29 23:28:19 cyclope kernel: ? worker_thread+0x426/0x1800
nov. 29 23:28:19 cyclope kernel: ? print_unlock_imbalance_bug+0x140/0x140
nov. 29 23:28:19 cyclope kernel: ? lock_release+0x10a0/0x10a0
nov. 29 23:28:19 cyclope kernel: ? alloc_worker+0x2f0/0x2f0
nov. 29 23:28:19 cyclope kernel: worker_thread+0x208/0x1800
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? process_one_work+0x1b20/0x1b20
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: ? _raw_spin_unlock_irq+0x1f/0x30
nov. 29 23:28:19 cyclope kernel: ? finish_task_switch+0x1d3/0x6f0
nov. 29 23:28:19 cyclope kernel: ? finish_task_switch+0x1aa/0x6f0
nov. 29 23:28:19 cyclope kernel: ? preempt_notifier_dec+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x7d4/0x1e10
nov. 29 23:28:19 cyclope kernel: ? pci_mmcfg_check_reserved+0x100/0x100
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock_cpu+0x104/0x170
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? schedule+0xe2/0x380
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x1e10/0x1e10
nov. 29 23:28:19 cyclope kernel: ? do_wait_intr+0x320/0x320
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: ? process_one_work+0x1b20/0x1b20
nov. 29 23:28:19 cyclope kernel: kthread+0x30d/0x3d0
nov. 29 23:28:19 cyclope kernel: ? kthread_create_worker_on_cpu+0xc0/0xc0
nov. 29 23:28:19 cyclope kernel: ret_from_fork+0x1f/0x30
nov. 29 23:28:19 cyclope kernel: systemd-udevd D 0 1166 398 0x00000100
nov. 29 23:28:19 cyclope kernel: Call Trace:
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x7cc/0x1e10
nov. 29 23:28:19 cyclope kernel: ? find_get_pages_tag+0x13c0/0x13c0
nov. 29 23:28:19 cyclope kernel: ? pci_mmcfg_check_reserved+0x100/0x100
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? __lock_acquire.isra.34+0x35a/0x1c50
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? blk_lld_busy+0x40/0x40
nov. 29 23:28:19 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: schedule+0xe2/0x380
nov. 29 23:28:19 cyclope kernel: ? wait_on_page_bit_common+0x279/0x730
nov. 29 23:28:19 cyclope kernel: ? __schedule+0x1e10/0x1e10
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? lock_release+0x10a0/0x10a0
nov. 29 23:28:19 cyclope kernel: ? do_raw_spin_trylock+0x180/0x180
nov. 29 23:28:19 cyclope kernel: io_schedule+0x17/0x60
nov. 29 23:28:19 cyclope kernel: wait_on_page_bit_common+0x47d/0x730
nov. 29 23:28:19 cyclope kernel: ? __page_cache_alloc+0x340/0x340
nov. 29 23:28:19 cyclope kernel: ? reuse_swap_page+0x220/0x8f0
nov. 29 23:28:19 cyclope kernel: ? swp_swapcount+0x450/0x450
nov. 29 23:28:19 cyclope kernel: ? lockdep_rcu_suspicious+0x150/0x150
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? page_cache_tree_insert+0x2a0/0x2a0
nov. 29 23:28:19 cyclope kernel: ? force_page_cache_readahead+0x1bd/0x320
nov. 29 23:28:19 cyclope kernel: ? force_page_cache_readahead+0x1bd/0x320
nov. 29 23:28:19 cyclope kernel: generic_file_read_iter+0x8a7/0x2760
nov. 29 23:28:19 cyclope kernel: ? filemap_range_has_page+0x2e0/0x2e0
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock+0x5/0x10
nov. 29 23:28:19 cyclope kernel: ? sched_clock_cpu+0x18/0x170
nov. 29 23:28:19 cyclope kernel: ? find_held_lock+0x39/0x1c0
nov. 29 23:28:19 cyclope kernel: ? lock_acquire+0x190/0x3e0
nov. 29 23:28:19 cyclope kernel: ? aa_file_perm+0xdb/0xb80
nov. 29 23:28:19 cyclope kernel: ? lock_downgrade+0x690/0x690
nov. 29 23:28:19 cyclope kernel: ? lock_release+0x10a0/0x10a0
nov. 29 23:28:19 cyclope kernel: ? tsc_resume+0x10/0x10
nov. 29 23:28:19 cyclope kernel: ? __bpf_prog_run32+0xad/0x110
nov. 29 23:28:19 cyclope kernel: ? __bpf_prog_run64+0x110/0x110
nov. 29 23:28:19 cyclope kernel: ? aa_file_perm+0x1db/0xb80
nov. 29 23:28:19 cyclope kernel: ? aa_path_link+0x480/0x480
nov. 29 23:28:19 cyclope kernel: ? seccomp_run_filters+0x132/0x220
nov. 29 23:28:19 cyclope kernel: ? populate_seccomp_data+0xa90/0xa90
nov. 29 23:28:19 cyclope kernel: ? graph_lock+0xd0/0xd0
nov. 29 23:28:19 cyclope kernel: ? __fsnotify_update_child_dentry_flags.part.1+0x250/0x250
nov. 29 23:28:19 cyclope kernel: ? __lock_acquire.isra.34+0x35a/0x1c50
nov. 29 23:28:19 cyclope kernel: ? __seccomp_filter+0x94/0xcf0
nov. 29 23:28:19 cyclope kernel: ? do_seccomp+0x2200/0x2200
nov. 29 23:28:19 cyclope kernel: __vfs_read+0x513/0x7d0
nov. 29 23:28:19 cyclope kernel: ? vfs_copy_file_range+0x980/0x980
nov. 29 23:28:19 cyclope kernel: ? block_llseek+0x60/0xc0
nov. 29 23:28:19 cyclope kernel: vfs_read+0xf4/0x2f0
nov. 29 23:28:19 cyclope kernel: ? up_write+0x62/0x110
nov. 29 23:28:19 cyclope kernel: SyS_read+0xc7/0x1d0
nov. 29 23:28:19 cyclope kernel: ? kernel_write+0x130/0x130
nov. 29 23:28:19 cyclope kernel: ? kernel_write+0x130/0x130
nov. 29 23:28:19 cyclope kernel: do_syscall_64+0x218/0x5f0
nov. 29 23:28:19 cyclope kernel: ? syscall_return_slowpath+0x2d0/0x2d0
nov. 29 23:28:19 cyclope kernel: ? syscall_return_slowpath+0x151/0x2d0
nov. 29 23:28:19 cyclope kernel: ? prepare_exit_to_usermode+0x200/0x200
nov. 29 23:28:19 cyclope kernel: ? prepare_exit_to_usermode+0xcd/0x200
nov. 29 23:28:19 cyclope kernel: ? perf_trace_sys_enter+0xfe0/0xfe0
nov. 29 23:28:19 cyclope kernel: ? __put_user_4+0x1c/0x30
nov. 29 23:28:19 cyclope kernel: entry_SYSCALL64_slow_path+0x25/0x25
nov. 29 23:28:19 cyclope kernel: RIP: 0033:0x7efc0cd5b330
nov. 29 23:28:19 cyclope kernel: RSP: 002b:00007ffe3770c4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
nov. 29 23:28:19 cyclope kernel: RAX: ffffffffffffffda RBX: 000055bee6b74c60 RCX: 00007efc0cd5b330
nov. 29 23:28:19 cyclope kernel: RDX: 0000000000000040 RSI: 000055bee6b74db8 RDI: 000000000000000f
nov. 29 23:28:19 cyclope kernel: RBP: 0000000773ef0000 R08: 00007efc0cd45278 R09: 0000000000000070
nov. 29 23:28:19 cyclope kernel: R10: 000000000000006e R11: 0000000000000246 R12: 0000000000000040
nov. 29 23:28:19 cyclope kernel: R13: 000055bee6b74d90 R14: 000055bee6b74cb0 R15: 000055bee6b74da8
* Re: blk-mq + bfq: udevd hang on usb2 storages
  2017-12-01 17:04 blk-mq + bfq: udevd hang on usb2 storages Alban Browaeys
@ 2017-12-01 17:29 ` Ming Lei
  2017-12-04 10:57 ` Ming Lei
  1 sibling, 0 replies; 5+ messages in thread
From: Ming Lei @ 2017-12-01 17:29 UTC (permalink / raw)
  To: Alban Browaeys; +Cc: Jens Axboe, linux-block

On Fri, Dec 01, 2017 at 06:04:29PM +0100, Alban Browaeys wrote:
> I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .
>
> I have now bisected this issue to commit a6a252e6491443c1c1 "blk-mq-
> sched: decide how to handle flush rq via RQF_FLUSH_SEQ".
>
> This is with a SanDisk Cruzer USB stick (USB Version: 2.10) that I
> regressed with.
> systemctl restart systemd-udevd restores sanity.

From my investigation, it looks like the issue is in BFQ:

https://marc.info/?l=linux-block&m=151214241518562&w=2

Could you apply the attached debug patch, run your test and provide us
the log?

Thanks,
Ming
* Re: blk-mq + bfq: udevd hang on usb2 storages
  2017-12-01 17:04 blk-mq + bfq: udevd hang on usb2 storages Alban Browaeys
  2017-12-01 17:29 ` Ming Lei
@ 2017-12-04 10:57 ` Ming Lei
  2017-12-07 18:04 ` Paolo Valente
  1 sibling, 1 reply; 5+ messages in thread
From: Ming Lei @ 2017-12-04 10:57 UTC (permalink / raw)
  To: Alban Browaeys, Paolo Valente; +Cc: Jens Axboe, linux-block

On Fri, Dec 01, 2017 at 06:04:29PM +0100, Alban Browaeys wrote:
> I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .
>
> I have now bisected this issue to commit a6a252e6491443c1c1 "blk-mq-
> sched: decide how to handle flush rq via RQF_FLUSH_SEQ".
>
> This is with a SanDisk Cruzer USB stick (USB Version: 2.10) that I
> regressed with.
> systemctl restart systemd-udevd restores sanity.
>
> PS: With a USB3 Lexar (USB Version: 3.00) I get a more severe issue
> (not bisected) from which I find no way out other than a reboot. My
> report to bugzilla has logs from when I was swapping between these
> keys. The logs attached there mix what look like two different
> behaviors.

Hi Paolo,

From both Alban's trace and my trace, it looks like this issue is in BFQ,
since the request can't be retrieved via e->type->ops.mq.dispatch_request()
in blk_mq_do_dispatch_sched() after it is inserted into BFQ's queue.

https://bugzilla.kernel.org/show_bug.cgi?id=198023#c4
https://marc.info/?l=linux-block&m=151214241518562&w=2

BTW, I have tried to reproduce the issue with scsi_debug, but did not
succeed, and it can't be reproduced with other schedulers (mq-deadline,
none) either.

So could you take a look?

Thanks,
Ming
* Re: blk-mq + bfq: udevd hang on usb2 storages
  2017-12-04 10:57 ` Ming Lei
@ 2017-12-07 18:04 ` Paolo Valente
  2017-12-08  1:28 ` Ming Lei
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Valente @ 2017-12-07 18:04 UTC (permalink / raw)
  To: Ming Lei
  Cc: Alban Browaeys, Jens Axboe, linux-block, SERENA ZIVIANI,
      Ulf Hansson, Linus Walleij

> On 04 Dec 2017, at 11:57, Ming Lei <ming.lei@redhat.com> wrote:
>
> On Fri, Dec 01, 2017 at 06:04:29PM +0100, Alban Browaeys wrote:
>> I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .
>>
>> I have now bisected this issue to commit a6a252e6491443c1c1 "blk-mq-
>> sched: decide how to handle flush rq via RQF_FLUSH_SEQ".
>>
>> This is with a SanDisk Cruzer USB stick (USB Version: 2.10) that I
>> regressed with.
>> systemctl restart systemd-udevd restores sanity.
>>
>> PS: With a USB3 Lexar (USB Version: 3.00) I get a more severe issue
>> (not bisected) from which I find no way out other than a reboot. My
>> report to bugzilla has logs from when I was swapping between these
>> keys. The logs attached there mix what look like two different
>> behaviors.
>
> Hi Paolo,
>
> From both Alban's trace and my trace, it looks like this issue is in BFQ,
> since the request can't be retrieved via e->type->ops.mq.dispatch_request()
> in blk_mq_do_dispatch_sched() after it is inserted into BFQ's queue.
>
> https://bugzilla.kernel.org/show_bug.cgi?id=198023#c4
> https://marc.info/?l=linux-block&m=151214241518562&w=2
>
> BTW, I have tried to reproduce the issue with scsi_debug, but did not
> succeed, and it can't be reproduced with other schedulers (mq-deadline,
> none) either.
>
> So could you take a look?
>

Hi Ming, all,
sorry for the delay, but we preferred to reply directly after finding
the cause of the problem. And the cause is that gdisk makes an I/O
request that is dispatched to the drive, but apparently never
completed (as Serena, in CC, discovered). Or, at least, the execution
of completed_request in bfq is never triggered.

In more detail: gdisk is a process for which bfq performs device
idling (for good reasons), and, for one such process, bfq does not
switch to serving another process until the last pending request of
the process is completed, after which device idling is started, to
wait for the next request of the process. So, if such a last request
is never completed, bfq remains forever waiting for such an event, and
then refuses forever to deliver requests of other queues.

As for why bfq_completed_request is not executed for the above,
dispatched request, the reason is either that the bfq_finish_request
hook is not invoked at all, or that it is invoked, but the request
does not have the RQF_STARTED flag set. Discovering which of the two
events occurs is our next step.

We'll let you know.

Thanks,
Paolo

> Thanks,
> Ming
* Re: blk-mq + bfq: udevd hang on usb2 storages
  2017-12-07 18:04 ` Paolo Valente
@ 2017-12-08  1:28 ` Ming Lei
  0 siblings, 0 replies; 5+ messages in thread
From: Ming Lei @ 2017-12-08 1:28 UTC (permalink / raw)
  To: Paolo Valente
  Cc: Alban Browaeys, Jens Axboe, linux-block, SERENA ZIVIANI,
      Ulf Hansson, Linus Walleij

Hi Paolo,

On Thu, Dec 07, 2017 at 07:04:54PM +0100, Paolo Valente wrote:
>
> > On 04 Dec 2017, at 11:57, Ming Lei <ming.lei@redhat.com> wrote:
> >
> > On Fri, Dec 01, 2017 at 06:04:29PM +0100, Alban Browaeys wrote:
> >> I initially reported this as https://bugzilla.kernel.org/show_bug.cgi?id=198023 .
> >>
> >> I have now bisected this issue to commit a6a252e6491443c1c1 "blk-mq-
> >> sched: decide how to handle flush rq via RQF_FLUSH_SEQ".
> >>
> >> This is with a SanDisk Cruzer USB stick (USB Version: 2.10) that I
> >> regressed with.
> >> systemctl restart systemd-udevd restores sanity.
> >>
> >> PS: With a USB3 Lexar (USB Version: 3.00) I get a more severe issue
> >> (not bisected) from which I find no way out other than a reboot. My
> >> report to bugzilla has logs from when I was swapping between these
> >> keys. The logs attached there mix what look like two different
> >> behaviors.
> >
> > Hi Paolo,
> >
> > From both Alban's trace and my trace, it looks like this issue is in BFQ,
> > since the request can't be retrieved via e->type->ops.mq.dispatch_request()
> > in blk_mq_do_dispatch_sched() after it is inserted into BFQ's queue.
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=198023#c4
> > https://marc.info/?l=linux-block&m=151214241518562&w=2
> >
> > BTW, I have tried to reproduce the issue with scsi_debug, but did not
> > succeed, and it can't be reproduced with other schedulers (mq-deadline,
> > none) either.
> >
> > So could you take a look?
> >
>
> Hi Ming, all,
> sorry for the delay, but we preferred to reply directly after finding
> the cause of the problem. And the cause is that gdisk makes an I/O

Not a problem, :-)

In the previous mail, I just wanted to share our findings.

> request that is dispatched to the drive, but apparently never
> completed (as Serena, in CC, discovered). Or, at least, the execution
> of completed_request in bfq is never triggered.

I can understand the case a bit, and the following info may be helpful
for you:

1) the USB device's queue depth is one

2) the only pending request is completed, and scsi_finish_command() is
called

3) inside scsi_finish_command(), scsi_device_unbusy() is called at the
beginning; once it is done, blk_mq_get_dispatch_budget() in
blk_mq_do_dispatch_sched() returns true, and we can start trying to
dispatch requests

4) e->type->ops.mq.dispatch_request() is called, but the request from 2)
isn't seen as completed yet: completed_request in bfq hasn't run,
because it is called later from scsi_end_request()
(<- scsi_io_completion() <- scsi_finish_command())

Then no request can be dispatched any more and the hang happens, even
though completed_request does run eventually.

> In more detail: gdisk is a process for which bfq performs device
> idling (for good reasons), and, for one such process, bfq does not
> switch to serving another process until the last pending request of
> the process is completed, after which device idling is started, to
> wait for the next request of the process. So, if such a last request
> is never completed, bfq remains forever waiting for such an event,
> and then refuses forever to deliver requests of other queues.
>
> As for why bfq_completed_request is not executed for the above,

It should be run.

> dispatched request, the reason is either that the bfq_finish_request
> hook is not invoked at all, or that it is invoked, but the request
> does not have the RQF_STARTED flag set. Discovering which of the two
> events occurs is our next step.

The RQF_STARTED flag is set only if a request is found by
__bfq_dispatch_request(), which can never happen in this case, since we
observed that no request is found by __bfq_dispatch_request() even
though it has been inserted into BFQ's queue already.

> We'll let you know.

Thanks,
Ming