From mboxrd@z Thu Jan 1 00:00:00 1970
From: swise@opengridcomputing.com (Steve Wise)
Date: Thu, 14 Jul 2016 09:59:05 -0500
Subject: [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
In-Reply-To: <57875835.5050001@grimberg.me>
References: <1468445196-6915-1-git-send-email-mlin@kernel.org>
 <1468445196-6915-3-git-send-email-mlin@kernel.org>
 <57875835.5050001@grimberg.me>
Message-ID: <011201d1dde0$444943c0$ccdbcb40$@opengridcomputing.com>

> This patch introduces asymmetry between create and destroy
> of the admin queue. Does this alternative patch solve
> the problem?
>
> The patch changes the order of device removal flow from:
> 1. delete controller
> 2. destroy queue
>
> to:
> 1. destroy queue
> 2. delete controller
>
> Or more specifically:
> 1. own the controller deletion (make sure we are not
>    competing with anyone)
> 2. get rid of inflight reconnects (which also destroy and
>    create queues)
> 3. destroy the queue
> 4. safely queue controller deletion
>
> Thoughts?
>

Your patch causes a deadlock during device removal.  The controller delete
work thread is stuck in c4iw_destroy_qp, waiting for all references to the
QP to go away; either nvmf/rdma or the rdma-cm (or both) is still holding
one:

(gdb) list *c4iw_destroy_qp+0x155
0x15af5 is in c4iw_destroy_qp (drivers/infiniband/hw/cxgb4/qp.c:1596).
1591                    c4iw_modify_qp(rhp, qhp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
1592            wait_event(qhp->wait, !qhp->ep);
1593
1594            remove_handle(rhp, &rhp->qpidr, qhp->wq.sq.qid);
1595            atomic_dec(&qhp->refcnt);
1596            wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
1597
1598            spin_lock_irq(&rhp->lock);
1599            if (!list_empty(&qhp->db_fc_entry))
1600                    list_del_init(&qhp->db_fc_entry);

The rdma-cm work thread is stuck trying to grab the cm_id mutex:

(gdb) list *cma_disable_callback+0x2e
0x29e is in cma_disable_callback (drivers/infiniband/core/cma.c:715).
710
711     static int cma_disable_callback(struct rdma_id_private *id_priv,
712                                     enum rdma_cm_state state)
713     {
714             mutex_lock(&id_priv->handler_mutex);
715             if (id_priv->state != state) {
716                     mutex_unlock(&id_priv->handler_mutex);
717                     return -EINVAL;
718             }
719             return 0;

And the nvmf cm event handler is stuck waiting for the controller delete to
finish:

(gdb) list *nvme_rdma_device_unplug+0x97
0x1027 is in nvme_rdma_device_unplug (drivers/nvme/host/rdma.c:1358).
warning: Source file is more recent than executable.
1353    queue_delete:
1354            /* queue controller deletion */
1355            queue_work(nvme_rdma_wq, &ctrl->delete_work);
1356            flush_work(&ctrl->delete_work);
1357            return ret;
1358    }
1359
1360    static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
1361                    struct rdma_cm_event *ev)
1362    {

Seems like the rdma-cm work thread is trying to grab the handler_mutex of
the very cm_id that is handling the DEVICE_REMOVAL event, so nothing in the
chain can make progress.  A minimal userspace model of this circular wait is
sketched after the log below.

Here is the detailed log:

[ 1484.568354] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 10.0.1.14:4420
[ 1495.939850] nvme nvme1: creating 32 I/O queues.
[ 1496.635114] nvme nvme1: new ctrl: NQN "test-ram0", addr 10.0.1.14:4420
[ 1502.813698] nvme nvme1: Got rdma device removal event, deleting ctrl
[ 1683.259465] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 1683.266147]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.272986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.280830] kworker/0:0     D ffff8810281579c8     0     4      2 0x10000000
[ 1683.287997] Workqueue: nvme_rdma_wq nvme_rdma_del_ctrl_work [nvme_rdma]
[ 1683.294658]  ffff8810281579c8 ffffffff81c0d4c0 ffff88102809c6c0 ffff880fff04e898
[ 1683.302252]  ffffffff81d448b4 ffff88102809d840 ffff881028154008 ffff88007853b3f8
[ 1683.309763]  ffff881028157ac8 ffff880f67ea5758 ffff880f67ea5760 ffff881028157a18
[ 1683.317281] Call Trace:
[ 1683.319752]  [] schedule+0x40/0xb0
[ 1683.324757]  [] c4iw_destroy_qp+0x155/0x2b0 [iw_cxgb4]
[ 1683.331481]  [] ? schedule+0x40/0xb0
[ 1683.336639]  [] ? lock_timer_base+0x5a/0x80
[ 1683.342402]  [] ? schedule_timeout+0x1c9/0x220
[ 1683.348438]  [] ? wait_woken+0x80/0x80
[ 1683.353804]  [] ? ib_mr_pool_destroy+0x79/0x90 [ib_core]
[ 1683.360699]  [] ib_destroy_qp+0xbe/0x160 [ib_core]
[ 1683.367085]  [] rdma_destroy_qp+0x31/0x50 [rdma_cm]
[ 1683.373538]  [] nvme_rdma_destroy_queue_ib+0x29/0x90 [nvme_rdma]
[ 1683.381127]  [] nvme_rdma_stop_and_free_queue+0x2d/0x40 [nvme_rdma]
[ 1683.388967]  [] nvme_rdma_shutdown_ctrl+0x87/0xe0 [nvme_rdma]
[ 1683.396290]  [] nvme_rdma_del_ctrl_work+0x34/0x50 [nvme_rdma]
[ 1683.403629]  [] process_one_work+0x183/0x4d0
[ 1683.409485]  [] ? __schedule+0x1f0/0x5b0
[ 1683.414984]  [] ? schedule+0x40/0xb0
[ 1683.420142]  [] ? dequeue_entity+0x120/0x250
[ 1683.426009]  [] worker_thread+0x16d/0x530
[ 1683.431597]  [] ? __switch_to+0x1cd/0x5e0
[ 1683.437176]  [] ? __schedule+0x1f0/0x5b0
[ 1683.442674]  [] ? __wake_up_common+0x56/0x90
[ 1683.448511]  [] ? maybe_create_worker+0x120/0x120
[ 1683.454803]  [] ? schedule+0x40/0xb0
[ 1683.459951]  [] ? maybe_create_worker+0x120/0x120
[ 1683.466223]  [] kthread+0xcc/0xf0
[ 1683.471106]  [] ? schedule_tail+0x1e/0xc0
[ 1683.476699]  [] ? smpboot_create_threads+0x80/0x80
[ 1683.483053]  [] ret_from_fork+0x1f/0x40
[ 1683.488470]  [] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.495476] INFO: task kworker/u64:5:716 blocked for more than 120 seconds.
[ 1683.502441]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.509237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.517096] kworker/u64:5   D ffff88101d1ff858     0   716      2 0x10000000
[ 1683.524223] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 1683.529467]  ffff88101d1ff858 ffff88102819c6c0 ffff88101d200040 ffff880fff04e898
[ 1683.536954]  ffff88101d200040 ffff88101d200040 ffff88101d1fc008 ffff880f67a0720c
[ 1683.544445]  ffff88101d200040 ffff880f67a07210 ffff88101d1ff918 ffff88101d1ff8a8
[ 1683.551932] Call Trace:
[ 1683.554390]  [] schedule+0x40/0xb0
[ 1683.559358]  [] ? is_module_text_address+0xe/0x20
[ 1683.565630]  [] schedule_preempt_disabled+0xe/0x10
[ 1683.571990]  [] __mutex_lock_slowpath+0x93/0x120
[ 1683.578200]  [] ? find_next_bit+0xb/0x10
[ 1683.583694]  [] ? cpumask_next_and+0x26/0x50
[ 1683.589529]  [] ? find_busiest_group+0x14a/0xa00
[ 1683.595713]  [] mutex_lock+0x23/0x40
[ 1683.600871]  [] cma_disable_callback+0x2e/0x60 [rdma_cm]
[ 1683.607760]  [] cma_iw_handler+0x2b/0x1b0 [rdma_cm]
[ 1683.614207]  [] cm_close_handler+0x93/0xc0 [iw_cm]
[ 1683.620561]  [] process_event+0xd7/0xf0 [iw_cm]
[ 1683.626665]  [] cm_work_handler+0x147/0x1c0 [iw_cm]
[ 1683.633121]  [] ? trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[ 1683.641487]  [] process_one_work+0x183/0x4d0
[ 1683.647334]  [] ? __schedule+0x1f0/0x5b0
[ 1683.652831]  [] ? schedule+0x40/0xb0
[ 1683.657977]  [] ? schedule+0x1/0xb0
[ 1683.663038]  [] worker_thread+0x16d/0x530
[ 1683.668612]  [] ? __switch_to+0x1cd/0x5e0
[ 1683.674197]  [] ? __schedule+0x1f0/0x5b0
[ 1683.679690]  [] ? __wake_up_common+0x56/0x90
[ 1683.685534]  [] ? maybe_create_worker+0x120/0x120
[ 1683.691821]  [] ? schedule+0x40/0xb0
[ 1683.696968]  [] ? maybe_create_worker+0x120/0x120
[ 1683.703230]  [] kthread+0xcc/0xf0
[ 1683.708126]  [] ? schedule_tail+0x1e/0xc0
[ 1683.713708]  [] ret_from_fork+0x1f/0x40
[ 1683.719118]  [] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.726126] INFO: task rmmod:6006 blocked for more than 120 seconds.
[ 1683.732483]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.739278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1683.747111] rmmod           D ffff880f647479f8     0  6006   4227 0x10000080
[ 1683.754210]  ffff880f647479f8 ffff88102819af40 ffff880f647017c0 ffff880fff04e898
[ 1683.761729]  ffffffff8116c656 0000000000000001 ffff880f64744008 ffff880f64747b98
[ 1683.769244]  7fffffffffffffff ffff880f647017c0 7fffffffffffffff ffff880f64747a48
[ 1683.776768] Call Trace:
[ 1683.779229]  [] ? trace_event_buffer_commit+0x146/0x1d0
[ 1683.786037]  [] schedule+0x40/0xb0
[ 1683.791009]  [] ? __switch_to+0x1cd/0x5e0
[ 1683.796585]  [] schedule_timeout+0x1c9/0x220
[ 1683.802430]  [] ? __schedule+0x1f0/0x5b0
[ 1683.807922]  [] ? insert_work+0x53/0xb0
[ 1683.813341]  [] wait_for_completion+0xde/0x110
[ 1683.819386]  [] ? try_to_wake_up+0x230/0x230
[ 1683.825236]  [] flush_work+0x2b/0x40
[ 1683.830379]  [] ? worker_detach_from_pool+0xa0/0xa0
[ 1683.836830]  [] nvme_rdma_device_unplug+0x97/0xc0 [nvme_rdma]
[ 1683.844142]  [] nvme_rdma_cm_handler+0x72/0x2f0 [nvme_rdma]
[ 1683.851297]  [] ? kmem_cache_free+0x1f4/0x210
[ 1683.857230]  [] ? cma_comp+0x49/0x60 [rdma_cm]
[ 1683.863247]  [] cma_remove_id_dev+0x8f/0xa0 [rdma_cm]
[ 1683.869865]  [] cma_process_remove+0xb7/0x100 [rdma_cm]
[ 1683.876681]  [] ? __kernfs_remove+0x114/0x1d0
[ 1683.882608]  [] cma_remove_one+0x3e/0x60 [rdma_cm]
[ 1683.888983]  [] ib_unregister_device+0xb0/0x150 [ib_core]
[ 1683.895957]  [] c4iw_unregister_device+0x64/0x90 [iw_cxgb4]
[ 1683.903109]  [] c4iw_remove+0x27/0x60 [iw_cxgb4]
[ 1683.909303]  [] c4iw_exit_module+0x31/0x934 [iw_cxgb4]
[ 1683.916018]  [] SyS_delete_module+0x183/0x1d0
[ 1683.921955]  [] ? do_audit_syscall_entry+0x66/0x70
[ 1683.928314]  [] ? trace_event_raw_event_sys_exit+0x6c/0xa0
[ 1683.935357]  [] ? syscall_trace_enter+0x65/0x70
[ 1683.941501]  [] do_syscall_64+0x78/0x1d0
[ 1683.947003]  [] ? do_page_fault+0x37/0x90
[ 1683.952587]  [] entry_SYSCALL64_slow_path+0x25/0x25
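
To make the circular wait easier to see, here is a minimal userspace pthreads
model of it.  This is purely illustrative, NOT kernel code: handler_mutex,
qp_refcnt and delete_work are just names borrowed from the kernel objects in
the traces above.

/*
 * Userspace sketch of the three-way wait:
 *   main        - models the DEVICE_REMOVAL callback, which runs with the
 *                 cm_id handler_mutex held and then flushes the delete work
 *   delete_work - models nvme_rdma_del_ctrl_work -> ... -> c4iw_destroy_qp,
 *                 which waits for the QP refcount to drop to zero
 *   close_handler - models the iw_cm close handler, which must take
 *                 handler_mutex before it can let the QP reference go
 *
 * Build with:  gcc -pthread deadlock_model.c -o deadlock_model
 * Running it hangs, which is the point.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t handler_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t state_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  state_cond    = PTHREAD_COND_INITIALIZER;
static int qp_refcnt   = 1;   /* dropped only by close_handler below        */
static int delete_done = 0;   /* completion that the "flush_work" waits for */

static void *delete_work(void *arg)
{
	pthread_mutex_lock(&state_lock);
	while (qp_refcnt != 0)                /* wait_event(qhp->wait, !refcnt) */
		pthread_cond_wait(&state_cond, &state_lock);
	delete_done = 1;                      /* controller delete finished */
	pthread_cond_broadcast(&state_cond);
	pthread_mutex_unlock(&state_lock);
	return NULL;
}

static void *close_handler(void *arg)
{
	pthread_mutex_lock(&handler_mutex);   /* cma_disable_callback() blocks here */
	pthread_mutex_lock(&state_lock);
	qp_refcnt = 0;                        /* release the last QP reference */
	pthread_cond_broadcast(&state_cond);
	pthread_mutex_unlock(&state_lock);
	pthread_mutex_unlock(&handler_mutex);
	return NULL;
}

int main(void)
{
	pthread_t del_thr, close_thr;

	/* The DEVICE_REMOVAL callback runs with handler_mutex already held. */
	pthread_mutex_lock(&handler_mutex);

	pthread_create(&del_thr, NULL, delete_work, NULL);
	pthread_create(&close_thr, NULL, close_handler, NULL);
	sleep(1);

	/* queue_work() + flush_work() while still inside the cm handler --
	 * this wait never returns. */
	pthread_mutex_lock(&state_lock);
	while (!delete_done)
		pthread_cond_wait(&state_cond, &state_lock);
	pthread_mutex_unlock(&state_lock);

	pthread_mutex_unlock(&handler_mutex);
	printf("no deadlock\n");

	pthread_join(del_thr, NULL);
	pthread_join(close_thr, NULL);
	return 0;
}

In the model the hang goes away if main drops handler_mutex before waiting
for delete_done, i.e. if the flush does not happen from inside the cm handler
context; whether that is the right fix for the real code is a separate
question.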