All of lore.kernel.org
 help / color / mirror / Atom feed
From: swise@opengridcomputing.com (Steve Wise)
Subject: [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl
Date: Thu, 14 Jul 2016 09:59:05 -0500	[thread overview]
Message-ID: <011201d1dde0$444943c0$ccdbcb40$@opengridcomputing.com> (raw)
In-Reply-To: <57875835.5050001@grimberg.me>

> This patch introduces asymmetry between create and destroy
> of the admin queue. Does this alternative patch solve
> the problem?
> 
> The patch changes the order of device removal flow from:
> 1. delete controller
> 2. destroy queue
> 
> to:
> 1. destroy queue
> 2. delete controller
> 
> Or more specifically:
> 1. own the controller deletion (make sure we are not
> competing with anyone)
> 2. get rid of inflight reconnects (which also destroy and
> create queues)
> 3. destroy the queue
> 4. safely queue controller deletion
> 
> Thoughts?
> 

Your patch causes a deadlock during device removal.  

The controller delete work thread is stuck in c4iw_destroy_qp waiting on all
references to go away.  Either nvmf/rdma or the rdma-cm or both.

(gdb) list *c4iw_destroy_qp+0x155
0x15af5 is in c4iw_destroy_qp (drivers/infiniband/hw/cxgb4/qp.c:1596).
1591                    c4iw_modify_qp(rhp, qhp, C4IW_QP_ATTR_NEXT_STATE,
&attrs, 0);
1592            wait_event(qhp->wait, !qhp->ep);
1593
1594            remove_handle(rhp, &rhp->qpidr, qhp->wq.sq.qid);
1595            atomic_dec(&qhp->refcnt);
1596            wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
1597
1598            spin_lock_irq(&rhp->lock);
1599            if (!list_empty(&qhp->db_fc_entry))
1600                    list_del_init(&qhp->db_fc_entry);

The rdma-cm work thread is stuck trying to grab the cm_id mutex:

(gdb) list *cma_disable_callback+0x2e
0x29e is in cma_disable_callback (drivers/infiniband/core/cma.c:715).
710
711     static int cma_disable_callback(struct rdma_id_private *id_priv,
712                                     enum rdma_cm_state state)
713     {
714             mutex_lock(&id_priv->handler_mutex);
715             if (id_priv->state != state) {
716                     mutex_unlock(&id_priv->handler_mutex);
717                     return -EINVAL;
718             }
719             return 0;

And the nvmf cm event handler is stuck waiting for the controller delete to
finish:

(gdb) list *nvme_rdma_device_unplug+0x97
0x1027 is in nvme_rdma_device_unplug (drivers/nvme/host/rdma.c:1358).
warning: Source file is more recent than executable.
1353    queue_delete:
1354            /* queue controller deletion */
1355            queue_work(nvme_rdma_wq, &ctrl->delete_work);
1356            flush_work(&ctrl->delete_work);
1357            return ret;
1358    }
1359
1360    static int nvme_rdma_cm_handler(struct rdma_cm_id *cm_id,
1361                    struct rdma_cm_event *ev)
1362    {

Seems like the rdma-cm work thread is trying to grab the cm_id lock for the
cm_id that is handling the DEVICE_REMOVAL event.


Here is the detailed log:

[ 1484.568354] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery",
addr 10.0.1.14:4420
[ 1495.939850] nvme nvme1: creating 32 I/O queues.
[ 1496.635114] nvme nvme1: new ctrl: NQN "test-ram0", addr 10.0.1.14:4420
[ 1502.813698] nvme nvme1: Got rdma device removal event, deleting ctrl
[ 1683.259465] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[ 1683.266147]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.272986] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1683.280830] kworker/0:0     D ffff8810281579c8     0     4      2 0x10000000
[ 1683.287997] Workqueue: nvme_rdma_wq nvme_rdma_del_ctrl_work [nvme_rdma]
[ 1683.294658]  ffff8810281579c8 ffffffff81c0d4c0 ffff88102809c6c0
ffff880fff04e898
[ 1683.302252]  ffffffff81d448b4 ffff88102809d840 ffff881028154008
ffff88007853b3f8
[ 1683.309763]  ffff881028157ac8 ffff880f67ea5758 ffff880f67ea5760
ffff881028157a18
[ 1683.317281] Call Trace:
[ 1683.319752]  [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.324757]  [<ffffffffa036bac5>] c4iw_destroy_qp+0x155/0x2b0 [iw_cxgb4]
[ 1683.331481]  [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.336639]  [<ffffffff810f515a>] ? lock_timer_base+0x5a/0x80
[ 1683.342402]  [<ffffffff816ede49>] ? schedule_timeout+0x1c9/0x220
[ 1683.348438]  [<ffffffff810ccd80>] ? wait_woken+0x80/0x80
[ 1683.353804]  [<ffffffffa03a25a9>] ? ib_mr_pool_destroy+0x79/0x90 [ib_core]
[ 1683.360699]  [<ffffffffa039893e>] ib_destroy_qp+0xbe/0x160 [ib_core]
[ 1683.367085]  [<ffffffffa0706011>] rdma_destroy_qp+0x31/0x50 [rdma_cm]
[ 1683.373538]  [<ffffffffa038ed99>] nvme_rdma_destroy_queue_ib+0x29/0x90
[nvme_rdma]
[ 1683.381127]  [<ffffffffa038f04d>] nvme_rdma_stop_and_free_queue+0x2d/0x40
[nvme_rdma]
[ 1683.388967]  [<ffffffffa038f157>] nvme_rdma_shutdown_ctrl+0x87/0xe0
[nvme_rdma]
[ 1683.396290]  [<ffffffffa038f224>] nvme_rdma_del_ctrl_work+0x34/0x50
[nvme_rdma]
[ 1683.403629]  [<ffffffff810a4d03>] process_one_work+0x183/0x4d0
[ 1683.409485]  [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.414984]  [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.420142]  [<ffffffff810c4130>] ? dequeue_entity+0x120/0x250
[ 1683.426009]  [<ffffffff810a59bd>] worker_thread+0x16d/0x530
[ 1683.431597]  [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.437176]  [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.442674]  [<ffffffff810ccbc6>] ? __wake_up_common+0x56/0x90
[ 1683.448511]  [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.454803]  [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.459951]  [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.466223]  [<ffffffff810aa5dc>] kthread+0xcc/0xf0
[ 1683.471106]  [<ffffffff810b4ffe>] ? schedule_tail+0x1e/0xc0
[ 1683.476699]  [<ffffffff810ae3b0>] ? smpboot_create_threads+0x80/0x80
[ 1683.483053]  [<ffffffff816eed3f>] ret_from_fork+0x1f/0x40
[ 1683.488470]  [<ffffffff810aa510>] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.495476] INFO: task kworker/u64:5:716 blocked for more than 120 seconds.
[ 1683.502441]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.509237] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1683.517096] kworker/u64:5   D ffff88101d1ff858     0   716      2 0x10000000
[ 1683.524223] Workqueue: iw_cm_wq cm_work_handler [iw_cm]
[ 1683.529467]  ffff88101d1ff858 ffff88102819c6c0 ffff88101d200040
ffff880fff04e898
[ 1683.536954]  ffff88101d200040 ffff88101d200040 ffff88101d1fc008
ffff880f67a0720c
[ 1683.544445]  ffff88101d200040 ffff880f67a07210 ffff88101d1ff918
ffff88101d1ff8a8
[ 1683.551932] Call Trace:
[ 1683.554390]  [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.559358]  [<ffffffff811119ae>] ? is_module_text_address+0xe/0x20
[ 1683.565630]  [<ffffffff816eb12e>] schedule_preempt_disabled+0xe/0x10
[ 1683.571990]  [<ffffffff816ecbe3>] __mutex_lock_slowpath+0x93/0x120
[ 1683.578200]  [<ffffffff813789bb>] ? find_next_bit+0xb/0x10
[ 1683.583694]  [<ffffffff81362816>] ? cpumask_next_and+0x26/0x50
[ 1683.589529]  [<ffffffff810c4baa>] ? find_busiest_group+0x14a/0xa00
[ 1683.595713]  [<ffffffff816ecc93>] mutex_lock+0x23/0x40
[ 1683.600871]  [<ffffffffa070326e>] cma_disable_callback+0x2e/0x60 [rdma_cm]
[ 1683.607760]  [<ffffffffa0707b5b>] cma_iw_handler+0x2b/0x1b0 [rdma_cm]
[ 1683.614207]  [<ffffffffa06f61b3>] cm_close_handler+0x93/0xc0 [iw_cm]
[ 1683.620561]  [<ffffffffa06f80e7>] process_event+0xd7/0xf0 [iw_cm]
[ 1683.626665]  [<ffffffffa06f8247>] cm_work_handler+0x147/0x1c0 [iw_cm]
[ 1683.633121]  [<ffffffff810a54f6>] ?
trace_event_raw_event_workqueue_execute_start+0x66/0xa0
[ 1683.641487]  [<ffffffff810a4d03>] process_one_work+0x183/0x4d0
[ 1683.647334]  [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.652831]  [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.657977]  [<ffffffff816eae91>] ? schedule+0x1/0xb0
[ 1683.663038]  [<ffffffff810a59bd>] worker_thread+0x16d/0x530
[ 1683.668612]  [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.674197]  [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.679690]  [<ffffffff810ccbc6>] ? __wake_up_common+0x56/0x90
[ 1683.685534]  [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.691821]  [<ffffffff816eaed0>] ? schedule+0x40/0xb0
[ 1683.696968]  [<ffffffff810a5850>] ? maybe_create_worker+0x120/0x120
[ 1683.703230]  [<ffffffff810aa5dc>] kthread+0xcc/0xf0
[ 1683.708126]  [<ffffffff810b4ffe>] ? schedule_tail+0x1e/0xc0
[ 1683.713708]  [<ffffffff816eed3f>] ret_from_fork+0x1f/0x40
[ 1683.719118]  [<ffffffff810aa510>] ? kthread_freezable_should_stop+0x70/0x70
[ 1683.726126] INFO: task rmmod:6006 blocked for more than 120 seconds.
[ 1683.732483]       Tainted: G            E   4.7.0-rc2-block-for-next+ #78
[ 1683.739278] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[ 1683.747111] rmmod           D ffff880f647479f8     0  6006   4227 0x10000080
[ 1683.754210]  ffff880f647479f8 ffff88102819af40 ffff880f647017c0
ffff880fff04e898
[ 1683.761729]  ffffffff8116c656 0000000000000001 ffff880f64744008
ffff880f64747b98
[ 1683.769244]  7fffffffffffffff ffff880f647017c0 7fffffffffffffff
ffff880f64747a48
[ 1683.776768] Call Trace:
[ 1683.779229]  [<ffffffff8116c656>] ? trace_event_buffer_commit+0x146/0x1d0
[ 1683.786037]  [<ffffffff816eaed0>] schedule+0x40/0xb0
[ 1683.791009]  [<ffffffff8102eb1d>] ? __switch_to+0x1cd/0x5e0
[ 1683.796585]  [<ffffffff816ede49>] schedule_timeout+0x1c9/0x220
[ 1683.802430]  [<ffffffff816eaa10>] ? __schedule+0x1f0/0x5b0
[ 1683.807922]  [<ffffffff810a3603>] ? insert_work+0x53/0xb0
[ 1683.813341]  [<ffffffff816ebede>] wait_for_completion+0xde/0x110
[ 1683.819386]  [<ffffffff810b6b00>] ? try_to_wake_up+0x230/0x230
[ 1683.825236]  [<ffffffff810a472b>] flush_work+0x2b/0x40
[ 1683.830379]  [<ffffffff810a1810>] ? worker_detach_from_pool+0xa0/0xa0
[ 1683.836830]  [<ffffffffa038eff7>] nvme_rdma_device_unplug+0x97/0xc0
[nvme_rdma]
[ 1683.844142]  [<ffffffffa0390782>] nvme_rdma_cm_handler+0x72/0x2f0 [nvme_rdma]
[ 1683.851297]  [<ffffffff812085c4>] ? kmem_cache_free+0x1f4/0x210
[ 1683.857230]  [<ffffffffa0703669>] ? cma_comp+0x49/0x60 [rdma_cm]
[ 1683.863247]  [<ffffffffa070810f>] cma_remove_id_dev+0x8f/0xa0 [rdma_cm]
[ 1683.869865]  [<ffffffffa07081d7>] cma_process_remove+0xb7/0x100 [rdma_cm]
[ 1683.876681]  [<ffffffff812b2a14>] ? __kernfs_remove+0x114/0x1d0
[ 1683.882608]  [<ffffffffa070825e>] cma_remove_one+0x3e/0x60 [rdma_cm]
[ 1683.888983]  [<ffffffffa039dca0>] ib_unregister_device+0xb0/0x150 [ib_core]
[ 1683.895957]  [<ffffffffa0363034>] c4iw_unregister_device+0x64/0x90 [iw_cxgb4]
[ 1683.903109]  [<ffffffffa0356357>] c4iw_remove+0x27/0x60 [iw_cxgb4]
[ 1683.909303]  [<ffffffffa036e6fd>] c4iw_exit_module+0x31/0x934 [iw_cxgb4]
[ 1683.916018]  [<ffffffff811120e3>] SyS_delete_module+0x183/0x1d0
[ 1683.921955]  [<ffffffff81003476>] ? do_audit_syscall_entry+0x66/0x70
[ 1683.928314]  [<ffffffff8100378c>] ? trace_event_raw_event_sys_exit+0x6c/0xa0
[ 1683.935357]  [<ffffffff81003b45>] ? syscall_trace_enter+0x65/0x70
[ 1683.941501]  [<ffffffff81003f08>] do_syscall_64+0x78/0x1d0
[ 1683.947003]  [<ffffffff8106e367>] ? do_page_fault+0x37/0x90
[ 1683.952587]  [<ffffffff816eebe1>] entry_SYSCALL64_slow_path+0x25/0x25

  parent reply	other threads:[~2016-07-14 14:59 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-13 21:26 [PATCH 0/2] nvme-rdma: device removal crash fixes Ming Lin
2016-07-13 21:26 ` [PATCH 1/2] nvme-rdma: grab reference for device removal event Ming Lin
2016-07-13 21:33   ` Steve Wise
2016-07-13 21:26 ` [PATCH 2/2] nvme-rdma: move admin queue cleanup to nvme_rdma_free_ctrl Ming Lin
2016-07-13 21:33   ` Steve Wise
2016-07-13 23:19   ` J Freyensee
2016-07-13 23:36     ` Ming Lin
2016-07-13 23:59       ` J Freyensee
2016-07-14  6:39         ` Ming Lin
2016-07-14 17:09           ` J Freyensee
2016-07-14 18:04             ` Ming Lin
2016-07-14  9:15   ` Sagi Grimberg
2016-07-14  9:17     ` Sagi Grimberg
2016-07-14 14:30       ` Steve Wise
2016-07-14 14:44         ` Sagi Grimberg
2016-07-14 14:59     ` Steve Wise [this message]
     [not found]     ` <011301d1dde0$4450e4e0$ccf2aea0$@opengridcomputing.com>
2016-07-14 15:02       ` Steve Wise
2016-07-14 15:26         ` Steve Wise
2016-07-14 21:27           ` Steve Wise
2016-07-15 15:52             ` Steve Wise
2016-07-17  6:01               ` Sagi Grimberg
2016-07-18 14:55                 ` Steve Wise
2016-07-18 15:47                   ` Steve Wise
2016-07-18 16:34                     ` Steve Wise
2016-07-18 18:04                       ` Steve Wise
2016-07-13 21:58 ` [PATCH 0/2] nvme-rdma: device removal crash fixes Steve Wise

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='011201d1dde0$444943c0$ccdbcb40$@opengridcomputing.com' \
    --to=swise@opengridcomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.