Do nested flushing only if the device isn't a child.

Signed-off-by: Or Gerlitz
---

With CONFIG_DEBUG_MUTEXES set I see the warning below. For some reason I
didn't manage to trigger it without my other patch that adds the clones,
yet I don't see how that patch could be the reason for the warning, as the
code always goes nested. I've instrumented the flush code to dump its
caller/stack, and indeed you can see that the flushing code is called
recursively and should produce that warning, but it doesn't...

ib0.8001: downing ib_dev
ib0: downing ib_dev
ib0: ipoib_ib_dev_flush_light called
ib0: __ipoib_ib_dev_flush pid 29251
Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib]
 [] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib]
 [] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib]
 [] process_one_work+0x2bd/0x4a6
 [] ? process_one_work+0x210/0x4a6
 [] worker_thread+0x1d6/0x350
 [] ? rescuer_thread+0x241/0x241
 [] kthread+0x84/0x8c
 [] kernel_thread_helper+0x4/0x10
 [] ? finish_task_switch+0x154/0x156
 [] ? _raw_spin_unlock_irq+0x2b/0x40
 [] ? retint_restore_args+0xe/0xe
 [] ? __init_kthread_worker+0x56/0x56
 [] ? gs_change+0xb/0xb
ib0.8001: __ipoib_ib_dev_flush pid 29251
Pid: 29251, comm: kworker/u:1 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [] __ipoib_ib_dev_flush+0x57/0x204 [ib_ipoib]
 [] __ipoib_ib_dev_flush+0x87/0x204 [ib_ipoib]
 [] ? ipoib_ib_dev_flush_normal+0x46/0x46 [ib_ipoib]
 [] ipoib_ib_dev_flush_light+0x3f/0x43 [ib_ipoib]
 [] process_one_work+0x2bd/0x4a6
 [] ? process_one_work+0x210/0x4a6
 [] worker_thread+0x1d6/0x350
 [] ? rescuer_thread+0x241/0x241
 [] kthread+0x84/0x8c
 [] kernel_thread_helper+0x4/0x10
 [] ? finish_task_switch+0x154/0x156
 [] ? _raw_spin_unlock_irq+0x2b/0x40
 [] ? retint_restore_args+0xe/0xe
 [] ? __init_kthread_worker+0x56/0x56
 [] ? gs_change+0xb/0xb

---

=============================================
[ INFO: possible recursive locking detected ]
3.2.0-06106-g75f0703-dirty #16 Not tainted
---------------------------------------------
kworker/u:2/1578 is trying to acquire lock:
 (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

but task is already holding lock:
 (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&priv->vlan_mutex);
  lock(&priv->vlan_mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/u:2/1578:
 #0: (ipoib){.+.+.+}, at: [] process_one_work+0x210/0x4a6
 #1: ((&priv->flush_heavy)){+.+...}, at: [] process_one_work+0x210/0x4a6
 #2: (&priv->vlan_mutex){+.+.+.}, at: [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]

stack backtrace:
Pid: 1578, comm: kworker/u:2 Not tainted 3.2.0-06106-g75f0703-dirty #16
Call Trace:
 [] ? console_unlock+0x10c/0x207
 [] __lock_acquire+0x16b5/0x174e
 [] ? save_stack_trace+0x2a/0x47
 [] lock_acquire+0xf0/0x116
 [] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [] mutex_lock_nested+0x64/0x2e6
 [] ? __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [] ? trace_hardirqs_on_caller+0x11e/0x155
 [] __ipoib_ib_dev_flush+0x2c/0x1cf [ib_ipoib]
 [] __ipoib_ib_dev_flush+0x52/0x1cf [ib_ipoib]
 [] ? trace_hardirqs_on_caller+0x11e/0x155
 [] ? __ipoib_ib_dev_flush+0x1cf/0x1cf [ib_ipoib]
 [] ipoib_ib_dev_flush_heavy+0x15/0x17 [ib_ipoib]
 [] process_one_work+0x2bd/0x4a6
 [] ? process_one_work+0x210/0x4a6
 [] ? _raw_spin_unlock_irq+0x2b/0x40
 [] worker_thread+0x1d6/0x350
 [] ? rescuer_thread+0x241/0x241
 [] kthread+0x84/0x8c
 [] kernel_thread_helper+0x4/0x10
 [] ? retint_restore_args+0xe/0xe
 [] ? __init_kthread_worker+0x56/0x56
 [] ? gs_change+0xb/0xb
ADDRCONF(NETDEV_CHANGE): ib0.8001: link becomes ready
ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 5c1bc99..cac2b71 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -934,16 +934,18 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
 	struct net_device *dev = priv->dev;
 	u16 new_index;
 
-	mutex_lock(&priv->vlan_mutex);
+	if (!priv->parent) {
+		mutex_lock(&priv->vlan_mutex);
 
-	/*
-	 * Flush any child interfaces too -- they might be up even if
-	 * the parent is down.
-	 */
-	list_for_each_entry(cpriv, &priv->child_intfs, list)
-		__ipoib_ib_dev_flush(cpriv, level);
+		/*
+		 * Flush any child interfaces too -- they might be up even if
+		 * the parent is down.
+		 */
+		list_for_each_entry(cpriv, &priv->child_intfs, list)
+			__ipoib_ib_dev_flush(cpriv, level);
 
-	mutex_unlock(&priv->vlan_mutex);
+		mutex_unlock(&priv->vlan_mutex);
+	}
 
 	if (!test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) {
 		ipoib_dbg(priv, "Not flushing - IPOIB_FLAG_INITIALIZED not set.\n");
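
For readers who want to see the effect of the `!priv->parent` check outside the kernel, here is a minimal userspace sketch. It is not driver code: `struct toy_priv`, `toy_flush()` and `toy_max_lock_depth()` are hypothetical names, the mutex is modelled with a plain flag, and the depth counter only approximates lockdep's "same lock class held twice" check. It assumes, as in the patch, that only a parent has children.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical stand-in for struct ipoib_dev_priv; only the fields the
 * patch touches are modelled. */
struct toy_priv {
	struct toy_priv *parent;                     /* NULL for the parent device */
	struct toy_priv *children[TOY_MAX_CHILDREN]; /* models priv->child_intfs */
	size_t nr_children;
	bool vlan_mutex_held;                        /* models priv->vlan_mutex */
	int flush_count;                             /* how often the device was flushed */
};

/* Number of toy vlan_mutex instances currently held, and the peak value. */
static int cur_depth;
static int max_depth;

static void toy_mutex_lock(struct toy_priv *priv)
{
	assert(!priv->vlan_mutex_held); /* same instance twice would self-deadlock */
	priv->vlan_mutex_held = true;
	if (++cur_depth > max_depth)
		max_depth = cur_depth;
}

static void toy_mutex_unlock(struct toy_priv *priv)
{
	assert(priv->vlan_mutex_held);
	priv->vlan_mutex_held = false;
	cur_depth--;
}

/* With skip_lock_on_child set, this mirrors the patched flow: only a
 * device without a parent locks and walks its children. */
static void toy_flush(struct toy_priv *priv, bool skip_lock_on_child)
{
	size_t i;

	if (!skip_lock_on_child || !priv->parent) {
		toy_mutex_lock(priv);
		for (i = 0; i < priv->nr_children; i++)
			toy_flush(priv->children[i], skip_lock_on_child);
		toy_mutex_unlock(priv);
	}
	priv->flush_count++; /* stands in for the rest of the flush work */
}

/* Flush one parent with two children and report the peak number of
 * vlan_mutex instances held at the same time. */
int toy_max_lock_depth(bool skip_lock_on_child)
{
	struct toy_priv parent = { 0 };
	struct toy_priv child_a = { .parent = &parent };
	struct toy_priv child_b = { .parent = &parent };

	parent.children[0] = &child_a;
	parent.children[1] = &child_b;
	parent.nr_children = 2;

	cur_depth = 0;
	max_depth = 0;
	toy_flush(&parent, skip_lock_on_child);

	/* Every device is flushed exactly once in both variants. */
	assert(parent.flush_count == 1);
	assert(child_a.flush_count == 1 && child_b.flush_count == 1);
	return max_depth;
}
```

In this model `toy_max_lock_depth(false)` reports two mutexes of the same kind held at once (the situation the lockdep report above describes), while `toy_max_lock_depth(true)` never holds more than one. The alternative of annotating the inner acquisition with a lockdep subclass is not modelled here.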