From: Jason Gunthorpe <jgg@nvidia.com>
To: Kamal Heib <kamalheib1@gmail.com>
Cc: <linux-rdma@vger.kernel.org>, Doug Ledford <dledford@redhat.com>,
Kamal Heib <kheib@redhat.com>
Subject: Re: [PATCH for-next] RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
Date: Thu, 2 Jul 2020 11:16:16 -0300 [thread overview]
Message-ID: <20200702141616.GA698336@nvidia.com> (raw)
In-Reply-To: <20200625174219.290842-1-kamalheib1@gmail.com>
On Thu, Jun 25, 2020 at 08:42:19PM +0300, Kamal Heib wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> ipoib_mcast_carrier_on_task() insanely open codes a rtnl_lock() such that
> the ony time flush_workqueue() can be called is if it also clears
> IPOIB_FLAG_OPER_UP.
>
> Thus the flush inside ipoib_flush_ah() will deadlock if it gets unlucky
> enough, and lockdep doesn't help us to find it early:
>
> CPU0 CPU1 CPU2
> __ipoib_ib_dev_flush()
> down_read(vlan_rwsem)
>
> ipoib_vlan_add()
> rtnl_trylock()
> down_write(vlan_rwsem)
>
> ipoib_mcast_carrier_on_task()
> while (!rtnl_trylock())
> msleep(20);
>
> ipoib_flush_ah()
> flush_workqueue(priv->wq)
>
> Clean up the ah_reaper related functions and lifecycle to make sense:
>
> - Start/Stop of the reaper should only be done in open/stop NDOs, not in
> any other places
>
> - cancel and flush of the reaper should only happen in the stop NDO.
> cancel is only functional when combined with IPOIB_STOP_REAPER.
>
> - Non-stop places were flushing the AH's just need to flush out dead AH's
> synchronously and ignore the background task completely. It is fully
> locked and harmless to leave running.
>
> Which ultimately fixes the ABBA deadlock by removing the unnecessary
> flush_workqueue() from the problematic place under the vlan_rwsem.
>
> Fixes: efc82eeeae4e ("IB/ipoib: No longer use flush as a parameter")
> Reported-by: Kamal Heib <kheib@redhat.com>
> Tested-by: Kamal Heib <kheib@redhat.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
> drivers/infiniband/ulp/ipoib/ipoib_ib.c | 65 ++++++++++-------------
> drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +
> 2 files changed, 31 insertions(+), 36 deletions(-)
Applied to for-next, thanks
Jason
prev parent reply other threads:[~2020-07-02 14:16 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-25 17:42 [PATCH for-next] RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah() Kamal Heib
2020-07-02 14:16 ` Jason Gunthorpe [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200702141616.GA698336@nvidia.com \
--to=jgg@nvidia.com \
--cc=dledford@redhat.com \
--cc=kamalheib1@gmail.com \
--cc=kheib@redhat.com \
--cc=linux-rdma@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.