linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
To: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Somnath Kotur
	<somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>,
	Sriharsha Basavapatna
	<sriharsha.basavapatna-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH for-next 03/14] RDMA/bnxt_re: Fix race between netdev register and unregister events
Date: Thu, 18 May 2017 07:31:00 +0300	[thread overview]
Message-ID: <20170518043100.GV3616@mtr-leonro.local> (raw)
In-Reply-To: <CA+sbYW0LLrE7EAsa1C-Bza1e6ug2MYntk8YVdOgm=iBLqDKBHw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 11153 bytes --]

On Mon, May 15, 2017 at 04:24:54PM +0530, Selvin Xavier wrote:
> On Sun, May 14, 2017 at 12:19 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > On Wed, May 10, 2017 at 03:45:28AM -0700, Selvin Xavier wrote:
> >> From: Somnath Kotur <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> >>
> >> Upon receipt of the NETDEV_REGISTER event from the netdev notifier
> >> chain, the IB stack registration is spawned off to a workqueue
> >> since that also requires an rtnl lock.
> >> There could be 2 kinds of races between the NETDEV_REGISTER and the
> >> NETDEV_UNREGISTER event handling.
> >> a)The NETDEV_UNREGISTER event is received in rapid succession after
> >> the NETDEV_REGISTER event even before the work queue got a chance
> >> to run
> >> b)The NETDEV_UNREGISTER event is received while the workqueue that
> >> handles registration with the IB stack is still in progress.
> >> Handle both the races with a bit flag that is set as part of the
> >> NETDEV_REGISTER event just before the work item is queued and cleared
> >> in the workqueue after the registration with the IB stack is complete.
> >> Use a barrier to ensure the bit is cleared only after the IB stack
> >> registration code is completed.
> >
> > I have a very strong feeling that this patch doesn't solve race, but
> > makes it is hard to reproduce. Why don't you use locks to protect flows?
> >
>
> IB registration cannot be invoked from the netdev event because the IB stack
> tries to acquire the rtnl_lock and netdev event is already in rtnl
> lock context. So we
> are scheduling a worker in case of NETDEV_REGISTER.
> From the worker task, bnxt_re driver tries to acquire rtnl_lock to access
> the L2 driver for creating the initial resources.
>
> Having a driver lock to synchronize does not solve the possible dead
> lock situation.
> Say, in case the driver acquired the driver_lock in the registration task and a
> NETDEV_UNREGISTER is received which comes with rtnl_lock is held.
> Driver tries to acquire the driver_lock during un-reg. However,
> registration task will hold
> the driver_lock and wait for rtnl_lock to access the L2 driver. netdev
> unreg event will not
> get the driver_lock.
>
> To avoid this dead lock, we are using BNXT_RE_FLAG_NETDEV_REG_IN_PROG and
> driver doesn't decrement the netdev ref count if the registration task
> is running. Once registration
> is done, we will reset BNXT_RE_FLAG_NETDEV_REG_IN_PROG flag and we will proceed
> with un-registration.

It doesn't answer to my question, but whatever, it is not critical.

Thanks

>
>
> >>
> >> Removes the link speed query from the query_port as it causes
> >> deadlock while trying to acquire rtnl_lock. Now querying the
> >> speed from the nedev event itself.
> >>
> >> Also, added a code to wait for resources(like CQ) to be freed by the
> >> ULPs, during driver unload
> >>
> >> Signed-off-by: Somnath Kotur <somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> >> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> >> Signed-off-by: Selvin Xavier <selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> >> ---
> >>  drivers/infiniband/hw/bnxt_re/bnxt_re.h  | 13 +++++++----
> >>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 21 +++--------------
> >>  drivers/infiniband/hw/bnxt_re/main.c     | 39 ++++++++++++++++++++++++++++++++
> >>  3 files changed, 50 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/drivers/infiniband/hw/bnxt_re/bnxt_re.h b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
> >> index ebf7be8..fad04b2 100644
> >> --- a/drivers/infiniband/hw/bnxt_re/bnxt_re.h
> >> +++ b/drivers/infiniband/hw/bnxt_re/bnxt_re.h
> >> @@ -80,11 +80,13 @@ struct bnxt_re_dev {
> >>       struct ib_device                ibdev;
> >>       struct list_head                list;
> >>       unsigned long                   flags;
> >> -#define BNXT_RE_FLAG_NETDEV_REGISTERED       0
> >> -#define BNXT_RE_FLAG_IBDEV_REGISTERED        1
> >> -#define BNXT_RE_FLAG_GOT_MSIX                2
> >> -#define BNXT_RE_FLAG_RCFW_CHANNEL_EN 8
> >> -#define BNXT_RE_FLAG_QOS_WORK_REG    16
> >> +#define BNXT_RE_FLAG_NETDEV_REGISTERED         0
> >> +#define BNXT_RE_FLAG_IBDEV_REGISTERED          1
> >> +#define BNXT_RE_FLAG_GOT_MSIX                  2
> >> +#define BNXT_RE_FLAG_RCFW_CHANNEL_EN           4
> >> +#define BNXT_RE_FLAG_QOS_WORK_REG              5
> >> +#define BNXT_RE_FLAG_NETDEV_REG_IN_PROG        6
> >> +
> >>       struct net_device               *netdev;
> >>       unsigned int                    version, major, minor;
> >>       struct bnxt_en_dev              *en_dev;
> >> @@ -127,6 +129,7 @@ struct bnxt_re_dev {
> >>       struct bnxt_re_qp               *qp1_sqp;
> >>       struct bnxt_re_ah               *sqp_ah;
> >>       struct bnxt_re_sqp_entries sqp_tbl[1024];
> >> +     u32 espeed;
> >>  };
> >>
> >>  #define to_bnxt_re_dev(ptr, member)  \
> >> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> >> index 6385213..c717d5d 100644
> >> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> >> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> >> @@ -223,20 +223,8 @@ int bnxt_re_modify_device(struct ib_device *ibdev,
> >>       return 0;
> >>  }
> >>
> >> -static void __to_ib_speed_width(struct net_device *netdev, u8 *speed, u8 *width)
> >> +static void __to_ib_speed_width(u32 espeed, u8 *speed, u8 *width)
> >>  {
> >> -     struct ethtool_link_ksettings lksettings;
> >> -     u32 espeed;
> >> -
> >> -     if (netdev->ethtool_ops && netdev->ethtool_ops->get_link_ksettings) {
> >> -             memset(&lksettings, 0, sizeof(lksettings));
> >> -             rtnl_lock();
> >> -             netdev->ethtool_ops->get_link_ksettings(netdev, &lksettings);
> >> -             rtnl_unlock();
> >> -             espeed = lksettings.base.speed;
> >> -     } else {
> >> -             espeed = SPEED_UNKNOWN;
> >> -     }
> >>       switch (espeed) {
> >>       case SPEED_1000:
> >>               *speed = IB_SPEED_SDR;
> >> @@ -303,12 +291,9 @@ int bnxt_re_query_port(struct ib_device *ibdev, u8 port_num,
> >>       port_attr->sm_sl = 0;
> >>       port_attr->subnet_timeout = 0;
> >>       port_attr->init_type_reply = 0;
> >> -     /* call the underlying netdev's ethtool hooks to query speed settings
> >> -      * for which we acquire rtnl_lock _only_ if it's registered with
> >> -      * IB stack to avoid race in the NETDEV_UNREG path
> >> -      */
> >> +
> >>       if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
> >> -             __to_ib_speed_width(rdev->netdev, &port_attr->active_speed,
> >> +             __to_ib_speed_width(rdev->espeed, &port_attr->active_speed,
> >>                                   &port_attr->active_width);
> >>       return 0;
> >>  }
> >> diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> >> index 5d35540..99a54fd 100644
> >> --- a/drivers/infiniband/hw/bnxt_re/main.c
> >> +++ b/drivers/infiniband/hw/bnxt_re/main.c
> >> @@ -48,6 +48,8 @@
> >>  #include <net/ipv6.h>
> >>  #include <net/addrconf.h>
> >>  #include <linux/if_ether.h>
> >> +#include <linux/atomic.h>
> >> +#include <asm/barrier.h>
> >>
> >>  #include <rdma/ib_verbs.h>
> >>  #include <rdma/ib_user_verbs.h>
> >> @@ -926,6 +928,10 @@ static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev, bool lock_wait)
> >>       if (test_and_clear_bit(BNXT_RE_FLAG_QOS_WORK_REG, &rdev->flags))
> >>               cancel_delayed_work(&rdev->worker);
> >>
> >> +     /* Wait for ULPs to release references */
> >> +     while (atomic_read(&rdev->cq_count))
> >> +             usleep_range(500, 1000);
> >> +
> >
> > It can change immediately after your check.
> [Selvin] Yeah. We might end up waiting for 1ms in the unload path. We
> have kept this change, since it
> is a control path operation.
> >
> >>       bnxt_re_cleanup_res(rdev);
> >>       bnxt_re_free_res(rdev, lock_wait);
> >>
> >> @@ -1152,6 +1158,22 @@ static void bnxt_re_remove_one(struct bnxt_re_dev *rdev)
> >>       pci_dev_put(rdev->en_dev->pdev);
> >>  }
> >>
> >> +static void bnxt_re_get_link_speed(struct bnxt_re_dev *rdev)
> >> +{
> >> +     struct ethtool_link_ksettings lksettings;
> >> +     struct net_device *netdev = rdev->netdev;
> >> +
> >> +     if (netdev->ethtool_ops && netdev->ethtool_ops->get_link_ksettings) {
> >> +             memset(&lksettings, 0, sizeof(lksettings));
> >> +             if (rtnl_trylock()) {
> >> +                     netdev->ethtool_ops->get_link_ksettings(netdev,
> >> +                                                             &lksettings);
> >> +                     rdev->espeed = lksettings.base.speed;
> >> +                     rtnl_unlock();
> >> +             }
> >> +     }
> >> +}
> >> +
> >>  /* Handle all deferred netevents tasks */
> >>  static void bnxt_re_task(struct work_struct *work)
> >>  {
> >> @@ -1169,6 +1191,10 @@ static void bnxt_re_task(struct work_struct *work)
> >>       switch (re_work->event) {
> >>       case NETDEV_REGISTER:
> >>               rc = bnxt_re_ib_reg(rdev);
> >> +             /* Use memory barrier to sync the rdev->flags */
> >> +             smp_mb();
> >> +             clear_bit(BNXT_RE_FLAG_NETDEV_REG_IN_PROG,
> >> +                       &rdev->flags);
> >>               if (rc)
> >>                       dev_err(rdev_to_dev(rdev),
> >>                               "Failed to register with IB: %#x", rc);
> >> @@ -1186,6 +1212,7 @@ static void bnxt_re_task(struct work_struct *work)
> >>               else if (netif_carrier_ok(rdev->netdev))
> >>                       bnxt_re_dispatch_event(&rdev->ibdev, NULL, 1,
> >>                                              IB_EVENT_PORT_ACTIVE);
> >> +             bnxt_re_get_link_speed(rdev);
> >>               break;
> >>       default:
> >>               break;
> >> @@ -1244,10 +1271,22 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
> >>                       break;
> >>               }
> >>               bnxt_re_init_one(rdev);
> >> +             /*
> >> +              *  Query the link speed at the time of registration.
> >> +              *  Already in rtnl_lock context
> >> +              */
> >> +             bnxt_re_get_link_speed(rdev);
> >> +
> >> +             set_bit(BNXT_RE_FLAG_NETDEV_REG_IN_PROG, &rdev->flags);
> >>               sch_work = true;
> >>               break;
> >>
> >>       case NETDEV_UNREGISTER:
> >> +             /* netdev notifier will call NETDEV_UNREGISTER again later since
> >> +              * we are still holding the reference to the netdev
> >> +              */
> >> +             if (test_bit(BNXT_RE_FLAG_NETDEV_REG_IN_PROG, &rdev->flags))
> >> +                     goto exit;
> >>               bnxt_re_ib_unreg(rdev, false);
> >>               bnxt_re_remove_one(rdev);
> >>               bnxt_re_dev_unreg(rdev);
> >> --
> >> 2.5.5
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> >> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2017-05-18  4:31 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-10 10:45 [PATCH for-next 00/14] RDMA/bnxt_re: Bug fixes Selvin Xavier
     [not found] ` <1494413139-11883-1-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-05-10 10:45   ` [PATCH for-next 01/14] RDMA/bnxt_re: Fixing the Control path command and response handling Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 02/14] RDMA/bnxt_re: HW workarounds for handling specific conditions Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 03/14] RDMA/bnxt_re: Fix race between netdev register and unregister events Selvin Xavier
     [not found]     ` <1494413139-11883-4-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-05-14  6:49       ` Leon Romanovsky
     [not found]         ` <20170514064904.GR3616-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-05-15 10:54           ` Selvin Xavier
     [not found]             ` <CA+sbYW0LLrE7EAsa1C-Bza1e6ug2MYntk8YVdOgm=iBLqDKBHw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-18  4:31               ` Leon Romanovsky [this message]
2017-05-10 10:45   ` [PATCH for-next 04/14] RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 05/14] RDMA/bnxt_re: Free doorbell page index (DPI) during dealloc ucontext Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 06/14] RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 07/14] RDMA/bnxt_re: Fix WQE Size posted to HW to prevent it from throwing error Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 08/14] RDMA/bnxt_re: Add vlan tag for untagged RoCE traffic when PFC is configured Selvin Xavier
     [not found]     ` <1494413139-11883-9-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-05-14  6:23       ` Leon Romanovsky
     [not found]         ` <20170514062349.GQ3616-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-05-15 10:13           ` Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 09/14] RDMA/bnxt_re: Do not free the ctx_tbl entry if delete GID fails Selvin Xavier
     [not found]     ` <1494413139-11883-10-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-05-14  6:10       ` Leon Romanovsky
     [not found]         ` <20170514061015.GP3616-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-05-15 10:54           ` Selvin Xavier
     [not found]             ` <CA+sbYW2LtbCf=DLXfyTKBUZJh5t7ZEBNYGHwJp1tFX49p5QL9g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-18  4:32               ` Leon Romanovsky
2017-05-10 10:45   ` [PATCH for-next 10/14] RDMA/bnxt_re: Fix RQE posting logic Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 11/14] RDMA/bnxt_re: Report supported value to IB stack in query_device Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 12/14] RDMA/bnxt_re: Fixed the max_rd_atomic support for initiator and destination QP Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 13/14] RDMA/bnxt_re: Specify RDMA component when allocating stats context Selvin Xavier
2017-05-10 10:45   ` [PATCH for-next 14/14] RDMA/bnxt_re: Update the driver version Selvin Xavier
     [not found]     ` <1494413139-11883-15-git-send-email-selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
2017-05-10 15:06       ` Dennis Dalessandro
     [not found]         ` <84d9a0ce-0e5c-fc50-237f-b7ad9d541324-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
2017-05-10 15:54           ` Leon Romanovsky
     [not found]             ` <20170510155430.GC1839-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-05-11  5:22               ` Selvin Xavier
     [not found]                 ` <CA+sbYW2ObMdDOdhPL9GMe5YNk7P2MsG5q=f+Ke5e4MU6A=WV6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-11  8:27                   ` Leon Romanovsky
     [not found]                     ` <20170511082700.GA3616-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-05-11 13:55                       ` Dennis Dalessandro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170518043100.GV3616@mtr-leonro.local \
    --to=leon-dgejt+ai2ygdnm+yrofe0a@public.gmane.org \
    --cc=dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
    --cc=linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=selvin.xavier-dY08KVG/lbpWk0Htik3J/w@public.gmane.org \
    --cc=somnath.kotur-dY08KVG/lbpWk0Htik3J/w@public.gmane.org \
    --cc=sriharsha.basavapatna-dY08KVG/lbpWk0Htik3J/w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).