All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Kalesh Anakkur Purayil <kalesh-anakkur.purayil@broadcom.com>
Cc: Selvin Xavier <selvin.xavier@broadcom.com>,
	jgg@ziepe.ca, linux-rdma@vger.kernel.org,
	andrew.gospodarek@broadcom.com
Subject: Re: [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity
Date: Wed, 5 Feb 2025 11:52:15 +0200	[thread overview]
Message-ID: <20250205095215.GN74886@unreal> (raw)
In-Reply-To: <CAH-L+nM6+-v6dvgzA6UDhgbhjysr55BJ859N5aWWXNEm4k+EDw@mail.gmail.com>

On Wed, Feb 05, 2025 at 01:54:14PM +0530, Kalesh Anakkur Purayil wrote:
> Hi Leon,
> 
> On Wed, Feb 5, 2025 at 12:47 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Feb 04, 2025 at 10:10:38PM +0530, Kalesh Anakkur Purayil wrote:
> > > Hi Leon,
> > >
> > > On Tue, Feb 4, 2025 at 5:14 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Tue, Feb 04, 2025 at 12:21:23AM -0800, Selvin Xavier wrote:
> > > > > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > > > >
> > > > > There is a possibility that ulp_irq_stop and ulp_irq_start
> > > > > callbacks will be called when the device is in detached state.
> > > > > This can cause a crash due to NULL pointer dereference as
> > > > > the rdev is already freed.
> > > >
> > > > Description and code doesn't match. If "possibility" exists, there is
> > > > no protection from concurrent detach and ulp_irq_stop. If there is such
> > > > protection, they can't race.
> > > >
> > > > The main idea of auxiliary bus is to remove the need to implement driver
> > > > specific ops.
> > >
> > > There is no race condition here.
> > >
> > > Let me explain the scenario.
> > >
> > > User is doing a devlink reload reinit. As part of that, the Ethernet
> > > driver first invokes the auxiliary bus suspend callback. The RDMA driver
> > > will do the unwinding operation and hence rdev will be freed.
> > >
> > > After that, during the devlink sequence, Ethernet driver invokes the
> > > ulp_irq_stop() callback and this resulted in the NULL pointer
> > > dereference as the RDMA driver is in detached state and the rdev is
> > > already freed.
> >
> > Shouldn't devlink reload completely release all auxiliary drivers?
> > Why are you keeping BNXT RDMA driver during reload?
> 
> That is the current design. BNXT core Ethernet driver will not destroy
> the auxiliary device for RDMA, but just calls the suspend callback. As
> a result, RDMA driver will remains loaded and remains registered with
> the Ethernet driver instance.

This is wrong.

> > BNXT core driver controls reload, it shouldn't call to drivers which
> > doesn't exist.
> Since the RDMA driver instance is registered with Ethernet driver,
> core Ethernet driver invokes the callback.
> >
> > >
> > > We are trying to address the NULL pointer dereference issue here.
> >
> > You are hiding bugs and not fixing them.
> 
> Yes, but this change is critical for the current design of the driver.

Please fix it once and for all by doing proper reload sequence.
I warned you that setting NULLs to pointers hide bugs.
https://lore.kernel.org/linux-rdma/20250114112555.GG3146852@unreal/

Thanks

> >
> > >
> > > The driver specific ops, ulp_irq_stop and ulp_irq_start are required.
> > > Broadcom Ethernet and RDMA drivers are designed like that to manage
> > > IRQs between them.
> > >
> > > Hope this clarifies your question.
> > > >
> > > > >
> > > > > Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected")
> > > > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
> > > > > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
> > > > > ---
> > > > >  drivers/infiniband/hw/bnxt_re/main.c | 5 +++++
> > > > >  1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
> > > > > index c4c3d67..89ac5c2 100644
> > > > > --- a/drivers/infiniband/hw/bnxt_re/main.c
> > > > > +++ b/drivers/infiniband/hw/bnxt_re/main.c
> > > > > @@ -438,6 +438,8 @@ static void bnxt_re_stop_irq(void *handle, bool reset)
> > > > >       int indx;
> > > > >
> > > > >       rdev = en_info->rdev;
> > > > > +     if (!rdev)
> > > > > +             return;
> > > >
> > > > This can be seen as an example why I'm so negative about assigning NULL
> > > > to the pointers after object is destroyed. Such assignment makes layer
> > > > violation much easier job to do.
> > > >
> > > > Thanks
> > > >
> > > > >       rcfw = &rdev->rcfw;
> > > > >
> > > > >       if (reset) {
> > > > > @@ -466,6 +468,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
> > > > >       int indx, rc;
> > > > >
> > > > >       rdev = en_info->rdev;
> > > > > +     if (!rdev)
> > > > > +             return;
> > > > >       msix_ent = rdev->nqr->msix_entries;
> > > > >       rcfw = &rdev->rcfw;
> > > > >       if (!ent) {
> > > > > @@ -2438,6 +2442,7 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state)
> > > > >       ibdev_info(&rdev->ibdev, "%s: L2 driver notified to stop en_state 0x%lx",
> > > > >                  __func__, en_dev->en_state);
> > > > >       bnxt_re_remove_device(rdev, BNXT_RE_PRE_RECOVERY_REMOVE, adev);
> > > > > +     bnxt_re_update_en_info_rdev(NULL, en_info, adev);
> > > > >       mutex_unlock(&bnxt_re_mutex);
> > > > >
> > > > >       return 0;
> > > > > --
> > > > > 2.5.5
> > > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Kalesh AP
> >
> >
> 
> 
> -- 
> Regards,
> Kalesh AP



  reply	other threads:[~2025-02-05  9:52 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-04  8:21 [PATCH rdma-rc 0/4] RDMA/bnxt_re: Bug fixes for 6.14 Selvin Xavier
2025-02-04  8:21 ` [PATCH rdma-rc 1/4] RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifier Selvin Xavier
2025-02-04  8:21 ` [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity Selvin Xavier
2025-02-04 11:44   ` Leon Romanovsky
2025-02-04 16:40     ` Kalesh Anakkur Purayil
2025-02-05  7:17       ` Leon Romanovsky
2025-02-05  8:24         ` Kalesh Anakkur Purayil
2025-02-05  9:52           ` Leon Romanovsky [this message]
2025-02-05 16:15             ` Kalesh Anakkur Purayil
2025-02-09 17:18               ` Selvin Xavier
2025-02-09 20:05                 ` Leon Romanovsky
2025-02-05 14:47           ` Jason Gunthorpe
2025-02-05 16:20             ` Kalesh Anakkur Purayil
2025-02-04  8:21 ` [PATCH rdma-rc 3/4] RDMA/bnxt_re: Fix issue in the unload path Selvin Xavier
2025-02-25 18:42   ` Jason Gunthorpe
2025-02-26  5:39     ` Selvin Xavier
2025-02-04  8:21 ` [PATCH rdma-rc 4/4] RDMA/bnxt_re: Fix the statistics for Gen P7 VF Selvin Xavier
2025-02-10  8:40 ` [PATCH rdma-rc 0/4] RDMA/bnxt_re: Bug fixes for 6.14 Leon Romanovsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250205095215.GN74886@unreal \
    --to=leon@kernel.org \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=jgg@ziepe.ca \
    --cc=kalesh-anakkur.purayil@broadcom.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=selvin.xavier@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.