From: "Leon Romanovsky" <leon@kernel.org>
To: "Selvin Xavier" <selvin.xavier@broadcom.com>,
"Kalesh Anakkur Purayil" <kalesh-anakkur.purayil@broadcom.com>
Cc: "Jason Gunthorpe" <jgg@ziepe.ca>,
linux-rdma@vger.kernel.org,
"Andy Gospodarek" <andrew.gospodarek@broadcom.com>
Subject: Re: [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity
Date: Sun, 09 Feb 2025 22:05:09 +0200 [thread overview]
Message-ID: <a5343f38-1806-4d64-a757-6ece1ff4ec0d@app.fastmail.com> (raw)
In-Reply-To: <CA+sbYW2nMOxkSxSLk2W1vgDoYVSctHMKs7FCshEagpnqNFLayA@mail.gmail.com>
On Sun, Feb 9, 2025, at 19:18, Selvin Xavier wrote:
> On Wed, Feb 5, 2025 at 9:45 PM Kalesh Anakkur Purayil
> <kalesh-anakkur.purayil@broadcom.com> wrote:
>>
>> Hi Leon,
>>
>> On Wed, Feb 5, 2025 at 3:22 PM Leon Romanovsky <leon@kernel.org> wrote:
>> >
>> > On Wed, Feb 05, 2025 at 01:54:14PM +0530, Kalesh Anakkur Purayil wrote:
>> > > Hi Leon,
>> > >
>> > > On Wed, Feb 5, 2025 at 12:47 PM Leon Romanovsky <leon@kernel.org> wrote:
>> > > >
>> > > > On Tue, Feb 04, 2025 at 10:10:38PM +0530, Kalesh Anakkur Purayil wrote:
>> > > > > Hi Leon,
>> > > > >
>> > > > > On Tue, Feb 4, 2025 at 5:14 PM Leon Romanovsky <leon@kernel.org> wrote:
>> > > > > >
>> > > > > > On Tue, Feb 04, 2025 at 12:21:23AM -0800, Selvin Xavier wrote:
>> > > > > > > From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>> > > > > > >
>> > > > > > > There is a possibility that ulp_irq_stop and ulp_irq_start
>> > > > > > > callbacks will be called when the device is in detached state.
>> > > > > > > This can cause a crash due to NULL pointer dereference as
>> > > > > > > the rdev is already freed.
>> > > > > >
>> > > > > > Description and code doesn't match. If "possibility" exists, there is
>> > > > > > no protection from concurrent detach and ulp_irq_stop. If there is such
>> > > > > > protection, they can't race.
>> > > > > >
>> > > > > > The main idea of auxiliary bus is to remove the need to implement driver
>> > > > > > specific ops.
>> > > > >
>> > > > > There is no race condition here.
>> > > > >
>> > > > > Let me explain the scenario.
>> > > > >
>> > > > > User is doing a devlink reload reinit. As part of that, the Ethernet
>> > > > > driver first invokes the auxiliary bus suspend callback. The RDMA driver
>> > > > > will do the unwinding operation and hence rdev will be freed.
>> > > > >
>> > > > > After that, during the devlink sequence, Ethernet driver invokes the
>> > > > > ulp_irq_stop() callback and this resulted in the NULL pointer
>> > > > > dereference as the RDMA driver is in detached state and the rdev is
>> > > > > already freed.
>> > > >
>> > > > Shouldn't devlink reload completely release all auxiliary drivers?
>> > > > Why are you keeping BNXT RDMA driver during reload?
>> > >
>> > > That is the current design. BNXT core Ethernet driver will not destroy
>> > > the auxiliary device for RDMA, but just calls the suspend callback. As
>> > > a result, RDMA driver will remains loaded and remains registered with
>> > > the Ethernet driver instance.
>> >
>> > This is wrong.
>>
>> We understand that. BNXT core driver team has already started working
>> on the auxiliary device removal instead of invoking auxdrv->suspend
>> callback in the devlink relaod path. That will avoid these NULL checks
>> in RDMA driver. for time being we need these NULL checks.
>> That will be posted to net-next tree once internal testing and review is done.
>> >
>> > > > BNXT core driver controls reload, it shouldn't call to drivers which
>> > > > doesn't exist.
>> > > Since the RDMA driver instance is registered with Ethernet driver,
>> > > core Ethernet driver invokes the callback.
>> > > >
>> > > > >
>> > > > > We are trying to address the NULL pointer dereference issue here.
>> > > >
>> > > > You are hiding bugs and not fixing them.
>> > >
>> > > Yes, but this change is critical for the current design of the driver.
>> >
>> > Please fix it once and for all by doing proper reload sequence.
>> > I warned you that setting NULLs to pointers hide bugs.
>> > https://lore.kernel.org/linux-rdma/20250114112555.GG3146852@unreal/
>>
>> Yes, I understand. We will work on the suggestion that you had given
>> based on the new design mentioned in last comment.
> Hi Leon,
> Is it okay to merge this change along with the other fixes in the
> series? Its important
> for the current driver design.
Yes, because it is current situation, i will merge it tomorrow.
Thanks
>
> Thanks,
>
>> >
>> > Thanks
>> >
>> > > >
>> > > > >
>> > > > > The driver specific ops, ulp_irq_stop and ulp_irq_start are required.
>> > > > > Broadcom Ethernet and RDMA drivers are designed like that to manage
>> > > > > IRQs between them.
>> > > > >
>> > > > > Hope this clarifies your question.
>> > > > > >
>> > > > > > >
>> > > > > > > Fixes: cc5b9b48d447 ("RDMA/bnxt_re: Recover the device when FW error is detected")
>> > > > > > > Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>> > > > > > > Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
>> > > > > > > ---
>> > > > > > > drivers/infiniband/hw/bnxt_re/main.c | 5 +++++
>> > > > > > > 1 file changed, 5 insertions(+)
>> > > > > > >
>> > > > > > > diff --git a/drivers/infiniband/hw/bnxt_re/main.c b/drivers/infiniband/hw/bnxt_re/main.c
>> > > > > > > index c4c3d67..89ac5c2 100644
>> > > > > > > --- a/drivers/infiniband/hw/bnxt_re/main.c
>> > > > > > > +++ b/drivers/infiniband/hw/bnxt_re/main.c
>> > > > > > > @@ -438,6 +438,8 @@ static void bnxt_re_stop_irq(void *handle, bool reset)
>> > > > > > > int indx;
>> > > > > > >
>> > > > > > > rdev = en_info->rdev;
>> > > > > > > + if (!rdev)
>> > > > > > > + return;
>> > > > > >
>> > > > > > This can be seen as an example why I'm so negative about assigning NULL
>> > > > > > to the pointers after object is destroyed. Such assignment makes layer
>> > > > > > violation much easier job to do.
>> > > > > >
>> > > > > > Thanks
>> > > > > >
>> > > > > > > rcfw = &rdev->rcfw;
>> > > > > > >
>> > > > > > > if (reset) {
>> > > > > > > @@ -466,6 +468,8 @@ static void bnxt_re_start_irq(void *handle, struct bnxt_msix_entry *ent)
>> > > > > > > int indx, rc;
>> > > > > > >
>> > > > > > > rdev = en_info->rdev;
>> > > > > > > + if (!rdev)
>> > > > > > > + return;
>> > > > > > > msix_ent = rdev->nqr->msix_entries;
>> > > > > > > rcfw = &rdev->rcfw;
>> > > > > > > if (!ent) {
>> > > > > > > @@ -2438,6 +2442,7 @@ static int bnxt_re_suspend(struct auxiliary_device *adev, pm_message_t state)
>> > > > > > > ibdev_info(&rdev->ibdev, "%s: L2 driver notified to stop en_state 0x%lx",
>> > > > > > > __func__, en_dev->en_state);
>> > > > > > > bnxt_re_remove_device(rdev, BNXT_RE_PRE_RECOVERY_REMOVE, adev);
>> > > > > > > + bnxt_re_update_en_info_rdev(NULL, en_info, adev);
>> > > > > > > mutex_unlock(&bnxt_re_mutex);
>> > > > > > >
>> > > > > > > return 0;
>> > > > > > > --
>> > > > > > > 2.5.5
>> > > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Regards,
>> > > > > Kalesh AP
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Kalesh AP
>> >
>> >
>>
>>
>> --
>> Regards,
>> Kalesh AP
>
> Attachments:
> * smime.p7s
next prev parent reply other threads:[~2025-02-09 20:05 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-04 8:21 [PATCH rdma-rc 0/4] RDMA/bnxt_re: Bug fixes for 6.14 Selvin Xavier
2025-02-04 8:21 ` [PATCH rdma-rc 1/4] RDMA/bnxt_re: Fix an issue in bnxt_re_async_notifier Selvin Xavier
2025-02-04 8:21 ` [PATCH rdma-rc 2/4] RDMA/bnxt_re: Add sanity checks on rdev validity Selvin Xavier
2025-02-04 11:44 ` Leon Romanovsky
2025-02-04 16:40 ` Kalesh Anakkur Purayil
2025-02-05 7:17 ` Leon Romanovsky
2025-02-05 8:24 ` Kalesh Anakkur Purayil
2025-02-05 9:52 ` Leon Romanovsky
2025-02-05 16:15 ` Kalesh Anakkur Purayil
2025-02-09 17:18 ` Selvin Xavier
2025-02-09 20:05 ` Leon Romanovsky [this message]
2025-02-05 14:47 ` Jason Gunthorpe
2025-02-05 16:20 ` Kalesh Anakkur Purayil
2025-02-04 8:21 ` [PATCH rdma-rc 3/4] RDMA/bnxt_re: Fix issue in the unload path Selvin Xavier
2025-02-25 18:42 ` Jason Gunthorpe
2025-02-26 5:39 ` Selvin Xavier
2025-02-04 8:21 ` [PATCH rdma-rc 4/4] RDMA/bnxt_re: Fix the statistics for Gen P7 VF Selvin Xavier
2025-02-10 8:40 ` [PATCH rdma-rc 0/4] RDMA/bnxt_re: Bug fixes for 6.14 Leon Romanovsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a5343f38-1806-4d64-a757-6ece1ff4ec0d@app.fastmail.com \
--to=leon@kernel.org \
--cc=andrew.gospodarek@broadcom.com \
--cc=jgg@ziepe.ca \
--cc=kalesh-anakkur.purayil@broadcom.com \
--cc=linux-rdma@vger.kernel.org \
--cc=selvin.xavier@broadcom.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox