From: Simon Horman <horms@kernel.org>
To: David Thompson <davthompson@nvidia.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, u.kleine-koenig@pengutronix.de,
	leon@kernel.org, asmaa@nvidia.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v2] mlxbf_gige: call request_irq() after NAPI initialized
Date: Wed, 20 Mar 2024 11:27:36 +0000
Message-ID: <20240320112736.GS185808@kernel.org>
In-Reply-To: <20240319181732.12878-1-davthompson@nvidia.com>

On Tue, Mar 19, 2024 at 02:17:32PM -0400, David Thompson wrote:
> The mlxbf_gige driver encounters a NULL pointer exception in
> mlxbf_gige_open() when kdump is enabled.  The sequence to reproduce
> the exception is as follows:
> a) enable kdump
> b) trigger kdump via "echo c > /proc/sysrq-trigger"
> c) kdump kernel executes
> d) kdump kernel loads mlxbf_gige module
> e) the mlxbf_gige module runs its open() as
>    the "oob_net0" interface is brought up
> f) mlxbf_gige module will experience an exception
>    during its open(), something like:
> 
>      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>      Mem abort info:
>        ESR = 0x0000000086000004
>        EC = 0x21: IABT (current EL), IL = 32 bits
>        SET = 0, FnV = 0
>        EA = 0, S1PTW = 0
>        FSC = 0x04: level 0 translation fault
>      user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
>      [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>      Internal error: Oops: 0000000086000004 [#1] SMP
>      CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield #37-Ubuntu
>      Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
>      pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>      pc : 0x0
>      lr : __napi_poll+0x40/0x230
>      sp : ffff800008003e00
>      x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
>      x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
>      x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
>      x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
>      x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
>      x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
>      x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
>      x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
>      x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
>      x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
>      Call trace:
>       0x0
>       net_rx_action+0x178/0x360
>       __do_softirq+0x15c/0x428
>       __irq_exit_rcu+0xac/0xec
>       irq_exit+0x18/0x2c
>       handle_domain_irq+0x6c/0xa0
>       gic_handle_irq+0xec/0x1b0
>       call_on_irq_stack+0x20/0x2c
>       do_interrupt_handler+0x5c/0x70
>       el1_interrupt+0x30/0x50
>       el1h_64_irq_handler+0x18/0x2c
>       el1h_64_irq+0x7c/0x80
>       __setup_irq+0x4c0/0x950
>       request_threaded_irq+0xf4/0x1bc
>       mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
>       mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
>       __dev_open+0x100/0x220
>       __dev_change_flags+0x16c/0x1f0
>       dev_change_flags+0x2c/0x70
>       do_setlink+0x220/0xa40
>       __rtnl_newlink+0x56c/0x8a0
>       rtnl_newlink+0x58/0x84
>       rtnetlink_rcv_msg+0x138/0x3c4
>       netlink_rcv_skb+0x64/0x130
>       rtnetlink_rcv+0x20/0x30
>       netlink_unicast+0x2ec/0x360
>       netlink_sendmsg+0x278/0x490
>       __sock_sendmsg+0x5c/0x6c
>       ____sys_sendmsg+0x290/0x2d4
>       ___sys_sendmsg+0x84/0xd0
>       __sys_sendmsg+0x70/0xd0
>       __arm64_sys_sendmsg+0x2c/0x40
>       invoke_syscall+0x78/0x100
>       el0_svc_common.constprop.0+0x54/0x184
>       do_el0_svc+0x30/0xac
>       el0_svc+0x48/0x160
>       el0t_64_sync_handler+0xa4/0x12c
>       el0t_64_sync+0x1a4/0x1a8
>      Code: bad PC value
>      ---[ end trace 7d1c3f3bf9d81885 ]---
>      Kernel panic - not syncing: Oops: Fatal exception in interrupt
>      Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
>      PHYS_OFFSET: 0x80000000
>      CPU features: 0x0,000005c1,a3332a5a
>      Memory Limit: none
>      ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
> 
> The exception happens because there is a pending RX interrupt before the
> call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
> immediately after this request_irq() completes. The RX IRQ handler runs
> "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
> and "napi_enable()", both of which happen later in the open() logic.
> 
> The logic in mlxbf_gige_open() has been re-ordered so that the
> request_irq() calls execute after NAPI is fully initialized.
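
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
driver code: every sim_* name is hypothetical, the "interrupt" is a plain
function call, and the NULL poll pointer is recorded in a flag rather than
dereferenced. It also simplifies NAPI state to two fields.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the mlxbf_gige code. */
struct sim_napi {
	int (*poll)(struct sim_napi *napi, int budget); /* NULL until "netif_napi_add()" */
	bool enabled;                                   /* set by "napi_enable()" */
	int polls_run;
};

struct sim_dev {
	struct sim_napi napi;
	bool rx_irq_pending; /* RX interrupt already asserted before open() */
	bool would_oops;     /* records the "pc : 0x0" case instead of crashing */
};

static int sim_poll(struct sim_napi *napi, int budget)
{
	(void)budget;
	napi->polls_run++;
	return 0;
}

static void sim_napi_add(struct sim_dev *d)    { d->napi.poll = sim_poll; }
static void sim_napi_enable(struct sim_dev *d) { d->napi.enabled = true; }

/* Models the RX handler scheduling NAPI, which leads to napi->poll(). */
static void sim_rx_irq_handler(struct sim_dev *d)
{
	d->rx_irq_pending = false;
	if (d->napi.poll == NULL) {
		d->would_oops = true; /* the kernel would jump to address 0 here */
		return;
	}
	if (d->napi.enabled)
		d->napi.poll(&d->napi, 64);
}

/* Registering the handler lets an already-pending interrupt fire at once. */
static void sim_request_irq(struct sim_dev *d)
{
	if (d->rx_irq_pending)
		sim_rx_irq_handler(d);
}

/* Ordering before the patch: IRQ is requested first, NAPI set up later. */
bool sim_open_old(struct sim_dev *d)
{
	sim_request_irq(d);
	sim_napi_add(d);
	sim_napi_enable(d);
	return !d->would_oops;
}

/* Ordering after the patch: NAPI is fully set up before the IRQ request. */
bool sim_open_new(struct sim_dev *d)
{
	sim_napi_add(d);
	sim_napi_enable(d);
	sim_request_irq(d);
	return !d->would_oops;
}
```

With rx_irq_pending set before the call, sim_open_old() reports the
failure case while sim_open_new() runs the poll callback once.
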
> 
> Also, the logic in mlxbf_gige_open() was missing a call to phy_stop()
> in the error path, so that has been added.
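
The error-path change can be sketched the same way. The model below
assumes a simplified open() with three steps and a single unwind label;
the struct, the function names and the label are hypothetical and do not
mirror the labels used in the driver. The point is only that a failure
after the PHY has been started must undo that step as well.

```c
#include <stdbool.h>

/* Hypothetical state flags; each one stands in for a real setup step. */
struct sim_state {
	bool phy_started;    /* phy_start() has run */
	bool napi_enabled;   /* netif_napi_add() + napi_enable() have run */
	bool irqs_requested; /* request_irq() calls have succeeded */
};

/* Simulated IRQ request that can be forced to fail. */
static int sim_request_irqs(struct sim_state *s, bool fail)
{
	if (fail)
		return -1;
	s->irqs_requested = true;
	return 0;
}

/* Simplified open(): PHY, then NAPI, then IRQs, unwinding on failure. */
int sim_open(struct sim_state *s, bool fail_irq)
{
	int err;

	s->phy_started = true;
	s->napi_enabled = true;

	err = sim_request_irqs(s, fail_irq);
	if (err)
		goto unwind;

	return 0;

unwind:
	s->napi_enabled = false; /* napi_disable() + netif_napi_del() */
	s->phy_started = false;  /* phy_stop(): the call the patch adds */
	return err;
}
```

After a forced failure, all three flags are false again, which is the
property the added phy_stop() call is meant to restore.
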
> 
> Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver")
> Signed-off-by: David Thompson <davthompson@nvidia.com>
> Reviewed-by: Asmaa Mnebhi <asmaa@nvidia.com>
> ---
> v2
> - re-worded commit message and subject for clarity
> - updated commit message to mention that phy_stop() was added
>   to the error path in mlxbf_gige_open()

Thanks,

this patch looks good to me and appears to address the review provided by
others of v1.

Reviewed-by: Simon Horman <horms@kernel.org>

