From: Simon Horman <horms@kernel.org>
To: Brett Creeley <brett.creeley@amd.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, shannon.nelson@amd.com
Subject: Re: [PATCH net] pds_core: Fix pdsc_check_pci_health function to print warning
Date: Fri, 22 Mar 2024 11:09:59 +0000
Message-ID: <20240322110959.GA372561@kernel.org>
In-Reply-To: <20240321063954.18711-1-brett.creeley@amd.com>

On Wed, Mar 20, 2024 at 11:39:54PM -0700, Brett Creeley wrote:
> When the driver notices fw_status == 0xff it tries to perform a PCI
> reset on itself via pci_reset_function() in the context of the driver's
> health thread. However, pdsc_reset_prepare() calls
> pdsc_stop_health_thread(), which attempts to stop/flush the health
> thread. This results in a deadlock because the stop/flush will never
> complete since the driver called pci_reset_function() from the health
> thread context. Fix this by changing pdsc_check_pci_health() to print
> a dev_warn() once every fw_down/fw_up cycle and requiring the user to
> reset the device manually, e.g. via the sysfs reset interface, by
> reloading the driver, or by rebinding the device.
>
> Unloading the driver in the fw_down/dead state uncovered another issue,
> which can be seen in the following trace:
>
> WARNING: CPU: 51 PID: 6914 at kernel/workqueue.c:1450 __queue_work+0x358/0x440
> [...]
> RIP: 0010:__queue_work+0x358/0x440
> [...]
> Call Trace:
> <TASK>
> ? __warn+0x85/0x140
> ? __queue_work+0x358/0x440
> ? report_bug+0xfc/0x1e0
> ? handle_bug+0x3f/0x70
> ? exc_invalid_op+0x17/0x70
> ? asm_exc_invalid_op+0x1a/0x20
> ? __queue_work+0x358/0x440
> queue_work_on+0x28/0x30
> pdsc_devcmd_locked+0x96/0xe0 [pds_core]
> pdsc_devcmd_reset+0x71/0xb0 [pds_core]
> pdsc_teardown+0x51/0xe0 [pds_core]
> pdsc_remove+0x106/0x200 [pds_core]
> pci_device_remove+0x37/0xc0
> device_release_driver_internal+0xae/0x140
> driver_detach+0x48/0x90
> bus_remove_driver+0x6d/0xf0
> pci_unregister_driver+0x2e/0xa0
> pdsc_cleanup_module+0x10/0x780 [pds_core]
> __x64_sys_delete_module+0x142/0x2b0
> ? syscall_trace_enter.isra.18+0x126/0x1a0
> do_syscall_64+0x3b/0x90
> entry_SYSCALL_64_after_hwframe+0x72/0xdc
> RIP: 0033:0x7fbd9d03a14b
> [...]
>
> Fix this by preventing the devcmd reset if the FW is not running.
>
> Fixes: d9407ff11809 ("pds_core: Prevent health thread from running during reset/remove")
> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
> Signed-off-by: Brett Creeley <brett.creeley@amd.com>

Reviewed-by: Simon Horman <horms@kernel.org>
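
The two behaviours described in the quoted commit message can be
illustrated with a small userspace sketch. This is not the driver code:
struct demo_dev, demo_check_health() and demo_teardown() are
hypothetical names, and the warning text is made up. It only models
"warn at most once per fw_down/fw_up cycle" and "skip the reset devcmd
when the FW is not running".

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for per-device state; not the real struct pdsc. */
struct demo_dev {
	bool fw_running;	/* false models the fw_down/dead state */
	bool warned;		/* a warning was already printed in this fw_down period */
	int devcmds_sent;	/* number of device commands actually issued */
};

/*
 * Health check, as it would run from the health thread. Instead of
 * resetting the device from this context, it only warns, and at most
 * once per fw_down/fw_up cycle. Returns true if a warning was printed.
 */
static bool demo_check_health(struct demo_dev *d)
{
	if (d->fw_running) {
		d->warned = false;	/* FW is back up: re-arm the warning */
		return false;
	}
	if (d->warned)
		return false;		/* already warned in this cycle */

	d->warned = true;
	fprintf(stderr, "demo: FW is down, please reset the device manually\n");
	return true;
}

/*
 * Teardown path. The reset devcmd is skipped when the FW is not
 * running. Returns true if the command was issued.
 */
static bool demo_teardown(struct demo_dev *d)
{
	if (!d->fw_running)
		return false;

	d->devcmds_sent++;	/* models issuing the reset devcmd */
	return true;
}
```

Calling demo_check_health() repeatedly while fw_running is false prints
one warning only; the next warning appears after the FW has come back
up and gone down again.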