From: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
To: Matthew Leach <matthew.leach@collabora.com>,
Jeff Johnson <jjohnson@kernel.org>
Cc: linux-wireless@vger.kernel.org, ath11k@lists.infradead.org,
linux-kernel@vger.kernel.org, kernel@collabora.com
Subject: Re: [PATCH RESEND RFC 3/3] net: ath11k: add lockup simulation via debugfs
Date: Tue, 12 May 2026 16:19:04 -0700 [thread overview]
Message-ID: <2edfe25b-2190-4463-bbba-b2468d872a4c@oss.qualcomm.com> (raw)
In-Reply-To: <20260330-ath11k-lockup-fixes-v1-3-7ed21095c2c4@collabora.com>
On 3/30/2026 3:05 AM, Matthew Leach wrote:
> Add a debugfs command to simulate a firmware lockup.
>
> This does not hang the hardware. Instead, it forces the driver down an
> error path that reproduces the sequence observed during real lockups:
>
> ath11k_pci 0000:03:00.0: failed to transmit frame -12
> ath11k_pci 0000:03:00.0: failed to transmit frame -12
> ath11k_pci 0000:03:00.0: failed to transmit frame -12
> ...
> ath11k_pci 0000:03:00.0: wmi command 28680 timeout
> ath11k_pci 0000:03:00.0: failed to submit WMI_MGMT_TX_SEND_CMDID cmd
> ath11k_pci 0000:03:00.0: failed to send mgmt frame: -11
>
> This allows validation of the firmware lockup detection and recovery
> mechanism without requiring a real hardware failure.
>
> Signed-off-by: Matthew Leach <matthew.leach@collabora.com>
> ---
> drivers/net/wireless/ath/ath11k/core.h | 1 +
> drivers/net/wireless/ath/ath11k/debugfs.c | 7 ++++++-
> drivers/net/wireless/ath/ath11k/hal.c | 7 +++++--
> drivers/net/wireless/ath/ath11k/htc.c | 2 +-
> drivers/net/wireless/ath/ath11k/wmi.c | 6 +++++-
> 5 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/wireless/ath/ath11k/core.h b/drivers/net/wireless/ath/ath11k/core.h
> index 221dcd23b3dd..44b02ae1e85b 100644
> --- a/drivers/net/wireless/ath/ath11k/core.h
> +++ b/drivers/net/wireless/ath/ath11k/core.h
> @@ -1041,6 +1041,7 @@ struct ath11k_base {
> struct ath11k_dbring_cap *db_caps;
> u32 num_db_cap;
> u64 last_frame_tx_error_jiffies;
> + bool simulate_lockup;
>
> /* To synchronize 11d scan vdev id */
> struct mutex vdev_id_11d_lock;
> diff --git a/drivers/net/wireless/ath/ath11k/debugfs.c b/drivers/net/wireless/ath/ath11k/debugfs.c
> index 0c1138407838..ca0b72a3e0b0 100644
> --- a/drivers/net/wireless/ath/ath11k/debugfs.c
> +++ b/drivers/net/wireless/ath/ath11k/debugfs.c
> @@ -356,7 +356,8 @@ static ssize_t ath11k_read_simulate_fw_crash(struct file *file,
> const char buf[] =
> "To simulate firmware crash write one of the keywords to this file:\n"
> "`assert` - this will send WMI_FORCE_FW_HANG_CMDID to firmware to cause assert.\n"
> - "`hw-restart` - this will simply queue hw restart without fw/hw actually crashing.\n";
> + "`hw-restart` - this will simply queue hw restart without fw/hw actually crashing.\n"
> + "`lockup` - simulate a firmware lockup without the h/w actually hanging.\n";
>
> return simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf));
> }
> @@ -413,6 +414,10 @@ static ssize_t ath11k_write_simulate_fw_crash(struct file *file,
> ath11k_info(ab, "user requested hw restart\n");
> queue_work(ab->workqueue_aux, &ab->reset_work);
> ret = 0;
> + } else if (!strcmp(buf, "lockup")) {
> + ath11k_info(ab, "simulating lockup\n");
> + ab->simulate_lockup = true;
> + ret = 0;
> } else {
> ret = -EINVAL;
> goto exit;
> diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
> index e821e5a62c1c..e01fb17a4734 100644
> --- a/drivers/net/wireless/ath/ath11k/hal.c
> +++ b/drivers/net/wireless/ath/ath11k/hal.c
> @@ -691,7 +691,7 @@ int ath11k_hal_srng_dst_num_free(struct ath11k_base *ab, struct hal_srng *srng,
>
> tp = srng->u.dst_ring.tp;
>
> - if (sync_hw_ptr) {
> + if (sync_hw_ptr && !ab->simulate_lockup) {
> hp = *srng->u.dst_ring.hp_addr;
> srng->u.dst_ring.cached_hp = hp;
> } else {
> @@ -743,7 +743,7 @@ u32 *ath11k_hal_srng_src_get_next_entry(struct ath11k_base *ab,
> */
> next_hp = (srng->u.src_ring.hp + srng->entry_size) % srng->ring_size;
>
> - if (next_hp == srng->u.src_ring.cached_tp)
> + if (next_hp == srng->u.src_ring.cached_tp || ab->simulate_lockup)
> return NULL;
>
> desc = srng->ring_base_vaddr + srng->u.src_ring.hp;
> @@ -828,6 +828,9 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng)
>
> lockdep_assert_held(&srng->lock);
>
> + if (ab->simulate_lockup)
> + return;
> +
> if (srng->ring_dir == HAL_SRNG_DIR_SRC) {
> srng->u.src_ring.cached_tp =
> *(volatile u32 *)srng->u.src_ring.tp_addr;
> diff --git a/drivers/net/wireless/ath/ath11k/htc.c b/drivers/net/wireless/ath/ath11k/htc.c
> index 4571d01cc33d..b05d04a1f5e8 100644
> --- a/drivers/net/wireless/ath/ath11k/htc.c
> +++ b/drivers/net/wireless/ath/ath11k/htc.c
> @@ -208,7 +208,7 @@ static int ath11k_htc_process_trailer(struct ath11k_htc *htc,
> break;
> }
>
> - if (ab->hw_params.credit_flow) {
> + if (ab->hw_params.credit_flow && !ab->simulate_lockup) {
> switch (record->hdr.id) {
> case ATH11K_HTC_RECORD_CREDITS:
> len = sizeof(struct ath11k_htc_credit_report);
> diff --git a/drivers/net/wireless/ath/ath11k/wmi.c b/drivers/net/wireless/ath/ath11k/wmi.c
> index 7d9f0bcbb3b0..27d6d4a2f803 100644
> --- a/drivers/net/wireless/ath/ath11k/wmi.c
> +++ b/drivers/net/wireless/ath/ath11k/wmi.c
> @@ -345,9 +345,13 @@ int ath11k_wmi_cmd_send(struct ath11k_pdev_wmi *wmi, struct sk_buff *skb,
>
> if (time_in_range64(ab->last_frame_tx_error_jiffies,
> range_start, jiffies_64) &&
> - queue_work(ab->workqueue_aux, &ab->reset_work))
> + queue_work(ab->workqueue_aux, &ab->reset_work)) {
> ath11k_err(wmi_ab->ab,
> "Firmware lockup detected. Resetting.");
> +
> + /* Assume that reset gets us out of lockup. */
> + ab->simulate_lockup = false;
> + }
> }
>
> if (ret == -ENOBUFS)
>
My 1st impression of this patch is that the datapath folks are not going to
like the ab->simulate_lockup checks in the hot path. But I'll let the
engineers speak for themselves.
/jeff
prev parent reply other threads:[~2026-05-12 23:19 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-30 10:05 [RFC PATCH RESEND 0/3] net: ath11k: Firmware lockup detection & mitigation Matthew Leach
2026-03-30 10:05 ` [PATCH RESEND RFC 1/3] net: ath11k: fix redundant reset from stale pending workqueue bit Matthew Leach
2026-05-12 23:09 ` Jeff Johnson
2026-03-30 10:05 ` [PATCH RESEND RFC 2/3] net: ath11k: add firmware lockup detection and recovery Matthew Leach
2026-03-30 10:05 ` [PATCH RESEND RFC 3/3] net: ath11k: add lockup simulation via debugfs Matthew Leach
2026-05-12 23:19 ` Jeff Johnson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2edfe25b-2190-4463-bbba-b2468d872a4c@oss.qualcomm.com \
--to=jeff.johnson@oss.qualcomm.com \
--cc=ath11k@lists.infradead.org \
--cc=jjohnson@kernel.org \
--cc=kernel@collabora.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-wireless@vger.kernel.org \
--cc=matthew.leach@collabora.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox