public inbox for linux-wireless@vger.kernel.org
 help / color / mirror / Atom feed
From: Christian Lamparter <chunkeey@gmail.com>
To: iamdevnull <mas-i@hotmail.de>,
	Christian Lamparter <chunkeey@googlemail.com>
Cc: linux-wireless@vger.kernel.org
Subject: Re: [PATCH 1/3] carlfw: add stability fixes for AR9170 USB adapters
Date: Sat, 21 Mar 2026 14:29:21 +0100	[thread overview]
Message-ID: <52b60edf-c571-426d-947c-11654f09224c@gmail.com> (raw)
In-Reply-To: <AM7PPF5613FA0B6784EE6B1CE5A663476329441A@AM7PPF5613FA0B6.EURP251.PROD.OUTLOOK.COM>

On 3/17/26 10:11 AM, iamdevnull wrote:
> From: Masi Osmani <mas-i@hotmail.de>
> 
> Five targeted fixes for stability issues observed on AR9170 USB hardware
> (AVM Fritz!WLAN USB Stick N) during extended operation:
> 
> 1. cam.c: Add timeout to CAM busy-wait loops. The original infinite
>     loops block the entire SH-2 processor if the MAC CAM engine stalls,
>     making the firmware unresponsive to USB commands. Add a 10000-cycle
>     timeout to prevent firmware lockup.

Hmm. The firmware sets up a watchdog (hardware timer). It barks (as in: it triggers
the reset vector) when not serviced.

Have you observed that the CAM busy-wait loops can prevent even the watchdog
to go off?


> 2. main.c (tally): Use hardware cycle counter AR9170_MAC_REG_CHANNEL_BUSY
>     for CCA busy time instead of single-bit polling via AR9170_MAC_BACKOFF_CCA.
>     The register is a read-and-clear counter like AR9170_MAC_REG_RX_TOTAL,
>     giving accurate channel utilization data to the driver's survey.


There were reasons why this was chosen. But sure, we can decide if it was a good one
or not.

You see, the problem with AR9170_MAC_REG_CHANNEL_BUSY... that there's also EXT_BUSY
#define AR9170_MAC_REG_CHANNEL_BUSY             (AR9170_MAC_REG_BASE + 0x6e8)
#define AR9170_MAC_REG_EXT_BUSY                 (AR9170_MAC_REG_BASE + 0x6ec)

Don't you need to do something (add it to the other channel struct) when in HT40 mode?
What would this look like?

The other reason was, that I don't know of any hardware counter that keeps track of
the tx-time (counter when hardware is transmitting).

I don't think we should let the hardware run the busy tally but let the software trying
to keep up and do the tx-transmit time tally (from my test back then, it couldn't).

So, we would need to remove the AR9170_MAC_BACKOFF_TX_PE together with fw.tally.tx_time.
And then you loose hostapd's automatic channel selection:
https://wireless.docs.kernel.org/en/latest/en/users/documentation/acs.html
(as it needs tx_time)

> 3. rf.c: On AGC calibration timeout, disable the baseband via
>     AR9170_PHY_ACTIVE_DIS so the driver sees a clean failure instead of
>     operating with a half-initialized PHY that produces corrupted frames.

Sure, if you can make this in a separate commit. I'll merge it.

> 
> 4. wlan.c (RX overrun): Lower the MAC reset threshold from 100% frame
>     loss to >50% frame loss. The original check (overruns == total) only
>     triggered at complete RX blindness, leaving the adapter nearly
>     non-functional at 95% loss without any recovery.

Sure, why not.

> 
> 5. wlan.c (PSM wake): After rf_psm() transitions from PHY_OFF to
>     PHY_ON, re-trigger TX DMA for queued frames. While the PHY was off,
>     hardware could not transmit and DMA trigger bits were consumed
>     without effect.

Yeah, no real surprise here. That said, I to this day disable PS

> 
> 6. wlantx.c: Move wlan_tx_ampdu_reset() after retry queue drain to
>     prevent clearing AMPDU state before retransmission completes.

Sure.

> 
> 7. usb/main.c: Implement usb_warm_reset() for USB bus reset handling.
>     Unlike reboot() which destroys USB PHY state requiring re-plug,
>     warm reset preserves the USB connection and jumps to start() for
>     full firmware re-init. Includes WLAN MAC, DMA, baseband, and PTA
>     reset with BB_WARM_RESET to prevent PHY lockups after repeated
>     warm resets.

The reason why reboot get called was that there seems to be some special
sauce in the sticks bootcode that could get wedged/stalled endpoints to
function again. I believe there's more magic involved which is not implemented
in the carl9170fw, that is necessary for this. One hint might be that I removed the
code that did usb1.1 <-> usb2.0 switching. This was because the original ar9170 was
such a #define nest that I didn't want to touch and the stick seemed fine without it.
(But then, I never observed usb1.1 <-> usb2.0 switching with the original otus driver
+firmware)


Cheers,
Christian

> Tested on AVM Fritz!WLAN N (AR9170, AR9001U) with kernel 6.18.12.
> 
> Signed-off-by: Masi Osmani <mas-i@hotmail.de>
> ---
>   carlfw/src/cam.c    | 17 +++++++----
>   carlfw/src/main.c   |  7 ++++-
>   carlfw/src/rf.c     |  9 ++++++
>   carlfw/src/wlan.c   | 33 +++++++++++++++------
>   carlfw/src/wlantx.c |  3 +-
>   carlfw/usb/main.c   | 63 +++++++++++++++++++++++++++++++++++++++--
>   6 files changed, 113 insertions(+), 19 deletions(-)
> 
> diff --git a/carlfw/src/cam.c b/carlfw/src/cam.c
> index 44cbddd..5273031 100644
> --- a/carlfw/src/cam.c
> +++ b/carlfw/src/cam.c
> @@ -42,31 +42,36 @@ static void enable_cam_user(const uint16_t userId)
>   		orl(AR9170_MAC_REG_CAM_ROLL_CALL_TBL_H, (((uint32_t) 1) << (userId - 32)));
>   }
> 
> +#define CAM_TIMEOUT	10000
> +
>   static void wait_for_cam_read_ready(void)
>   {
> -	while ((get(AR9170_MAC_REG_CAM_STATE) & AR9170_MAC_CAM_STATE_READ_PENDING) == 0) {
> -		/*
> -		 * wait
> -		 */
> +	unsigned int timeout = CAM_TIMEOUT;
> +
> +	while (((get(AR9170_MAC_REG_CAM_STATE) & AR9170_MAC_CAM_STATE_READ_PENDING) == 0) &&
> +	       timeout--) {
> +		/* wait */
>   	}
>   }
> 
>   static void wait_for_cam_write_ready(void)
>   {
> -	while ((get(AR9170_MAC_REG_CAM_STATE) & AR9170_MAC_CAM_STATE_WRITE_PENDING) == 0) {
> -		/*
> -		 * wait some more
> -		 */
> +	unsigned int timeout = CAM_TIMEOUT;
> +
> +	while (((get(AR9170_MAC_REG_CAM_STATE) & AR9170_MAC_CAM_STATE_WRITE_PENDING) == 0) &&
> +	       timeout--) {
> +		/* wait */
>   	}
>   }
> 
>   static void HW_CAM_Avail(void)
>   {
> +	unsigned int timeout = CAM_TIMEOUT;
>   	uint32_t tmpValue;
> 
>   	do {
>   		tmpValue = get(AR9170_MAC_REG_CAM_MODE);
> -	} while (tmpValue & AR9170_MAC_CAM_HOST_PENDING);
> +	} while ((tmpValue & AR9170_MAC_CAM_HOST_PENDING) && timeout--);
>   }
> 
>   static void HW_CAM_Write128(const uint32_t address, const uint32_t *data)
> diff --git a/carlfw/src/main.c b/carlfw/src/main.c
> index 8c13bf8..addc883 100644
> --- a/carlfw/src/main.c
> +++ b/carlfw/src/main.c
> @@ -94,11 +94,16 @@ static void tally_update(void)
> 
>   		fw.tally.active += delta;
> 
> +		/*
> +		 * Use HW cycle counter for CCA busy time instead of
> +		 * single-bit polling. AR9170_MAC_REG_CHANNEL_BUSY is
> +		 * a read-and-clear counter like AR9170_MAC_REG_RX_TOTAL.
> +		 */
> +		fw.tally.cca += get(AR9170_MAC_REG_CHANNEL_BUSY);
> +
>   		boff = get(AR9170_MAC_REG_BACKOFF_STATUS);
>   		if (boff & AR9170_MAC_BACKOFF_TX_PE)
>   			fw.tally.tx_time += delta;
> -		if (boff & AR9170_MAC_BACKOFF_CCA)
> -			fw.tally.cca += delta;
>   	}
>   #endif /* CONFIG_CARL9170FW_RADIO_FUNCTIONS */
>   	fw.tally_clock = time;
> diff --git a/carlfw/src/rf.c b/carlfw/src/rf.c
> index 5e8d3d8..d695742 100644
> --- a/carlfw/src/rf.c
> +++ b/carlfw/src/rf.c
> @@ -190,6 +190,15 @@ static uint32_t rf_init(const uint32_t delta_slope_coeff_exp,
> 
>   	ret = AGC_calibration(finiteLoopCount);
> 
> +	if (ret) {
> +		/*
> +		 * Calibration timed out — PHY is in an undefined state.
> +		 * Disable baseband so the driver sees a clean failure
> +		 * instead of operating with a half-initialized PHY.
> +		 */
> +		set(AR9170_PHY_REG_ACTIVE, AR9170_PHY_ACTIVE_DIS);
> +	}
> +
>   	set_channel_end();
>   	return ret;
>   }
> diff --git a/carlfw/src/wlan.c b/carlfw/src/wlan.c
> index 4e73a2b..7a01b09 100644
> --- a/carlfw/src/wlan.c
> +++ b/carlfw/src/wlan.c
> @@ -77,7 +77,13 @@ static void wlan_check_rx_overrun(void)
>   	fw.tally.rx_total += total = get(AR9170_MAC_REG_RX_TOTAL);
>   	fw.tally.rx_overrun += overruns = get(AR9170_MAC_REG_RX_OVERRUN);
>   	if (unlikely(overruns)) {
> -		if (overruns == total) {
> +		/*
> +		 * Trigger MAC reset when more than half of received
> +		 * frames are dropped.  The original check (overruns ==
> +		 * total) only fired at 100 % loss, leaving the adapter
> +		 * nearly blind at 95 % loss without any recovery.
> +		 */
> +		if (total && overruns > (total >> 1)) {
>   			DBG("RX Overrun");
>   			fw.wlan.mac_reset++;
>   		}
> @@ -100,10 +106,33 @@ static void handle_pretbtt(void)
>   	fw.wlan.cab_flush_time = get_clock_counter();
> 
>   #ifdef CONFIG_CARL9170FW_RADIO_FUNCTIONS
> -	rf_psm();
> +	{
> +		unsigned int prev_phy_state = fw.phy.state;
> +
> +		rf_psm();
> +
> +		/*
> +		 * After PSM wake, re-trigger TX DMA for queued frames.
> +		 * While the PHY was off, the hardware could not transmit
> +		 * and DMA trigger bits were consumed without effect.
> +		 */
> +		if (prev_phy_state == CARL9170_PHY_OFF &&
> +		    fw.phy.state == CARL9170_PHY_ON) {
> +			int i;
> +			uint32_t trigger = 0;
> +
> +			for (i = AR9170_TXQ0; i <= AR9170_TXQ_SPECIAL; i++) {
> +				if (!queue_empty(&fw.wlan.tx_queue[i]))
> +					trigger |= BIT(i);
> +			}
> +
> +			if (trigger)
> +				wlan_trigger(trigger);
> +		}
> 
> -	send_cmd_to_host(4, CARL9170_RSP_PRETBTT, 0x00,
> -			 (uint8_t *) &fw.phy.psm.state);
> +		send_cmd_to_host(4, CARL9170_RSP_PRETBTT, 0x00,
> +				 (uint8_t *) &fw.phy.psm.state);
> +	}
>   #endif /* CONFIG_CARL9170FW_RADIO_FUNCTIONS */
>   }
> 
> diff --git a/carlfw/src/wlantx.c b/carlfw/src/wlantx.c
> index a8d0952..2db111e 100644
> --- a/carlfw/src/wlantx.c
> +++ b/carlfw/src/wlantx.c
> @@ -471,11 +471,10 @@ void handle_wlan_tx_completion(void)
>   			}
>   		}
> 
> -		wlan_tx_ampdu_reset(i);
> -
>   		for_each_desc(desc, &fw.wlan.tx_retry)
>   			__wlan_tx(desc);
> 
> +		wlan_tx_ampdu_reset(i);
>   		wlan_tx_ampdu_end(i);
>   		if (!queue_empty(&fw.wlan.tx_queue[i]))
>   			wlan_trigger(BIT(i));
> diff --git a/carlfw/usb/main.c b/carlfw/usb/main.c
> index 4199a21..9147c80 100644
> --- a/carlfw/usb/main.c
> +++ b/carlfw/usb/main.c
> @@ -295,6 +295,63 @@ static void disable_watchdog(void)
>   	set(AR9170_TIMER_REG_WATCH_DOG, 0xffff);
>   }
> 
> +/*
> + * Warm reset: re-initialize firmware without destroying USB PHY state.
> + * This allows the host to re-enumerate the device after a USB bus reset
> + * without requiring a physical re-plug.
> + *
> + * Unlike reboot() which calls turn_power_off() and jump_to_bootcode(),
> + * this preserves the USB connection and jumps directly to start().
> + */
> +static void __noreturn usb_warm_reset(void)
> +{
> +	disable_watchdog();
> +
> +	/* Disable baseband to stop PHY activity */
> +	set(AR9170_PHY_REG_ACTIVE, AR9170_PHY_ACTIVE_DIS);
> +
> +	/* Stop WLAN DMA */
> +	set(AR9170_MAC_REG_DMA_TRIGGER, 0);
> +
> +	/* Stop USB DMA without full power-off */
> +	andl(AR9170_USB_REG_DMA_CTL, ~(AR9170_USB_DMA_CTL_ENABLE_TO_DEVICE |
> +					AR9170_USB_DMA_CTL_ENABLE_FROM_DEVICE));
> +
> +	/* Reset PTA component */
> +	orl(AR9170_PTA_REG_DMA_MODE_CTRL, AR9170_PTA_DMA_MODE_CTRL_RESET);
> +	andl(AR9170_PTA_REG_DMA_MODE_CTRL, ~AR9170_PTA_DMA_MODE_CTRL_RESET);
> +
> +	/* Reset MAC power state */
> +	set(AR9170_MAC_REG_POWER_STATE_CTRL,
> +	    AR9170_MAC_POWER_STATE_CTRL_RESET);
> +
> +	/*
> +	 * Hardware reset: WLAN MAC, DMA engine, and baseband.
> +	 * Without this, the PHY/RF can lock up after repeated
> +	 * warm resets, causing -ETIMEDOUT on register writes
> +	 * and cascading driver reloads (phy0 -> phy29 -> crash).
> +	 *
> +	 * BB_WARM_RESET resets PHY logic while preserving
> +	 * calibration-friendly state. start() -> init() will
> +	 * reconfigure everything via the driver anyway.
> +	 */
> +	set(AR9170_PWR_REG_RESET, AR9170_PWR_RESET_COMMIT_RESET_MASK |
> +				  AR9170_PWR_RESET_WLAN_MASK |
> +				  AR9170_PWR_RESET_DMA_MASK |
> +				  AR9170_PWR_RESET_BB_WARM_RESET);
> +	set(AR9170_PWR_REG_RESET, 0x0);
> +
> +	/* Clean DMA memory */
> +	memset(&dma_mem, 0, sizeof(dma_mem));
> +
> +	/* Clear firmware state */
> +	memset(&fw, 0, sizeof(fw));
> +
> +	/* Re-enter firmware from start() which does full init
> +	 * and sends CARL9170_RSP_BOOT to the host. */
> +	start();
> +}
> +
>   void __noreturn reboot(void)
>   {
>   	disable_watchdog();
> @@ -377,7 +434,7 @@ static void usb_handler(uint8_t usb_interrupt_level1)
>   		if (usb_interrupt_level2 & AR9170_USB_INTR_SRC7_USB_RESET) {
>   			usb_reset_ack();
>   			usb_reset_eps();
> -			reboot();
> +			usb_warm_reset();
>   		}
> 
>   		if (usb_interrupt_level2 & AR9170_USB_INTR_SRC7_USB_SUSPEND) {
> @@ -409,7 +466,7 @@ static void usb_handler(uint8_t usb_interrupt_level1)
>   			fw.suspend_mode = CARL9170_HOST_AWAKE;
>   			set(AR9170_USB_REG_WAKE_UP, 0);
> 
> -			reboot();
> +			usb_warm_reset();
>   		}
>   	}
>   }


      reply	other threads:[~2026-03-21 13:29 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17  9:11 [PATCH 1/3] carlfw: add stability fixes for AR9170 USB adapters iamdevnull
2026-03-21 13:29 ` Christian Lamparter [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52b60edf-c571-426d-947c-11654f09224c@gmail.com \
    --to=chunkeey@gmail.com \
    --cc=chunkeey@googlemail.com \
    --cc=linux-wireless@vger.kernel.org \
    --cc=mas-i@hotmail.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox