linux-wireless.vger.kernel.org archive mirror
* ath10k "failed to install key for vdev 0 peer <mac>: -110"
@ 2024-07-12 13:11 James Prestwood
  2024-07-15 11:54 ` James Prestwood
  2024-08-16 10:19 ` Baochen Qiang
  0 siblings, 2 replies; 18+ messages in thread
From: James Prestwood @ 2024-07-12 13:11 UTC (permalink / raw)
  To: open list:MEDIATEK MT76 WIRELESS LAN DRIVER, ath10k

Hi,

I've seen this error mentioned in random forum posts, but it's always
associated with a kernel crash/warning or some very obvious negative
behavior. I've noticed it occasionally, and at one location very
frequently, during FT roaming, specifically just after CMD_ASSOCIATE is
issued. On our company-run networks I'm not seeing any negative
behavior apart from a 3-second delay in sending the reassociation frame,
since the kernel waits for this timeout. But on some networks our
clients run on that we do not own (different vendor), we are seeing
association timeouts after this error occurs, and in some cases the AP
sends a deauthentication with reason code 8 instead of replying with a
reassociation response and an error status, which is quite odd.

We are chasing this down with the vendor of these APs as well, but the
behavior always happens after we see this key removal failure/timeout on
the client side. So it would appear there is potentially a problem on
both the client and the AP. My guess is that _something_ about the
reassociation frame changes when this error is encountered, but I cannot
see how that would be the case. We are working to get PCAPs now, but it's
through a third party, so the timing is out of my control.

From the kernel code, this error would appear innocuous: the old key
fails to be removed, but it is immediately replaced by the new key, and
we don't see that addition failing. Am I understanding that logic
correctly? I.e. this logic:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503

Below are a few kernel logs of the issue happening, some with the deauth 
being sent by the AP, some with just timeouts:

--- No deauth frame sent, just association timeouts after the error ---

Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
Jul 11 00:05:36 kernel: wlan0: authenticated
Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
Jul 11 00:05:36 kernel: wlan0: associated

--- Deauth frame sent amidst the association timeouts ---

Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
Jul 11 00:43:24 kernel: wlan0: authenticated
Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
Jul 11 00:43:24 kernel: wlan0: associated


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-07-12 13:11 ath10k "failed to install key for vdev 0 peer <mac>: -110" James Prestwood
@ 2024-07-15 11:54 ` James Prestwood
  2024-08-12 17:33   ` James Prestwood
  2024-08-16 10:19 ` Baochen Qiang
  1 sibling, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-07-15 11:54 UTC (permalink / raw)
  To: open list:MEDIATEK MT76 WIRELESS LAN DRIVER, ath10k

I forgot to mention:

QCA6174 hw3.0 firmware WLAN.RM.4.4.1-00288-

The higher frequency is on kernel 5.15, although, as I said, only at
one location with a different AP vendor. We have many other 5.15
devices with significantly fewer instances of this happening. I also
checked a few of our newer software releases using kernel 6.2, and the
timeout occurred there as well, but with no real impact (no disconnect,
no assoc timeout).


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-07-15 11:54 ` James Prestwood
@ 2024-08-12 17:33   ` James Prestwood
  2024-08-15 14:03     ` Kalle Valo
  0 siblings, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-08-12 17:33 UTC (permalink / raw)
  To: open list:MEDIATEK MT76 WIRELESS LAN DRIVER, ath10k

Hi,

So I have no resolution to this (trying to get the AP vendor to chase it 
down), but I'm toying with the idea of trying to work around whatever 
issue the AP is having when this occurs. The only thing I can think of 
is that there is a 3 second delay between the authentication and 
reassociation, and perhaps this is causing some timeout in the AP and in 
turn the deauth.

I'm wondering how long it should take to add/remove a key from the 
firmware? 3 seconds seems very long, and I question if this timeout is 
really necessary or was just chosen arbitrarily? Is this something that 
could be lowered down to e.g. 1 second without negative impacts? The 
code in question is in ath10k_install_key:

ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
if (ret)
     return ret;

time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
if (time_left == 0)
     return -ETIMEDOUT;

Thanks,

James


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-12 17:33   ` James Prestwood
@ 2024-08-15 14:03     ` Kalle Valo
  2024-08-15 15:47       ` James Prestwood
  0 siblings, 1 reply; 18+ messages in thread
From: Kalle Valo @ 2024-08-15 14:03 UTC (permalink / raw)
  To: James Prestwood; +Cc: linux-wireless, ath10k

James Prestwood <prestwoj@gmail.com> writes:

> Hi,
>
> So I have no resolution to this (trying to get the AP vendor to chase
> it down), but I'm toying with the idea of trying to work around
> whatever issue the AP is having when this occurs. The only thing I can
> think of is that there is a 3 second delay between the authentication
> and reassociation, and perhaps this is causing some timeout in the AP
> and in turn the deauth.
>
> I'm wondering how long it should take to add/remove a key from the
> firmware? 3 seconds seems very long, and I question if this timeout is
> really necessary or was just chosen arbitrarily? Is this something
> that could be lowered down to e.g. 1 second without negative impacts?
> The code in question is in ath10k_install_key:
>
> ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
> if (ret)
>     return ret;
>
> time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
> if (time_left == 0)
>     return -ETIMEDOUT;

I can't remember anymore but I'm guessing the 3s delay was chosen
arbitrarily just to be on the safe side and not get unnecessary
timeouts.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-15 14:03     ` Kalle Valo
@ 2024-08-15 15:47       ` James Prestwood
  2024-08-15 15:58         ` Kalle Valo
  0 siblings, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-08-15 15:47 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

Hi Kalle,

On 8/15/24 7:03 AM, Kalle Valo wrote:
> James Prestwood <prestwoj@gmail.com> writes:
>
>> Hi,
>>
>> So I have no resolution to this (trying to get the AP vendor to chase
>> it down), but I'm toying with the idea of trying to work around
>> whatever issue the AP is having when this occurs. The only thing I can
>> think of is that there is a 3 second delay between the authentication
>> and reassociation, and perhaps this is causing some timeout in the AP
>> and in turn the deauth.
>>
>> I'm wondering how long it should take to add/remove a key from the
>> firmware? 3 seconds seems very long, and I question if this timeout is
>> really necessary or was just chosen arbitrarily? Is this something
>> that could be lowered down to e.g. 1 second without negative impacts?
>> The code in question is in ath10k_install_key:
>>
>> ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
>> if (ret)
>>      return ret;
>>
>> time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
>> if (time_left == 0)
>>      return -ETIMEDOUT;
> I can't remember anymore but I'm guessing the 3s delay was chosen
> arbitrarily just to be on the safe side and not get unnecessary
> timeouts.
>
Thanks. I have reduced this to 1 second and have had it running on a
client for ~19 hours. I am still seeing the timeouts, but no more often
than before, and even with the timeouts the roams are successful.

After looking further into the spec, I did see that there is a
dot11ReassociationDeadline, which may be coming into play here. Of course,
these APs don't advertise a TIE or even support FT resource requests, so
it's impossible to know for sure, and AFAICT hostapd doesn't enforce any
deadline even if you set it. But in any case the
timeout reduction is helping immensely and avoiding a disconnect.

Thanks,

James





* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-15 15:47       ` James Prestwood
@ 2024-08-15 15:58         ` Kalle Valo
  2024-08-15 16:38           ` James Prestwood
  0 siblings, 1 reply; 18+ messages in thread
From: Kalle Valo @ 2024-08-15 15:58 UTC (permalink / raw)
  To: James Prestwood; +Cc: linux-wireless, ath10k

James Prestwood <prestwoj@gmail.com> writes:

> On 8/15/24 7:03 AM, Kalle Valo wrote:
>> James Prestwood <prestwoj@gmail.com> writes:
>>
>>> Hi,
>>>
>>> So I have no resolution to this (trying to get the AP vendor to chase
>>> it down), but I'm toying with the idea of trying to work around
>>> whatever issue the AP is having when this occurs. The only thing I can
>>> think of is that there is a 3 second delay between the authentication
>>> and reassociation, and perhaps this is causing some timeout in the AP
>>> and in turn the deauth.
>>>
>>> I'm wondering how long it should take to add/remove a key from the
>>> firmware? 3 seconds seems very long, and I question if this timeout is
>>> really necessary or was just chosen arbitrarily? Is this something
>>> that could be lowered down to e.g. 1 second without negative impacts?
>>> The code in question is in ath10k_install_key:
>>>
>>> ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
>>> if (ret)
>>>      return ret;
>>>
>>> time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
>>> if (time_left == 0)
>>>      return -ETIMEDOUT;
>> I can't remember anymore but I'm guessing the 3s delay was chosen
>> arbitrarily just to be on the safe side and not get unnecessary
>> timeouts.
>
> Thanks. I have reduced this to 1 second and have had it running on a
> client for ~19 hours. I am still seeing the timeouts, but no more often
> than before, and even with the timeouts the roams are successful.
>
> After looking further into the spec, I did see that there is a
> dot11ReassociationDeadline, which may be coming into play here. Of
> course, these APs don't advertise a TIE or even support FT resource
> requests, so it's impossible to know for sure, and AFAICT hostapd
> doesn't enforce any deadline even if you set it. But in any case the
> timeout reduction is helping immensely and avoiding a disconnect.

Yeah, reducing the timeout might be a good option. 3s feels like overkill,
especially if a 1s timeout passes your tests.

But I do wonder what's the root cause here. Are you saying that SET_KEY
always works for you?

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-15 15:58         ` Kalle Valo
@ 2024-08-15 16:38           ` James Prestwood
  0 siblings, 0 replies; 18+ messages in thread
From: James Prestwood @ 2024-08-15 16:38 UTC (permalink / raw)
  To: Kalle Valo; +Cc: linux-wireless, ath10k

On 8/15/24 8:58 AM, Kalle Valo wrote:
> James Prestwood <prestwoj@gmail.com> writes:
>
>> On 8/15/24 7:03 AM, Kalle Valo wrote:
>>> James Prestwood <prestwoj@gmail.com> writes:
>>>
>>>> Hi,
>>>>
>>>> So I have no resolution to this (trying to get the AP vendor to chase
>>>> it down), but I'm toying with the idea of trying to work around
>>>> whatever issue the AP is having when this occurs. The only thing I can
>>>> think of is that there is a 3 second delay between the authentication
>>>> and reassociation, and perhaps this is causing some timeout in the AP
>>>> and in turn the deauth.
>>>>
>>>> I'm wondering how long it should take to add/remove a key from the
>>>> firmware? 3 seconds seems very long, and I question if this timeout is
>>>> really necessary or was just chosen arbitrarily? Is this something
>>>> that could be lowered down to e.g. 1 second without negative impacts?
>>>> The code in question is in ath10k_install_key:
>>>>
>>>> ret = ath10k_send_key(arvif, key, cmd, macaddr, flags);
>>>> if (ret)
>>>>       return ret;
>>>>
>>>> time_left = wait_for_completion_timeout(&ar->install_key_done, 3 * HZ);
>>>> if (time_left == 0)
>>>>       return -ETIMEDOUT;
>>> I can't remember anymore but I'm guessing the 3s delay was chosen
>>> arbitrarily just to be on the safe side and not get unnecessary
>>> timeouts.
>> Thanks. I have reduced this to 1 second and have had it running on a
>> client for ~19 hours. I am still seeing the timeouts, but no more often
>> than before, and even with the timeouts the roams are successful.
>>
>> After looking further into the spec, I did see that there is a
>> dot11ReassociationDeadline, which may be coming into play here. Of
>> course, these APs don't advertise a TIE or even support FT resource
>> requests, so it's impossible to know for sure, and AFAICT hostapd
>> doesn't enforce any deadline even if you set it. But in any case the
>> timeout reduction is helping immensely and avoiding a disconnect.
> Yeah, reducing the timeout might be a good option. 3s feels like overkill,
> especially if a 1s timeout passes your tests.
>
> But I do wonder what's the root cause here. Are you saying that SET_KEY
> always works for you?

Yeah, it's only key removal that fails. We proceed, and adding the new
key succeeds 100% of the time. In most cases this is fine, except with
these picky APs that don't like the 3-second delay.

FWIW, this seemed to start after going from 5.15 to 6.2, which is a
needle in a haystack, I know. It makes me think there is a race somewhere
(e.g. in the firmware) and the command timing changed just enough
between 5.15 and 6.2 that it happens more frequently.

Thanks,

James



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-07-12 13:11 ath10k "failed to install key for vdev 0 peer <mac>: -110" James Prestwood
  2024-07-15 11:54 ` James Prestwood
@ 2024-08-16 10:19 ` Baochen Qiang
  2024-08-16 12:04   ` James Prestwood
  1 sibling, 1 reply; 18+ messages in thread
From: Baochen Qiang @ 2024-08-16 10:19 UTC (permalink / raw)
  To: James Prestwood, open list:MEDIATEK MT76 WIRELESS LAN DRIVER,
	ath10k



Hi James, this is QCA6174, right? Could you also share the firmware version?


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-16 10:19 ` Baochen Qiang
@ 2024-08-16 12:04   ` James Prestwood
  2024-09-04 18:03     ` Jeff Johnson
  0 siblings, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-08-16 12:04 UTC (permalink / raw)
  To: Baochen Qiang, open list:MEDIATEK MT76 WIRELESS LAN DRIVER,
	ath10k

Hi Baochen,

On 8/16/24 3:19 AM, Baochen Qiang wrote:
> Hi James, this is QCA6174, right? Could you also share the firmware version?

Yep, using:

qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp 
crc32 bf907c7c

I did try the latest firmware, 309, in one instance and still saw the
same behavior, but 288 is what all our devices are running.

Thanks,

James



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-08-16 12:04   ` James Prestwood
@ 2024-09-04 18:03     ` Jeff Johnson
  2024-09-05  1:46       ` Baochen Qiang
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Johnson @ 2024-09-04 18:03 UTC (permalink / raw)
  To: James Prestwood, Baochen Qiang, linux-wireless, ath10k

On 8/16/2024 5:04 AM, James Prestwood wrote:
> Hi Baochen,
> 
> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>
>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>> Hi,
>>>
>>> I've seen this error mentioned on random forum posts, but it's always associated with a kernel crash/warning or some very obvious negative behavior. I've noticed this occasionally, and at one location very frequently, during FT roaming, specifically just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any negative behavior apart from a 3-second delay in sending the re-association frame, since the kernel waits for this timeout. But we have some networks our clients run on that we do not own (different vendor), and we are seeing association timeouts after this error occurs, and in some cases the AP is sending a deauthentication with reason code 8 instead of replying with a reassociation reply and an error status, which is quite odd.
>>>
>>> We are chasing this down with the vendor of these APs as well, but the behavior always happens after we see this key removal failure/timeout on the client side. So it would appear there is potentially a problem on both the client and the AP. My guess is that _something_ about the re-association frame changes when this error is encountered, but I cannot see how that would be the case. We are working to get PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>
>>> From the kernel code this error would appear innocuous: the old key fails to be removed, but it is immediately replaced by the new key, and we don't see that addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>
>>> Below are a few kernel logs of the issue happening, some with the deauth being sent by the AP, some with just timeouts:
>>>
>>> --- No deauth frame sent, just association timeouts after the error ---
>>>
>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>
>>> --- Deauth frame sent amidst the association timeouts ---
>>>
>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>
>> Hi James, this is QCA6174, right? could you also share firmware version?
> 
> Yep, using:
> 
> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp 
> crc32 bf907c7c
> 
> I did try the latest firmware, 309, in one instance and still saw the
> same behavior, but 288 is what all our devices are running.
> 
> Thanks,
> 
> James

Baochen, are you looking more into this? Would prefer to fix the root cause
rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-09-04 18:03     ` Jeff Johnson
@ 2024-09-05  1:46       ` Baochen Qiang
  2024-11-25 13:32         ` James Prestwood
  2024-12-06  2:47         ` Baochen Qiang
  0 siblings, 2 replies; 18+ messages in thread
From: Baochen Qiang @ 2024-09-05  1:46 UTC (permalink / raw)
  To: Jeff Johnson, James Prestwood, linux-wireless, ath10k



On 9/5/2024 2:03 AM, Jeff Johnson wrote:
> On 8/16/2024 5:04 AM, James Prestwood wrote:
>> Hi Baochen,
>>
>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>
>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>> Hi,
>>>>
>>>> I've seen this error mentioned on random forum posts, but it's always associated with a kernel crash/warning or some very obvious negative behavior. I've noticed this occasionally, and at one location very frequently, during FT roaming, specifically just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any negative behavior apart from a 3-second delay in sending the re-association frame, since the kernel waits for this timeout. But we have some networks our clients run on that we do not own (different vendor), and we are seeing association timeouts after this error occurs, and in some cases the AP is sending a deauthentication with reason code 8 instead of replying with a reassociation reply and an error status, which is quite odd.
>>>>
>>>> We are chasing this down with the vendor of these APs as well, but the behavior always happens after we see this key removal failure/timeout on the client side. So it would appear there is potentially a problem on both the client and the AP. My guess is that _something_ about the re-association frame changes when this error is encountered, but I cannot see how that would be the case. We are working to get PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>>
>>>> From the kernel code this error would appear innocuous: the old key fails to be removed, but it is immediately replaced by the new key, and we don't see that addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>
>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent by the AP, some with just timeouts:
>>>>
>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>
>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>
>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>
>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>
>>> Hi James, this is QCA6174, right? could you also share firmware version?
>>
>> Yep, using:
>>
>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp 
>> crc32 bf907c7c
>>
>> I did try the latest firmware, 309, in one instance and still saw the
>> same behavior, but 288 is what all our devices are running.
>>
>> Thanks,
>>
>> James
> 
> Baochen, are you looking more into this? Would prefer to fix the root cause
> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
I asked the CST team to try to reproduce this issue so that we can get a firmware dump for further debugging. What I heard is that the CST team is currently busy with other critical schedules and plans to debug this ath10k issue once those are finished.



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-09-05  1:46       ` Baochen Qiang
@ 2024-11-25 13:32         ` James Prestwood
  2024-11-26  2:56           ` Baochen Qiang
  2024-12-06  2:47         ` Baochen Qiang
  1 sibling, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-11-25 13:32 UTC (permalink / raw)
  To: Baochen Qiang, Jeff Johnson, linux-wireless, ath10k, Kalle Valo

Hi Baochen,

On 9/4/24 6:46 PM, Baochen Qiang wrote:
>
> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>> Hi Baochen,
>>>
>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>> Hi,
>>>>>
>>>>> I've seen this error mentioned on random forum posts, but it's always associated with a kernel crash/warning or some very obvious negative behavior. I've noticed this occasionally, and at one location very frequently, during FT roaming, specifically just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any negative behavior apart from a 3-second delay in sending the re-association frame, since the kernel waits for this timeout. But we have some networks our clients run on that we do not own (different vendor), and we are seeing association timeouts after this error occurs, and in some cases the AP is sending a deauthentication with reason code 8 instead of replying with a reassociation reply and an error status, which is quite odd.
>>>>>
>>>>> We are chasing this down with the vendor of these APs as well, but the behavior always happens after we see this key removal failure/timeout on the client side. So it would appear there is potentially a problem on both the client and the AP. My guess is that _something_ about the re-association frame changes when this error is encountered, but I cannot see how that would be the case. We are working to get PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>>>
>>>>> From the kernel code this error would appear innocuous: the old key fails to be removed, but it is immediately replaced by the new key, and we don't see that addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>
>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent by the AP, some with just timeouts:
>>>>>
>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>
>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>
>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>
>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>
>>>> Hi James, this is QCA6174, right? could you also share firmware version?
>>> Yep, using:
>>>
>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>> crc32 bf907c7c
>>>
>>> I did try the latest firmware, 309, in one instance and still saw the
>>> same behavior, but 288 is what all our devices are running.
>>>
>>> Thanks,
>>>
>>> James
>> Baochen, are you looking more into this? Would prefer to fix the root cause
>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
> I asked the CST team to try to reproduce this issue so that we can get a firmware dump for further debugging. What I heard is that the CST team is currently busy with other critical schedules and plans to debug this ath10k issue once those are finished.

Any movement on this front? We are still carrying that RFC patch to work 
around the associated compatibility issues with Cisco APs when this 
timeout occurs.

While I agree the RFC patch isn't optimal, getting a firmware fix for
~6-year-old hardware may not be easy either. FWIW, we've been running the
RFC patch for about 3 months now, and as of today it's running on over
4000 client devices. So IMO the patch itself is safe, if there was any
concern.

Thanks,

James



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-11-25 13:32         ` James Prestwood
@ 2024-11-26  2:56           ` Baochen Qiang
  0 siblings, 0 replies; 18+ messages in thread
From: Baochen Qiang @ 2024-11-26  2:56 UTC (permalink / raw)
  To: James Prestwood, Jeff Johnson, linux-wireless, ath10k, Kalle Valo



On 11/25/2024 9:32 PM, James Prestwood wrote:
> Hi Baochen,
> 
> On 9/4/24 6:46 PM, Baochen Qiang wrote:
>>
>> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>>> Hi Baochen,
>>>>
>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've seen this error mentioned on random forum posts, but it's always associated with
>>>>>> a kernel crash/warning or some very obvious negative behavior. I've noticed this
>>>>>> occasionally, and at one location very frequently, during FT roaming, specifically
>>>>>> just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any
>>>>>> negative behavior apart from a 3-second delay in sending the re-association frame,
>>>>>> since the kernel waits for this timeout. But we have some networks our clients run
>>>>>> on that we do not own (different vendor), and we are seeing association timeouts
>>>>>> after this error occurs, and in some cases the AP is sending a deauthentication with
>>>>>> reason code 8 instead of replying with a reassociation reply and an error status,
>>>>>> which is quite odd.
>>>>>>
>>>>>> We are chasing this down with the vendor of these APs as well, but the behavior
>>>>>> always happens after we see this key removal failure/timeout on the client side. So
>>>>>> it would appear there is potentially a problem on both the client and the AP. My guess
>>>>>> is that _something_ about the re-association frame changes when this error is
>>>>>> encountered, but I cannot see how that would be the case. We are working to get
>>>>>> PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>>>>
>>>>>> From the kernel code this error would appear innocuous: the old key fails to be
>>>>>> removed, but it is immediately replaced by the new key, and we don't see that
>>>>>> addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>>
>>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent
>>>>>> by the AP, some with just timeouts:
>>>>>>
>>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>>
>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>>
>>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>>
>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>>
>>>>> Hi James, this is QCA6174, right? could you also share firmware version?
>>>> Yep, using:
>>>>
>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>>> crc32 bf907c7c
>>>>
>>>> I did try the latest firmware, 309, in one instance and still saw the
>>>> same behavior, but 288 is what all our devices are running.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>> Baochen, are you looking more into this? Would prefer to fix the root cause
>>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
>> I asked the CST team to try to reproduce this issue so that we can get a firmware dump for
>> further debugging. What I heard is that the CST team is currently busy with other critical
>> schedules and plans to debug this ath10k issue once those are finished.
> 
> Any movement on this front? We are still carrying that RFC patch to work around the
> associated compatibility issues with Cisco APs when this timeout occurs.
I asked the test team again; the response is that hopefully they can get bandwidth next week.

> 
> While I agree the RFC patch isn't optimal, getting a firmware fix for ~6-year-old
> hardware may not be easy either. FWIW, we've been running the RFC patch for about 3
> months now, and as of today it's running on over 4000 client devices. So IMO the patch
> itself is safe, if there was any concern.
Thanks for the info.

> 
> Thanks,
> 
> James
> 



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-09-05  1:46       ` Baochen Qiang
  2024-11-25 13:32         ` James Prestwood
@ 2024-12-06  2:47         ` Baochen Qiang
  2024-12-06 12:27           ` James Prestwood
  1 sibling, 1 reply; 18+ messages in thread
From: Baochen Qiang @ 2024-12-06  2:47 UTC (permalink / raw)
  To: Jeff Johnson, James Prestwood, linux-wireless, ath10k



On 9/5/2024 9:46 AM, Baochen Qiang wrote:
> 
> 
> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>> Hi Baochen,
>>>
>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>
>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>> Hi,
>>>>>
>>>>> I've seen this error mentioned on random forum posts, but it's always associated with a kernel crash/warning or some very obvious negative behavior. I've noticed this occasionally, and at one location very frequently, during FT roaming, specifically just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any negative behavior apart from a 3-second delay in sending the re-association frame, since the kernel waits for this timeout. But we have some networks our clients run on that we do not own (different vendor), and we are seeing association timeouts after this error occurs, and in some cases the AP is sending a deauthentication with reason code 8 instead of replying with a reassociation reply and an error status, which is quite odd.
>>>>>
>>>>> We are chasing this down with the vendor of these APs as well, but the behavior always happens after we see this key removal failure/timeout on the client side. So it would appear there is potentially a problem on both the client and the AP. My guess is that _something_ about the re-association frame changes when this error is encountered, but I cannot see how that would be the case. We are working to get PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>>>
>>>>> From the kernel code this error would appear innocuous: the old key fails to be removed, but it is immediately replaced by the new key, and we don't see that addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>
>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent by the AP, some with just timeouts:
>>>>>
>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>
>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>
>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>
>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>
>>>> Hi James, this is QCA6174, right? could you also share firmware version?
>>>
>>> Yep, using:
>>>
>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp 
>>> crc32 bf907c7c
>>>
>>> I did try the latest firmware, 309, in one instance and still saw the
>>> same behavior, but 288 is what all our devices are running.
>>>
>>> Thanks,
>>>
>>> James
>>
>> Baochen, are you looking more into this? Would prefer to fix the root cause
>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
> I asked the CST team to try to reproduce this issue so that we can get a firmware dump for further debugging. What I heard is that the CST team is currently busy with other critical schedules and plans to debug this ath10k issue once those are finished.
> 

Jeff, I have been notified that the CST team cannot reproduce this issue.




* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-12-06  2:47         ` Baochen Qiang
@ 2024-12-06 12:27           ` James Prestwood
  2024-12-09  6:48             ` Baochen Qiang
  0 siblings, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-12-06 12:27 UTC (permalink / raw)
  To: Baochen Qiang, Jeff Johnson, linux-wireless, ath10k

Hi Baochen,

On 12/5/24 6:47 PM, Baochen Qiang wrote:
>
> On 9/5/2024 9:46 AM, Baochen Qiang wrote:
>>
>> On 9/5/2024 2:03 AM, Jeff Johnson wrote:
>>> On 8/16/2024 5:04 AM, James Prestwood wrote:
>>>> Hi Baochen,
>>>>
>>>> On 8/16/24 3:19 AM, Baochen Qiang wrote:
>>>>> On 7/12/2024 9:11 PM, James Prestwood wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've seen this error mentioned on random forum posts, but it's always associated with a kernel crash/warning or some very obvious negative behavior. I've noticed this occasionally, and at one location very frequently, during FT roaming, specifically just after CMD_ASSOCIATE is issued. For our company-run networks I'm not seeing any negative behavior apart from a 3-second delay in sending the re-association frame, since the kernel waits for this timeout. But we have some networks our clients run on that we do not own (different vendor), and we are seeing association timeouts after this error occurs, and in some cases the AP is sending a deauthentication with reason code 8 instead of replying with a reassociation reply and an error status, which is quite odd.
>>>>>>
>>>>>> We are chasing this down with the vendor of these APs as well, but the behavior always happens after we see this key removal failure/timeout on the client side. So it would appear there is potentially a problem on both the client and the AP. My guess is that _something_ about the re-association frame changes when this error is encountered, but I cannot see how that would be the case. We are working to get PCAPs now, but it's through a 3rd party, so the timing is out of my control.
>>>>>>
>>>>>> From the kernel code this error would appear innocuous: the old key fails to be removed, but it is immediately replaced by the new key, and we don't see that addition failing. Am I understanding that logic correctly? I.e. this logic:
>>>>>>
>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/mac80211/key.c#n503
>>>>>>
>>>>>> Below are a few kernel logs of the issue happening, some with the deauth being sent by the AP, some with just timeouts:
>>>>>>
>>>>>> --- No deauth frame sent, just association timeouts after the error ---
>>>>>>
>>>>>> Jul 11 00:05:30 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>> Jul 11 00:05:33 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>> Jul 11 00:05:33 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 2/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: associate with <new BSS> (try 3/3)
>>>>>> Jul 11 00:05:33 kernel: wlan0: association with <new BSS> timed out
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:05:36 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: authenticated
>>>>>> Jul 11 00:05:36 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:05:36 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=16)
>>>>>> Jul 11 00:05:36 kernel: wlan0: associated
>>>>>>
>>>>>> --- Deauth frame sent amidst the association timeouts ---
>>>>>>
>>>>>> Jul 11 00:43:18 kernel: wlan0: disconnect from AP <previous BSS> for new assoc to <new BSS>
>>>>>> Jul 11 00:43:21 kernel: ath10k_pci 0000:02:00.0: failed to install key for vdev 0 peer <previous BSS>: -110
>>>>>> Jul 11 00:43:21 kernel: wlan0: failed to remove key (0, <previous BSS>) from hardware (-110)
>>>>>> Jul 11 00:43:21 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:21 kernel: wlan0: deauthenticated from <new BSS> while associating (Reason: 8=DISASSOC_STA_HAS_LEFT)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticate with <new BSS>
>>>>>> Jul 11 00:43:24 kernel: wlan0: send auth to <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: authenticated
>>>>>> Jul 11 00:43:24 kernel: wlan0: associate with <new BSS> (try 1/3)
>>>>>> Jul 11 00:43:24 kernel: wlan0: RX AssocResp from <new BSS> (capab=0x1111 status=0 aid=101)
>>>>>> Jul 11 00:43:24 kernel: wlan0: associated
>>>>>>
>>>>> Hi James, this is QCA6174, right? Could you also share the firmware version?
>>>> Yep, using:
>>>>
>>>> qca6174 hw3.2 target 0x05030000 chip_id 0x00340aff sub 1dac:0261
>>>> firmware ver WLAN.RM.4.4.1-00288- api 6 features wowlan,ignore-otp,mfp
>>>> crc32 bf907c7c
>>>>
>>>> I did try the latest firmware, 309, in one instance and still saw the
>>>> same behavior, but 288 is what all our devices are running.
>>>>
>>>> Thanks,
>>>>
>>>> James
>>> Baochen, are you looking more into this? Would prefer to fix the root cause
>>> rather than take "[RFC 0/1] wifi: ath10k: improvement on key removal failure"
>> I asked the CST team to try to reproduce this issue so that we can get a firmware dump for further debugging. I was told that the CST team is currently busy with other critical schedules and plans to debug this ath10k issue after those are finished.
>>
> Jeff, I have been notified that the CST team cannot reproduce this issue.

Thanks for reaching out to them at least. Maybe the firmware team can 
provide some info about how long it _should_ take to remove a key and we 
can make the timeout reflect that?

Thanks,

James
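Both error lines in the quoted logs report `-110`. Kernel functions return 0 on success or a negated errno value on failure, and in Linux's generic errno numbering (used on x86 and ARM) 110 is `ETIMEDOUT`. That fits the reading in this thread that the host stopped waiting for the firmware rather than receiving an explicit failure. A minimal user-space decoder, assuming a Linux target with that numbering (some architectures, such as MIPS, use different values); the helper names are illustrative only:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Kernel convention: 0 on success, -errno on failure; drop the sign. */
static int kernel_ret_to_errno(int ret)
{
    assert(ret <= 0);
    return -ret;
}

/* Symbolic name for the codes that matter in this thread (illustrative). */
static const char *kernel_ret_name(int ret)
{
    switch (kernel_ret_to_errno(ret)) {
    case 0:
        return "success";
    case ETIMEDOUT:
        return "ETIMEDOUT";
    default:
        return "other errno";
    }
}

/* Print one line such as "-110: ETIMEDOUT (<C library description>)". */
static void explain_kernel_ret(int ret)
{
    printf("%d: %s (%s)\n", ret, kernel_ret_name(ret),
           strerror(kernel_ret_to_errno(ret)));
}
```

Calling `explain_kernel_ret(-110)` prints `-110: ETIMEDOUT` followed by the C library's description of that code (glibc's wording is "Connection timed out").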




* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-12-06 12:27           ` James Prestwood
@ 2024-12-09  6:48             ` Baochen Qiang
  2024-12-09 12:37               ` James Prestwood
  0 siblings, 1 reply; 18+ messages in thread
From: Baochen Qiang @ 2024-12-09  6:48 UTC (permalink / raw)
  To: James Prestwood, Jeff Johnson, linux-wireless, ath10k



On 12/6/2024 8:27 PM, James Prestwood wrote:
> Hi Baochen,
> 
> On 12/5/24 6:47 PM, Baochen Qiang wrote:
>> Jeff, I have been notified that the CST team cannot reproduce this issue.
> 
> Thanks for reaching out to them at least. Maybe the firmware team can provide some info
> about how long it _should_ take to remove a key and we can make the timeout reflect that?

Are you implying that the failure is due to an insufficient wait in the host driver? Or do you
want to know the maximum time the firmware needs to remove a key, so that if it is less than 3 s we
can reduce the current timeout to work around (WAR) the issue you hit?

> 
> Thanks,
> 
> James
> 
> 
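Baochen's question concerns the host-side wait. As described in this thread, after sending the key command the host blocks for up to 3 seconds waiting for the firmware to acknowledge it and returns -110 if no acknowledgement arrives; that wait is the 3-second gap between the "disconnect" and "failed to install key" log lines. The sketch below models that pattern in user space, with a POSIX condition variable standing in for the kernel completion. It is an illustration under those assumptions, not ath10k code: `struct completion_model`, `completion_signal()` and `wait_for_key_done()` are hypothetical names, and the timeout is a parameter so the effect of shortening it is visible. Shortening the timeout only bounds the stall; it does not make the firmware respond.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a kernel completion signalled by a firmware event. */
struct completion_model {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void completion_init(struct completion_model *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Models the firmware acknowledging the key command. */
static void completion_signal(struct completion_model *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Wait at most timeout_ms for the ack: 0 if acked, -ETIMEDOUT otherwise. */
static int wait_for_key_done(struct completion_model *c, long timeout_ms)
{
    struct timespec deadline;
    bool done;
    int rc = 0;

    assert(timeout_ms >= 0);
    clock_gettime(CLOCK_REALTIME, &deadline); /* default condvar clock */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&c->lock);
    /* Loop over spurious wakeups; stop on ack or when the deadline passes. */
    while (!c->done && rc == 0)
        rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
    done = c->done;
    pthread_mutex_unlock(&c->lock);

    return done ? 0 : -ETIMEDOUT;
}
```

With no acknowledgement, `wait_for_key_done(&c, 3000)` returns `-ETIMEDOUT` after about 3 seconds; if `completion_signal()` runs first (the firmware answered), it returns 0 immediately.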



* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-12-09  6:48             ` Baochen Qiang
@ 2024-12-09 12:37               ` James Prestwood
  2025-11-14 21:52                 ` James Prestwood
  0 siblings, 1 reply; 18+ messages in thread
From: James Prestwood @ 2024-12-09 12:37 UTC (permalink / raw)
  To: Baochen Qiang, Jeff Johnson, linux-wireless, ath10k


On 12/8/24 10:48 PM, Baochen Qiang wrote:
>
> On 12/6/2024 8:27 PM, James Prestwood wrote:
>> Hi Baochen,
>>
>> On 12/5/24 6:47 PM, Baochen Qiang wrote:
>>> Jeff, I have been notified that the CST team cannot reproduce this issue.
>> Thanks for reaching out to them at least. Maybe the firmware team can provide some info
>> about how long it _should_ take to remove a key and we can make the timeout reflect that?
> Are you implying that the failure is due to an insufficient wait in the host driver? Or do you
> want to know the maximum time the firmware needs to remove a key, so that if it is less than 3 s we
> can reduce the current timeout to work around (WAR) the issue you hit?
No, I'm not implying that the wait isn't long enough. I would like to know the
maximum time the firmware should normally take and only wait that long,
which would fix the issues we see with Cisco APs.
>
>> Thanks,
>>
>> James
>>
>>


* Re: ath10k "failed to install key for vdev 0 peer <mac>: -110"
  2024-12-09 12:37               ` James Prestwood
@ 2025-11-14 21:52                 ` James Prestwood
  0 siblings, 0 replies; 18+ messages in thread
From: James Prestwood @ 2025-11-14 21:52 UTC (permalink / raw)
  To: Baochen Qiang, Jeff Johnson, linux-wireless, ath10k


On 12/9/24 4:37 AM, James Prestwood wrote:
>
> On 12/8/24 10:48 PM, Baochen Qiang wrote:
>>
>> On 12/6/2024 8:27 PM, James Prestwood wrote:
>>> Hi Baochen,
>>>
>>> On 12/5/24 6:47 PM, Baochen Qiang wrote:
>>>> Jeff, I have been notified that the CST team cannot reproduce this issue.
>>> Thanks for reaching out to them at least. Maybe the firmware team can
>>> provide some info about how long it _should_ take to remove a key and
>>> we can make the timeout reflect that?
>> Are you implying that the failure is due to an insufficient wait in the
>> host driver? Or do you want to know the maximum time the firmware needs
>> to remove a key, so that if it is less than 3 s we can reduce the
>> current timeout to work around (WAR) the issue you hit?
> No, I'm not implying that the wait isn't long enough. I would like to
> know the maximum time the firmware should normally take and only wait
> that long, which would fix the issues we see with Cisco APs.
>>
>>> Thanks,
>>>
>>> James
>>>
>>>
Attempting to revive this thread with additional information. Since
initially discovering this, I have been carrying a patch that lowers the
timeout from 3 seconds to 1 second. Though undesirable (it still delays
roams by 1 second), it did work around the issue with Cisco APs.
Unfortunately, we now see the same issue with another vendor, Extreme
Networks, even with the delay reduced to 1 second.

I can't remember if it was mentioned, but we do not see this failure with
other AP vendors like Meraki or Aruba, and even some clients that use
Cisco don't experience it. However, it appears to happen much more often
(sometimes 90%+ of the time) with certain AP vendors. I cannot begin to
imagine how the AP would have any effect on the driver/firmware's ability
to remove a key locally, but here we are.

Currently I'm thinking I have two options:

   - Further reduce the wait. But given how consistently the failure
     happens, the roam time will be at least whatever I set the timeout to.

   - Remove the wait entirely for DISABLE_KEY. I have no idea whether this
     is safe/recommended, but given that the failure isn't handled (only an
     error is logged), it feels like I could remove it.
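The two options can be compared with a small model. As James reads the mac80211 code, a failed hardware key removal is only logged and the new key is installed regardless; under that assumption, the only cost of the timeout is the stall itself, which equals the timeout because the firmware never acknowledges. Everything here is hypothetical: `model_key_cmd`, `key_cmd_wait_ms()`, `model_replace_key()` and the simulated firmware callbacks are illustrative names, not mac80211 or ath10k interfaces, and the model says nothing about whether skipping the wait is safe for the firmware, which is the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-ins for the set-key and disable-key commands. */
enum model_key_cmd { MODEL_SET_KEY, MODEL_DISABLE_KEY };

/*
 * Hypothetical wait policy.  Option 1: pass a smaller timeout_ms.
 * Option 2: skip_disable_wait = true, so key removal never blocks.
 */
static long key_cmd_wait_ms(enum model_key_cmd cmd, long timeout_ms,
                            bool skip_disable_wait)
{
    assert(timeout_ms >= 0);
    if (cmd == MODEL_DISABLE_KEY && skip_disable_wait)
        return 0;
    return timeout_ms;
}

/* Stall before the reassociation when the firmware never acks the removal. */
static long roam_stall_ms(long timeout_ms, bool skip_disable_wait)
{
    return key_cmd_wait_ms(MODEL_DISABLE_KEY, timeout_ms, skip_disable_wait);
}

/* Simulated firmware outcomes for the replace model below. */
static int fw_removal_times_out(void) { return -ETIMEDOUT; }
static int fw_install_ok(void) { return 0; }

/*
 * Replace path as read in the thread: a removal error is logged but not
 * propagated, so only the result of installing the new key is returned.
 */
static int model_replace_key(int (*remove_old)(void), int (*install_new)(void))
{
    int ret = remove_old();

    if (ret)
        fprintf(stderr, "failed to remove key from hardware (%d)\n", ret);
    return install_new();
}
```

Under these assumptions, option 1 leaves a stall equal to the chosen timeout on every affected roam, while option 2 removes it for DISABLE_KEY only and keeps the full wait for installing the new key.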

Thanks,

James




Thread overview: 18+ messages
2024-07-12 13:11 ath10k "failed to install key for vdev 0 peer <mac>: -110" James Prestwood
2024-07-15 11:54 ` James Prestwood
2024-08-12 17:33   ` James Prestwood
2024-08-15 14:03     ` Kalle Valo
2024-08-15 15:47       ` James Prestwood
2024-08-15 15:58         ` Kalle Valo
2024-08-15 16:38           ` James Prestwood
2024-08-16 10:19 ` Baochen Qiang
2024-08-16 12:04   ` James Prestwood
2024-09-04 18:03     ` Jeff Johnson
2024-09-05  1:46       ` Baochen Qiang
2024-11-25 13:32         ` James Prestwood
2024-11-26  2:56           ` Baochen Qiang
2024-12-06  2:47         ` Baochen Qiang
2024-12-06 12:27           ` James Prestwood
2024-12-09  6:48             ` Baochen Qiang
2024-12-09 12:37               ` James Prestwood
2025-11-14 21:52                 ` James Prestwood
