* Reproducible issue in hacked 3.17 kernel, CT firmware
From: Ben Greear @ 2014-12-30 19:18 UTC
To: ath10k
yeah, so maybe not reproducible upstream, but anyway...
My test case is to re-associate 4 stations over and over again, with
a scan and a 5 second sleep between iterations. After
a short time, something goes weird and OS is mostly hung, probably
because important locks are held while ath10k is timing out communication
to firmware.
The last message I see from firmware is that it is deleting vdev 4.
I do not see any indication that the firmware has crashed, but something
is wrong; maybe the mgmt buffers are used up?
I'm going to debug this further, but I am curious why the logs appear
to show that we continue sending cmds (cts_prot, for example) after the
vdev is configured down?
[ 339.958906] sta104: deauthenticating from 04:f0:21:03:38:99 by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
[ 339.958918] ath10k_pci 0000:04:00.0: mac ampdu vdev_id 4 sta 04:f0:21:03:38:99 tid 0 action 1
[ 339.958968] ath10k_pci 0000:04:00.0: wmi vdev install key idx 0 cipher 0 len 16
[ 339.959076] ath10k_pci 0000:04:00.0: mac vdev 4 peer delete 04:f0:21:03:38:99 (sta gone)
[ 339.959080] ath10k_pci 0000:04:00.0: wmi peer delete vdev_id 4 peer_addr 04:f0:21:03:38:99
[ 339.959287] ath10k_pci 0000:04:00.0: mac vdev 4 stop (disassociated
[ 339.959290] ath10k_pci 0000:04:00.0: wmi vdev stop id 0x4
[ 339.959387] ath10k_pci 0000:04:00.0: WMI_VDEV_STOPPED_EVENTID
[ 339.959405] ath10k_pci 0000:04:00.0: mac vdev 4 down
[ 339.959407] ath10k_pci 0000:04:00.0: wmi mgmt vdev down id 0x4
[ 339.959491] ath10k_pci 0000:04:00.0: mac vdev 4 cts_prot 0
[ 339.959495] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 43 value 0
[ 339.959499] ath10k_pci 0000:04:00.0: mac vdev 4 slot_time 1
[ 339.959501] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 7 value 1
[ 340.104623] ath10k_pci 0000:04:00.0: wmi event debug mesg len 144
[ 340.104645] ATH10K_DBG_BUFFER:
[ 340.104654] ath10k: [0000]: 00059396 100C2403 00000001 00000001 0043AF90 00000000 00059436 04104C1C
[ 340.104662] ath10k: [0008]: 00000000 00059437 14105C0A 009B83F4 0043874C 009B83F4 009B81D4 009B83F4
[ 340.104670] ath10k: [0016]: 00059437 0C104C30 711000BB 00422368 0043874C 00059437 0C104C30 711000BB
[ 340.104684] ath10k: [0024]: 004223A0 0043874C 00059437 0C104C30 711000BB 004223D8 0043874C 00059437
[ 340.104687] ath10k: [0032]: 08104C08 00000001 00000008
[ 340.104688] ATH10K_END
[ 342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
[ 342.962509] ath10k_pci 0000:04:00.0: mac vdev 4 preamble 1n
[ 342.962512] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 8 value 1
[ 345.965900] ath10k_pci 0000:04:00.0: failed to set preamble for vdev 4: -11
[ 345.965916] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[ 348.969287] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
[ 348.969307] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[ 351.972696] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
[ 351.972713] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
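The spacing of the failures in the log above can be checked with a small script. This is only a sketch: it copies the relevant lines from the log, pairs each `wmi ...` command with the `failed to ...` line that follows it, and reports the delay. The helper name `command_latencies` and the pairing rule are assumptions made for illustration, not part of ath10k.

```python
import re

# Lines copied from the log above: each WMI command and the failure that follows it.
LOG = """\
[ 339.959501] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 7 value 1
[ 342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
[ 342.962512] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 8 value 1
[ 345.965900] ath10k_pci 0000:04:00.0: failed to set preamble for vdev 4: -11
[ 345.965916] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[ 348.969287] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
[ 348.969307] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[ 351.972696] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
"""

# Matches "[ <seconds>] ath10k_pci <device>: <message>".
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+ath10k_pci \S+ (.*)")


def command_latencies(log):
    """Return seconds between each 'wmi ...' line and the next 'failed to ...' line."""
    pending = None  # timestamp of the last command still waiting for a result
    latencies = []
    for line in log.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, message = float(match.group(1)), match.group(2)
        if message.startswith("wmi "):
            pending = timestamp
        elif message.startswith("failed to ") and pending is not None:
            latencies.append(round(timestamp - pending, 3))
            pending = None
    return latencies


latencies = command_latencies(LOG)
print(latencies)
```

Every command in this excerpt fails about three seconds after it was issued, which suggests each one waits for a fixed timeout before giving up rather than failing immediately. The thread itself does not state the timeout value.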
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
From: Michal Kazior @ 2015-01-07 9:58 UTC
To: Ben Greear; +Cc: ath10k
On 30 December 2014 at 20:18, Ben Greear <greearb@candelatech.com> wrote:
> yeah, so maybe not reproducible upstream, but anyway...
>
> My test case is to re-associate 4 stations over and over again, with
> a scan and a 5 second sleep between iterations. After
> a short time, something goes weird and OS is mostly hung, probably
> because important locks are held while ath10k is timing out communication
> to firmware.
>
> The last message I see from firmware is that it is deleting vdev 4.
>
> I do not see any indication that firmware is crashed, but something
> is wrong, maybe mgt buffers are used up?
[...]
> [ 342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
-11 = -EAGAIN = out of wmi-htc tx credits. I wonder what the dbg
buffer is trying to say.
Either the host sent a corrupted message and clogged up the firmware
buffers, the firmware is busy processing other commands (wmi mgmt tx,
wmi bcn non-dma tx), or it became confused/corrupted.
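To illustrate what running out of tx credits means, here is a toy model of credit-based flow control. It is a sketch under stated assumptions, not ath10k's WMI/HTC code: the class `CreditedLink`, its methods, and the command names are hypothetical, and this model returns immediately where the log shows the real driver waiting about three seconds before reporting the error.

```python
import errno


class CreditedLink:
    """Toy credit-based link: each send consumes one credit (hypothetical model)."""

    def __init__(self, credits):
        self.credits = credits

    def send(self, command):
        # With no credits left, the sender cannot hand the command to the target.
        if self.credits == 0:
            return -errno.EAGAIN
        self.credits -= 1
        return 0

    def credit_report(self, count):
        # The target returns credits once it has processed earlier commands.
        self.credits += count


link = CreditedLink(credits=2)
results = [link.send(c) for c in ("vdev_down", "set_param_43", "set_param_7")]
print(results)  # the third send fails with -EAGAIN

# If the target never reports credits back, every later send keeps failing.
link.credit_report(1)
retry = link.send("set_param_7")
print(retry)
```

In this model a target that stops returning credits makes every subsequent command fail in the same way, which matches the repeated -11 errors in the log regardless of which command is sent.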
> I'm going to debug this further, but I am curious why the logs appear
> to show that we continue sending cmds (cts_prot, for example) after the
> vdev is configured down?
This is implied by mac80211. See ieee80211_set_disassoc(): it calls
sta_info_flush() then ieee80211_reset_erp_info() and later
ieee80211_bss_info_change_notify(). These yield ath10k_bss_disassoc()
and later ath10k_bss_info_changed() respectively.
Michał
* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
From: Ben Greear @ 2015-01-07 13:38 UTC
To: Michal Kazior; +Cc: ath10k
On 01/07/2015 01:58 AM, Michal Kazior wrote:
> -11 = -EAGAIN = out of wmi-htc tx credits. I wonder what the dbg
> buffer is trying to say.
>
> Either host sent a corrupted message and clogged up firmware buffers,
> firmware is busy processing other commands (wmi mgmt tx, wmi bcn
> non-dma tx) or became confused/corrupted.
I finally got back to debugging this yesterday, and interestingly, when
I added dbglog calls in the firmware around the credit handling, the problem is 'fixed'.
Looks like it ran overnight, whereas before it would fail within a few minutes.
So, maybe a race around pci memory flushing or something like that?
I'll slowly back out my debug today and see what I can see.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
From: Ben Greear @ 2015-01-07 18:13 UTC
To: Michal Kazior; +Cc: ath10k
On 01/07/2015 05:38 AM, Ben Greear wrote:
> I finally got back to debugging this yesterday, and interestingly, when
> I added dbglog calls in the firmware around the credit handling, the problem is 'fixed'.
>
> Looks like it ran overnight, whereas before it would fail within a few minutes.
>
> So, maybe a race around pci memory flushing or something like that?
>
> I'll slowly back out my debug today and see what I can see.
It finally locked up this morning...I see last credit consumed at 8:37:02, and then
finally I get two credits from the firmware at 9:12:42.
I guess more instrumentation is required :P
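For scale, the gap between the two timestamps above can be worked out directly. This sketch assumes both times are from the same day and the same clock, which the message implies but does not state.

```python
from datetime import datetime

# Timestamps quoted in the message above (assumed same day, same clock).
last_credit_consumed = datetime.strptime("8:37:02", "%H:%M:%S")
credits_returned = datetime.strptime("9:12:42", "%H:%M:%S")

stall = credits_returned - last_credit_consumed
print(stall, stall.total_seconds())
```

That is a stall of 35 minutes 40 seconds (2140 s) between the last credit being consumed and the firmware returning two credits.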
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com