* Reproducible issue in hacked 3.17 kernel, CT firmware
@ 2014-12-30 19:18 Ben Greear
  2015-01-07  9:58 ` Michal Kazior
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2014-12-30 19:18 UTC
  To: ath10k

yeah, so maybe not reproducible upstream, but anyway...

My test case is to re-associate 4 stations over and over again, with
a scan and a 5 second sleep between iterations.  After
a short time, something goes weird and OS is mostly hung, probably
because important locks are held while ath10k is timing out communication
to firmware.

The last message I see from firmware is that it is deleting vdev 4.

I do not see any indication that firmware is crashed, but something
is wrong, maybe mgt buffers are used up?

I'm going to debug this further, but I am curious why the logs appear
to show that we continue sending cmds (cts_prot, for example) after the
vdev is configured down?


[  339.958906] sta104: deauthenticating from 04:f0:21:03:38:99 by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
[  339.958918] ath10k_pci 0000:04:00.0: mac ampdu vdev_id 4 sta 04:f0:21:03:38:99 tid 0 action 1
[  339.958968] ath10k_pci 0000:04:00.0: wmi vdev install key idx 0 cipher 0 len 16
[  339.959076] ath10k_pci 0000:04:00.0: mac vdev 4 peer delete 04:f0:21:03:38:99 (sta gone)
[  339.959080] ath10k_pci 0000:04:00.0: wmi peer delete vdev_id 4 peer_addr 04:f0:21:03:38:99
[  339.959287] ath10k_pci 0000:04:00.0: mac vdev 4 stop (disassociated
[  339.959290] ath10k_pci 0000:04:00.0: wmi vdev stop id 0x4
[  339.959387] ath10k_pci 0000:04:00.0: WMI_VDEV_STOPPED_EVENTID
[  339.959405] ath10k_pci 0000:04:00.0: mac vdev 4 down
[  339.959407] ath10k_pci 0000:04:00.0: wmi mgmt vdev down id 0x4
[  339.959491] ath10k_pci 0000:04:00.0: mac vdev 4 cts_prot 0
[  339.959495] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 43 value 0
[  339.959499] ath10k_pci 0000:04:00.0: mac vdev 4 slot_time 1
[  339.959501] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 7 value 1
[  340.104623] ath10k_pci 0000:04:00.0: wmi event debug mesg len 144
[  340.104645] ATH10K_DBG_BUFFER:
[  340.104654] ath10k: [0000]: 00059396 100C2403 00000001 00000001 0043AF90 00000000 00059436 04104C1C
[  340.104662] ath10k: [0008]: 00000000 00059437 14105C0A 009B83F4 0043874C 009B83F4 009B81D4 009B83F4
[  340.104670] ath10k: [0016]: 00059437 0C104C30 711000BB 00422368 0043874C 00059437 0C104C30 711000BB
[  340.104684] ath10k: [0024]: 004223A0 0043874C 00059437 0C104C30 711000BB 004223D8 0043874C 00059437
[  340.104687] ath10k: [0032]: 08104C08 00000001 00000008
[  340.104688] ATH10K_END
[  342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
[  342.962509] ath10k_pci 0000:04:00.0: mac vdev 4 preamble 1n
[  342.962512] ath10k_pci 0000:04:00.0: wmi vdev id 0x4 set param 8 value 1
[  345.965900] ath10k_pci 0000:04:00.0: failed to set preamble for vdev 4: -11
[  345.965916] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[  348.969287] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
[  348.969307] ath10k_pci 0000:04:00.0: wmi pdev set wmm params
[  351.972696] ath10k_pci 0000:04:00.0: failed to set wmm params: -11
[  351.972713] ath10k_pci 0000:04:00.0: wmi pdev set wmm params

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k


* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
  2014-12-30 19:18 Reproducible issue in hacked 3.17 kernel, CT firmware Ben Greear
@ 2015-01-07  9:58 ` Michal Kazior
  2015-01-07 13:38   ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Kazior @ 2015-01-07  9:58 UTC
  To: Ben Greear; +Cc: ath10k

On 30 December 2014 at 20:18, Ben Greear <greearb@candelatech.com> wrote:
> yeah, so maybe not reproducible upstream, but anyway...
>
> My test case is to re-associate 4 stations over and over again, with
> a scan and a 5 second sleep between iterations.  After
> a short time, something goes weird and OS is mostly hung, probably
> because important locks are held while ath10k is timing out communication
> to firmware.
>
> The last message I see from firmware is that it is deleting vdev 4.
>
> I do not see any indication that firmware is crashed, but something
> is wrong, maybe mgt buffers are used up?
[...]
> [  342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11

-11 = -EAGAIN = out of wmi-htc tx credits. I wonder what the dbg
buffer is trying to say.
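
For reference, the numeric code can be decoded with Python's standard `errno` module. This is only a small sketch; `decode_kernel_ret` is a made-up helper name, and errno numbering is platform specific (11 is EAGAIN on Linux, which is what matters for a kernel log).

```python
import errno
import os


def decode_kernel_ret(ret):
    """Map a negative kernel-style return code to (symbolic name, description)."""
    code = -ret
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Assumes a Linux host, where errno 11 is EAGAIN.
name, text = decode_kernel_ret(-11)
print(name, "-", text)
```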

Either the host sent a corrupted message and clogged up the firmware
buffers, the firmware is busy processing other commands (wmi mgmt tx,
wmi bcn non-dma tx), or the firmware became confused/corrupted.
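
To make the "out of tx credits" idea concrete, here is a toy model of credit-based flow control. It is a sketch under stated assumptions, not ath10k's or the firmware's actual code: the `CreditPool` class, its methods, and the timeout values are all hypothetical. The sender needs one credit per command, the device hands credits back as it frees buffers, and a sender that waits too long gives up with -EAGAIN.

```python
import errno
import threading


class CreditPool:
    """Toy model of credit-based flow control between a host and a device."""

    def __init__(self, credits):
        self._credits = credits
        self._cond = threading.Condition()

    def send(self, timeout):
        """Consume one credit; return 0, or -EAGAIN if none arrives in time."""
        with self._cond:
            got_credit = self._cond.wait_for(
                lambda: self._credits > 0, timeout=timeout
            )
            if not got_credit:
                return -errno.EAGAIN
            self._credits -= 1
            return 0

    def give_back(self, n=1):
        """Called when the device reports that it has freed n buffers."""
        with self._cond:
            self._credits += n
            self._cond.notify_all()


pool = CreditPool(credits=2)
# Two sends succeed; the third finds no credit and times out.
results = [pool.send(timeout=0.05) for _ in range(3)]
# The device finally returns a credit, so the next send goes through.
pool.give_back(1)
results.append(pool.send(timeout=0.05))
print(results)
```

In this model, a device that stops returning credits for any of the reasons listed above looks the same from the host side: every later command times out with -EAGAIN.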


> I'm going to debug this further, but I am curious why the logs appear
> to show that we continue sending cmds (cts_prot, for example) after the
> vdev is configured down?

This is implied by mac80211. See ieee80211_set_disassoc(): it calls
sta_info_flush() then ieee80211_reset_erp_info() and later
ieee80211_bss_info_change_notify(). These yield ath10k_bss_disassoc()
and later ath10k_bss_info_changed() respectively.
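
The ordering described above can be traced with a toy model. The Python functions below are stand-ins that borrow the kernel function names purely for readability. They are not the real mac80211 or ath10k code, and the mapping of each step to a log message is an assumption based on the log in the original report.

```python
trace = []


# Driver-side stand-ins: they only record the messages seen in the log.
def ath10k_bss_disassoc():
    trace.append("mac vdev 4 down")


def ath10k_bss_info_changed(changed):
    for item in changed:
        trace.append(f"mac vdev 4 {item}")


# mac80211-side stand-ins, in the order given in the reply.
def sta_info_flush():
    # Assumed here to be what ends up triggering the driver's disassoc path.
    ath10k_bss_disassoc()


def ieee80211_reset_erp_info():
    # Returns the set of ERP settings that were reset and need pushing down.
    return ["cts_prot", "slot_time", "preamble"]


def ieee80211_bss_info_change_notify(changed):
    ath10k_bss_info_changed(changed)


def ieee80211_set_disassoc():
    sta_info_flush()
    changed = ieee80211_reset_erp_info()
    ieee80211_bss_info_change_notify(changed)


ieee80211_set_disassoc()
for line in trace:
    print(line)
```

In this model the ERP updates necessarily arrive after the vdev has been brought down, because the notify step only runs once the station flush has completed.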


Michał


* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
  2015-01-07  9:58 ` Michal Kazior
@ 2015-01-07 13:38   ` Ben Greear
  2015-01-07 18:13     ` Ben Greear
  0 siblings, 1 reply; 4+ messages in thread
From: Ben Greear @ 2015-01-07 13:38 UTC
  To: Michal Kazior; +Cc: ath10k



On 01/07/2015 01:58 AM, Michal Kazior wrote:
> On 30 December 2014 at 20:18, Ben Greear <greearb@candelatech.com> wrote:
>> yeah, so maybe not reproducible upstream, but anyway...
>>
>> My test case is to re-associate 4 stations over and over again, with
>> a scan and a 5 second sleep between iterations.  After
>> a short time, something goes weird and OS is mostly hung, probably
>> because important locks are held while ath10k is timing out communication
>> to firmware.
>>
>> The last message I see from firmware is that it is deleting vdev 4.
>>
>> I do not see any indication that firmware is crashed, but something
>> is wrong, maybe mgt buffers are used up?
> [...]
>> [  342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
>
> -11 = -EAGAIN = out of wmi-htc tx credits. I wonder what the dbg
> buffer is trying to say.
>
> Either host sent a corrupted message and clogged up firmware buffers,
> firmware is busy processing other commands (wmi mgmt tx, wmi bcn
> non-dma tx) or became confused/corrupted.

I finally got back to debugging this yesterday, and interestingly, when
I added dbglog calls in the firmware around the credit handling, the problem is 'fixed'.

Looks like it ran overnight, whereas before it would fail within a few minutes.

So, maybe a race around pci memory flushing or something like that?

I'll slowly back out my debug today and see what I can see.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


* Re: Reproducible issue in hacked 3.17 kernel, CT firmware
  2015-01-07 13:38   ` Ben Greear
@ 2015-01-07 18:13     ` Ben Greear
  0 siblings, 0 replies; 4+ messages in thread
From: Ben Greear @ 2015-01-07 18:13 UTC
  To: Michal Kazior; +Cc: ath10k

On 01/07/2015 05:38 AM, Ben Greear wrote:
> 
> 
> On 01/07/2015 01:58 AM, Michal Kazior wrote:
>> On 30 December 2014 at 20:18, Ben Greear <greearb@candelatech.com> wrote:
>>> yeah, so maybe not reproducible upstream, but anyway...
>>>
>>> My test case is to re-associate 4 stations over and over again, with
>>> a scan and a 5 second sleep between iterations.  After
>>> a short time, something goes weird and OS is mostly hung, probably
>>> because important locks are held while ath10k is timing out communication
>>> to firmware.
>>>
>>> The last message I see from firmware is that it is deleting vdev 4.
>>>
>>> I do not see any indication that firmware is crashed, but something
>>> is wrong, maybe mgt buffers are used up?
>> [...]
>>> [  342.962494] ath10k_pci 0000:04:00.0: failed to set erp slot for vdev 4: -11
>>
>> -11 = -EAGAIN = out of wmi-htc tx credits. I wonder what the dbg
>> buffer is trying to say.
>>
>> Either host sent a corrupted message and clogged up firmware buffers,
>> firmware is busy processing other commands (wmi mgmt tx, wmi bcn
>> non-dma tx) or became confused/corrupted.
> 
> I finally got back to debugging this yesterday, and interestingly, when
> I added dbglog calls in the firmware around the credit handling, the problem is 'fixed'.
> 
> Looks like it ran overnight, whereas before it would fail within a few minutes.
> 
> So, maybe a race around pci memory flushing or something like that?
> 
> I'll slowly back out my debug today and see what I can see.

It finally locked up this morning...I see last credit consumed at 8:37:02, and then
finally I get two credits from the firmware at 9:12:42.
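
The length of that stall follows directly from the two wall-clock times quoted above. A minimal sketch of the arithmetic, assuming both times fall on the same day (variable names are illustrative only):

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Times taken from the message above; same-day assumption.
last_credit_consumed = datetime.strptime("8:37:02", FMT)
credits_returned = datetime.strptime("9:12:42", FMT)

stall = credits_returned - last_credit_consumed
print(stall, "=", int(stall.total_seconds()), "seconds")
```

That is a gap of 35 minutes 40 seconds (2140 seconds) with no credits coming back from the firmware.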

I guess more instrumentation is required :P

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


