* What to do about hung firmware?
From: Ben Greear @ 2013-11-05 18:51 UTC
To: ath10k
I'm seeing cases where it appears the firmware just gets
stuck and will not answer any WMI requests.
ath10k just patiently keeps timing out WMI commands
(while holding locks and making the whole system run slowly).
Should we maybe keep a last-msg-from-firmware timestamp
and just whack the firmware if we detect it hung? In addition
to this, we could add some 'ping' message that will get sent
periodically to the firmware to make sure it is alive.
We should be able to do this with existing WMI API, just
need to pick a message to send that expects some response.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
_______________________________________________
ath10k mailing list
ath10k@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/ath10k
* Re: What to do about hung firmware?
From: Michal Kazior @ 2013-11-06 7:07 UTC
To: Ben Greear; +Cc: ath10k
On 5 November 2013 19:51, Ben Greear <greearb@candelatech.com> wrote:
> I'm seeing cases where it appears the firmware just gets
> stuck and will not answer any WMI requests.
You probably mean FW doesn't replenish HTC TX credits for WMI.
> ath10k just patiently keeps timing out WMI commands,
> (while holding locks, and making the whole system run slow).
>
> Should we maybe keep a last-msg-from firmware time stamp
> and just whack the firmware if we detect it hung? In addition
> to this, we could add some 'ping' message that will get sent
> periodically to the firmware to make sure it is alive.
> We should be able to do this with existing WMI API, just
> need to pick a message to send that expects some response.
Probably the easiest/shortest way to do this is to store a timestamp
in ath10k_wmi_op_ep_tx_credits() and check against it in
ath10k_wmi_cmd_send(). Once you deem FW stopped responding you could
queue ar->restart_work.
You probably could try WMI_ECHO_CMDID to implement a keep alive when
idling (i.e. not sending WMI commands for a few seconds at least).
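A minimal sketch of the timestamp idea above, written as a userspace model rather than driver code. The names fw_watch, fw_watch_on_credits(), fw_watch_check() and the 3000 ms threshold are hypothetical; in the driver the two hooks would sit in ath10k_wmi_op_ep_tx_credits() and ath10k_wmi_cmd_send(), and the flag would be replaced by queueing ar->restart_work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: 3 seconds, expressed in milliseconds. */
#define FW_HANG_TIMEOUT_MS 3000u

struct fw_watch {
	uint32_t last_credit_ms; /* time of the last credit report from firmware */
	bool restart_queued;     /* stands in for queueing ar->restart_work */
};

/* Wraparound-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool ms_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Credit-report path: remember when firmware last returned credits. */
static void fw_watch_on_credits(struct fw_watch *w, uint32_t now_ms)
{
	assert(w != NULL);
	w->last_credit_ms = now_ms;
}

/*
 * Send path, called when a command is blocked waiting for credits.
 * Returns true and sets the restart flag once the last credit report
 * is older than FW_HANG_TIMEOUT_MS.
 */
static bool fw_watch_check(struct fw_watch *w, uint32_t now_ms)
{
	assert(w != NULL);
	if (!ms_after(now_ms, w->last_credit_ms + FW_HANG_TIMEOUT_MS))
		return false;
	w->restart_queued = true;
	return true;
}
```

The check is only meaningful while a command is actually waiting for credits; an idle link with an old timestamp is not by itself a sign of a hang.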
Michał
* Re: What to do about hung firmware?
From: Ben Greear @ 2013-11-06 7:21 UTC
To: Michal Kazior; +Cc: ath10k
On 11/05/2013 11:07 PM, Michal Kazior wrote:
> On 5 November 2013 19:51, Ben Greear <greearb@candelatech.com> wrote:
>> I'm seeing cases where it appears the firmware just gets
>> stuck and will not answer any WMI requests.
>
> You probably mean FW doesn't replenish HTC TX credits for WMI.
Maybe so. I added some debugging to check on that, but then of course
I could not reproduce the problem.
>> ath10k just patiently keeps timing out WMI commands,
>> (while holding locks, and making the whole system run slow).
>>
>> Should we maybe keep a last-msg-from firmware time stamp
>> and just whack the firmware if we detect it hung? In addition
>> to this, we could add some 'ping' message that will get sent
>> periodically to the firmware to make sure it is alive.
>> We should be able to do this with existing WMI API, just
>> need to pick a message to send that expects some response.
>
> Probably the easiest/shortest way to do this is to store a timestamp
> in ath10k_wmi_op_ep_tx_credits() and check against it in
> ath10k_wmi_cmd_send(). Once you deem FW stopped responding you could
> queue ar->restart_work.
>
> You probably could try WMI_ECHO_CMDID to implement a keep alive when
> idling (i.e. not sending WMI commands for a few seconds at least).
Sounds good. I'll work on this if I start seeing the lockups
again...
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: What to do about hung firmware?
From: Kalle Valo @ 2013-11-06 7:46 UTC
To: Michal Kazior; +Cc: Ben Greear, ath10k
Michal Kazior <michal.kazior@tieto.com> writes:
> You probably could try WMI_ECHO_CMDID to implement a keep alive when
> idling (i.e. not sending WMI commands for a few seconds at least).
Sending something periodically would be bad from a power consumption
point of view. We would need to either disable it by default, or only
send it if there's a problem, or something like that.
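One way to read the constraint above, as a small userspace sketch: the periodic ping is off by default and only fires when explicitly enabled and the WMI path has been idle, while a suspected problem always allows a ping. The keepalive_cfg structure, its fields, and the 5000 ms idle interval are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct keepalive_cfg {
	bool enabled;     /* hypothetical knob; off by default to save power */
	uint32_t idle_ms; /* how long WMI must be idle before a periodic ping */
};

/* Default policy: no periodic traffic at all. */
static const struct keepalive_cfg keepalive_default = {
	.enabled = false,
	.idle_ms = 5000,
};

/*
 * Decide whether an echo-style ping should be sent now.
 * A suspected problem always permits a ping; otherwise a ping is sent
 * only if keepalive is enabled and no command has been sent for idle_ms.
 */
static bool keepalive_due(const struct keepalive_cfg *cfg, uint32_t last_cmd_ms,
			  uint32_t now_ms, bool problem_suspected)
{
	assert(cfg != NULL);
	if (problem_suspected)
		return true;
	/* Unsigned subtraction keeps the elapsed time correct across wraparound. */
	return cfg->enabled && (uint32_t)(now_ms - last_cmd_ms) >= cfg->idle_ms;
}
```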
--
Kalle Valo
* Re: What to do about hung firmware?
From: Ben Greear @ 2013-11-06 16:51 UTC
To: Kalle Valo; +Cc: Michal Kazior, ath10k
On 11/05/2013 11:46 PM, Kalle Valo wrote:
> Michal Kazior <michal.kazior@tieto.com> writes:
>
>> You probably could try WMI_ECHO_CMDID to implement a keep alive when
>> idling (i.e. not sending WMI commands for a few seconds at least).
>
> Sending something periodically would be bad from power consumption point
> of view. We would need to either disable it by default, only send it if
> there's a problem or something like that.
Ok, how about this:
If we hit the 3*HZ timeout, then we send a ping to the firmware
even if we are out of credits.
If we get no response to that in 3*HZ or so, then consider firmware
hung and reset it.
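The two-stage escalation could be modelled roughly like this. It is a userspace sketch under assumptions, not driver code: fw_probe, its states, and the millisecond clock are hypothetical, and WMI_TIMEOUT_MS stands in for the 3*HZ timeout. Sending the ping and resetting the firmware are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WMI_TIMEOUT_MS 3000u /* stands in for the 3*HZ timeout */

enum fw_state { FW_OK, FW_PING_SENT, FW_HUNG };

struct fw_probe {
	enum fw_state state;
	uint32_t ping_sent_ms; /* when the ping was issued */
};

/* Stage 1: a WMI command timed out, so a ping is sent and a second timer starts. */
static enum fw_state fw_probe_on_cmd_timeout(struct fw_probe *p, uint32_t now_ms)
{
	assert(p != NULL);
	if (p->state == FW_OK) {
		p->state = FW_PING_SENT;
		p->ping_sent_ms = now_ms;
	}
	return p->state;
}

/* A reply to the ping clears the suspicion; a declared hang stays declared. */
static void fw_probe_on_reply(struct fw_probe *p)
{
	assert(p != NULL);
	if (p->state == FW_PING_SENT)
		p->state = FW_OK;
}

/* Stage 2: no reply within another timeout means the firmware is considered hung. */
static enum fw_state fw_probe_poll(struct fw_probe *p, uint32_t now_ms)
{
	assert(p != NULL);
	if (p->state == FW_PING_SENT &&
	    (uint32_t)(now_ms - p->ping_sent_ms) >= WMI_TIMEOUT_MS)
		p->state = FW_HUNG;
	return p->state;
}
```

With this shape nothing is sent to the firmware until a command has already timed out, so an idle, healthy link sees no extra traffic.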
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: What to do about hung firmware?
From: Kalle Valo @ 2013-11-06 17:40 UTC
To: Ben Greear; +Cc: Michal Kazior, ath10k
Ben Greear <greearb@candelatech.com> writes:
> On 11/05/2013 11:46 PM, Kalle Valo wrote:
>> Michal Kazior <michal.kazior@tieto.com> writes:
>>
>>> You probably could try WMI_ECHO_CMDID to implement a keep alive when
>>> idling (i.e. not sending WMI commands for a few seconds at least).
>>
>> Sending something periodically would be bad from power consumption point
>> of view. We would need to either disable it by default, only send it if
>> there's a problem or something like that.
>
> Ok, how about this:
>
> If we hit the 3*HZ timeout, then we send a ping to the firmware
> even if we are out of credits.
>
> If we get no response to that in 3*HZ or so, then consider firmware
> hung and reset it.
Sounds good to me.
--
Kalle Valo