From: Ben Greear <greearb@candelatech.com>
To: Michal Kazior <michal.kazior@tieto.com>
Cc: ath10k <ath10k@lists.infradead.org>
Subject: Re: Anyone seeing tx-credits 'hang'?
Date: Fri, 09 Jan 2015 08:55:35 -0800
Message-ID: <54B00807.9060909@candelatech.com>
In-Reply-To: <CA+BoTQk86cnC1eRwqAbbnkMe7rDTvctH8o-JPvM3yT56nEes6g@mail.gmail.com>

On 01/09/2015 02:34 AM, Michal Kazior wrote:
> On 8 January 2015 at 22:24, Ben Greear <greearb@candelatech.com> wrote:
>> I am still working on tracking down tx-credits hang, where it appears
>> to the driver that firmware does not return tx credits, and the driver
>> then gets lots of -11 errors from htc/wmi and will not recover (well,
>> once it recovered after hanging for about 45 minutes, for reasons that are totally
>> beyond me.  I do not normally wait so long).
>>
>> I am using a hacked ath10k driver and CT firmware, but I am suspicious that the problem
>> is not unique to me, though I probably hit the problem much more often
>> due to the types of stress tests I am running.
> 
> I don't recall seeing it recently.
> 
> 
>> I have implemented a keep-alive between my driver and CT firmware,
>> and firmware will assert if it does not get a message within
>> about 10 seconds.  This is a wmi-message, so if we hang due to credits,
>> the firmware will assert and dump a nice crash log (and host can recover).
> 
> FYI the default time mgmt tx can be stuck is 10 seconds (vide the
> tx-credit starvation issue due to hostapd's inactivity measures).

One thing I noticed yesterday is that when the driver tries to put a
vdev down, the firmware will try to flush, and will delay vdev-down
event until fw is flushed.  I changed CT firmware to automatically
flush in this case, but perhaps the driver should explicitly ask
firmware to flush the vdev before putting it down?

Once the driver gets out of sync due to timeouts, the firmware
is likely to assert soon afterwards (if the wmi hang doesn't happen first),
because firmware will think a vdev is up when it is not, or vice versa.

Also, I notice a pattern in the failure case.

The sequence is almost always something like this:

[lots of vdev up/down, re-associate, etc]

vdev down (this would have timed out if I didn't put in the flush)
  * vdev down is usually last wmi cmd firmware receives.
driver tries to delete peer, that times out (firmware wmi layer never
  saw the command)
firmware reports one or two more messages to driver, and if it manages to report
a dbglog, that usually shows a tx-timeout message within a second of
the vdev down.  This happens whether or not I flush the vdev when bringing it
down.

At this point, one more request from the driver may be sent; after that,
it is credit starvation.  Firmware continues to run (timers fire, etc).

I think that firmware is also waiting on a completion event from the
CE layer...I plan to dig into that more today.
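
To make the starvation mechanism concrete, here is a minimal host-side model
of credit-gated sends, assuming a single endpoint. It is only a sketch, not
ath10k or firmware code: struct credit_ep, host_send() and fw_return_credit()
are hypothetical names, and the 'delivered' flag models a credit report that
the firmware sends but the driver never sees. The -EAGAIN return is the -11
error mentioned earlier in the thread (EAGAIN is 11 on Linux).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one credit-gated endpoint; not ath10k code. */
struct credit_ep {
    int credits;     /* credits the host believes it still holds */
    int sent;        /* commands the host has handed to the transport */
    int fw_returned; /* credits the firmware believes it has returned */
};

void credit_ep_init(struct credit_ep *ep, int total)
{
    ep->credits = total;
    ep->sent = 0;
    ep->fw_returned = 0;
}

/* Host side: each command consumes one credit, or fails with -EAGAIN. */
int host_send(struct credit_ep *ep)
{
    if (ep->credits == 0)
        return -EAGAIN;
    ep->credits--;
    ep->sent++;
    return 0;
}

/*
 * Firmware side: finish one command and report the credit back.
 * delivered == false models a report that never reaches the host.
 */
void fw_return_credit(struct credit_ep *ep, bool delivered)
{
    assert(ep->fw_returned < ep->sent); /* cannot return more than was sent */
    ep->fw_returned++;
    if (delivered)
        ep->credits++;
}
```

With two credits, two sends succeed and the third returns -EAGAIN. If both
credit reports are then lost, the firmware side believes it has returned
everything while the host keeps failing with -EAGAIN, which matches the
symptom described in this thread.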

>> One crash I looked at closely appears to show the firmware thinking it
>> has returned all credits, but driver never received them.  What is more,
>> it seems that the driver thought it sent one additional wmi command
>> that the firmware did not receive in the wmi message handling code.
> 
> Hmm.. A couple of ideas:
>  a) lost interrupt
>  b) silently dropped event buffer (in fw, e.g. due to unforeseen lack
> of resources)
>  c) memory barrier / ordering issue (delivered/submitted buffer was a
> mess - I don't know if you're checking the buffer in/out count or
> analyzed all the way down to copy engine)
> 
> You could try adding a few extra mb() (e.g. before copy engine ring
> indexes are updated) for (c), at least in ath10k.
> 
> You could try changing _service_any() to ignore copy engine summary
> mask and iterate i=0..CE_COUNT-1 and try polling htc-wmi rx pipe (or
> just simply all of them :P) with ath10k_hif_send_complete_check().
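
For idea (c), the ordering concern can be illustrated with a user-space
analogue: the descriptor must be written before the ring index that
publishes it. This is only a sketch using C11 atomics, with release/acquire
ordering standing in for the kernel barriers; struct ring, ring_publish()
and ring_consume() are hypothetical names, the ring size of 4 is arbitrary,
and none of this is the ath10k copy engine code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4u /* arbitrary size for this sketch */

/* Hypothetical single-producer, single-consumer descriptor ring. */
struct ring {
    uint32_t desc[RING_SIZE];
    _Atomic uint32_t write_index; /* advanced by the producer only */
    _Atomic uint32_t read_index;  /* advanced by the consumer only */
};

/* Producer: fill the slot first, then publish it via the index. */
int ring_publish(struct ring *r, uint32_t value)
{
    uint32_t w = atomic_load_explicit(&r->write_index, memory_order_relaxed);
    uint32_t rd = atomic_load_explicit(&r->read_index, memory_order_acquire);

    if (w - rd == RING_SIZE)
        return -1; /* ring is full */
    r->desc[w % RING_SIZE] = value;
    /* Release store: the slot write above is visible before the new index. */
    atomic_store_explicit(&r->write_index, w + 1, memory_order_release);
    return 0;
}

/* Consumer: read the index first, then the slot it covers. */
int ring_consume(struct ring *r, uint32_t *out)
{
    assert(out != NULL);
    uint32_t rd = atomic_load_explicit(&r->read_index, memory_order_relaxed);
    /* Acquire load pairs with the release store in ring_publish(). */
    uint32_t w = atomic_load_explicit(&r->write_index, memory_order_acquire);

    if (rd == w)
        return 0; /* nothing published */
    *out = r->desc[rd % RING_SIZE];
    atomic_store_explicit(&r->read_index, rd + 1, memory_order_release);
    return 1;
}
```

If the index update were allowed to become visible before the slot write,
the consumer could read a stale descriptor, which is the kind of "mess"
described in (c).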

Yes, I suspect CE transport issue...I have not dug into that code yet,
but I will do so today.
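
As an illustration of the polling idea quoted above, the following toy model
contrasts servicing only the pipes flagged in a summary mask with polling
every pipe. All names here (MODEL_PIPE_COUNT, struct model_pipe,
service_by_summary(), service_all()) are made up for this sketch and do not
correspond to the real copy engine code; the pipe count of 8 is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PIPE_COUNT 8 /* assumed pipe count, for illustration only */

/* Hypothetical pipe: just a count of completions waiting to be reaped. */
struct model_pipe {
    int pending;
};

/* Reap everything pending on one pipe; returns how many were reaped. */
static int drain_pipe(struct model_pipe *p)
{
    int n = p->pending;

    p->pending = 0;
    return n;
}

/* Service only the pipes whose bit is set in the summary mask. */
int service_by_summary(struct model_pipe *pipes, uint32_t summary)
{
    int done = 0;

    assert(pipes != NULL);
    for (int i = 0; i < MODEL_PIPE_COUNT; i++)
        if (summary & (1u << i))
            done += drain_pipe(&pipes[i]);
    return done;
}

/* Ignore the summary mask and poll every pipe. */
int service_all(struct model_pipe *pipes)
{
    int done = 0;

    assert(pipes != NULL);
    for (int i = 0; i < MODEL_PIPE_COUNT; i++)
        done += drain_pipe(&pipes[i]);
    return done;
}
```

If a completion is pending on a pipe whose summary bit was never set (for
example after a lost interrupt), service_by_summary() leaves it stranded,
while service_all() still reaps it. That difference is what the experiment
would be testing.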

Thanks,
Ben

> 
> 
> Michal
> 


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


